google / gson

A Java serialization/deserialization library to convert Java Objects into JSON and back
Apache License 2.0
23.39k stars 4.29k forks source link

URI type adapter does not preserve object identity #1587

Open sagebind opened 5 years ago

sagebind commented 5 years ago

The type adapter included by default for java.net.URI does not preserve the identity of all URI values. In other words:

URI uri = new URI("s3://bucket/path/Kankyō.png");
URI roundTripUri = gson.fromJson(gson.toJson(uri), URI.class);
assert roundTripUri.equals(uri); // AssertionError

The problem is that the implementation here uses URI#toASCIIString, which does not preserve the original URI components but instead normalizes them like this:

Normalizer.normalize(s, Normalizer.Form.NFC)

This is very bad, because it is changing the meaning of the URI on serialization. This issue has caused numerous bugs at our business dealing with international filenames being uploaded to our SAAS products and then being changed after being passed between multiple microservices.

Here is an alternate implementation that we have started to use in our code:

public class NonNormalizingUriTypeAdapter extends TypeAdapter<URI> {
    @Override
    public URI read(JsonReader in) throws IOException {
        if (in.peek() == JsonToken.NULL) {
            in.nextNull();
            return null;
        }
        try {
            String nextString = in.nextString();
            return "null".equals(nextString) ? null : new URI(nextString);
        }
        catch (URISyntaxException e) {
            throw new JsonIOException(e);
        }
    }

    @Override
    public void write(JsonWriter out, URI value) throws IOException {
        out.value(value == null ? null : value.toString());
    }
}

Really the only difference is using toString() instead of toASCIIString(), which URL-encodes the result without changing the character byte representations.

inder123 commented 5 years ago

Interesting point. If we do change the default behavior, will that break anyone?

sagebind commented 5 years ago

It is certainly possible that someone is relying on the current behavior, though this is bad enough that I would call it a significant bug that should be fixed.