joelittlejohn / jsonschema2pojo

Generate Java types from JSON or JSON Schema and annotate those types for data-binding with Jackson, Gson, etc
http://www.jsonschema2pojo.org
Apache License 2.0

Unicode escaped characters in Jackson annotations #1171

Open antonovdmitriy opened 4 years ago

antonovdmitriy commented 4 years ago

Description

Inside annotations, non-ASCII characters are escaped even though outputEncoding is set to UTF-8. Comments, by contrast, are written correctly in unescaped form.

Reason

This happens because jsonschema2pojo uses the codemodel library, which forces non-ASCII characters inside annotations to be escaped in the method com.sun.codemodel.JExpr#quotify.
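To illustrate the behavior described above, here is a minimal sketch of the escaping rule. It is a simplified re-implementation written for this discussion, not the actual body of JExpr#quotify; the class name QuotifySketch is made up, and only the quote, backslash and non-printable cases are handled.

```java
public class QuotifySketch {

    // Simplified stand-in for the escaping step (an assumption, not the library code):
    // wraps the text in double quotes and escapes everything outside printable ASCII.
    static String quotify(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 2);
        sb.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                // characters that would break the literal get a backslash prefix
                sb.append('\\').append(c);
            } else if (c < 0x20 || c > 0x7E) {
                // emit a backslash, 'u' and four hex digits, whatever the file encoding is
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        sb.append('"');
        return sb.toString();
    }

    public static void main(String[] args) {
        // every Cyrillic letter comes out as a six-character escape sequence
        System.out.println(quotify("Описание"));
    }
}
```

Because the decision depends only on the character code, the configured output encoding never enters the picture, which matches what the issue reports.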

Reproducing

To reproduce the situation, change the description in the file jsonschema2pojo\jsonschema2pojo-integration-tests\src\test\resources\schema\description\description.json so that it contains non-ASCII text, run the integration test org.jsonschema2pojo.integration.config.AnnotationStyleIT#annotationStyleJackson2ProducesJsonPropertyDescription, and then inspect the generated file.

unkish commented 4 years ago

Leaving aside for a moment the fact that characters are escaped inside @JsonPropertyDescription (which may be unnecessary in some cases but is nevertheless valid according to the JLS), what is the use case where this causes issues?

My understanding is that @JsonPropertyDescription is used for generating JSON schema. A simple use case of the form:

    public static class Description {

        @JsonProperty("description")
        @JsonPropertyDescription("\u041e\u043f\u0438\u0441\u0430\u043d\u0438\u0435")
        public String description;

    }

    public static void main(String[] args) throws JsonProcessingException {
        ObjectMapper mapper = new ObjectMapper();
        SchemaFactoryWrapper visitor = new SchemaFactoryWrapper();
        mapper.acceptJsonFormatVisitor(Description.class, visitor);
        JsonSchema jsonSchema = visitor.finalSchema();
        System.out.println(mapper.writerWithDefaultPrettyPrinter().writeValueAsString(jsonSchema));
    }

would yield

{
  "type" : "object",
  "id" : "urn:jsonschema:org:jsonschema2pojo:Schema:Description",
  "properties" : {
    "description" : {
      "type" : "string",
      "description" : "Описание"
    }
  }
}

As can be seen, the escaped characters have been unescaped.
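The same point can be checked without Jackson: the Java compiler translates Unicode escapes while reading the source, so an escaped literal and an unescaped one are the same string at runtime. A minimal sketch (the class name is hypothetical, and the source file is assumed to be compiled as UTF-8):

```java
public class UnicodeEscapeDemo {

    // the form that the generator currently writes into the annotation
    static final String ESCAPED = "\u041e\u043f\u0438\u0441\u0430\u043d\u0438\u0435";

    // the form a human would prefer to read in the generated source
    static final String LITERAL = "Описание";

    public static void main(String[] args) {
        // both literals denote the same eight characters once compiled
        System.out.println(ESCAPED.equals(LITERAL));
        System.out.println(ESCAPED.length());
    }
}
```

So the escaping changes nothing for tools that read the compiled annotation; it only affects how the generated source looks.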

antonovdmitriy commented 4 years ago

It is very easy to explain with a simple example. We use a schema to validate incoming messages from an external system, and we also generate our data model from that schema when building the project. After generation we get a set of classes that we use in the project as DTOs. Everything works fine, with one big drawback: when we look at the source code of our project (generated during the build and committed to version control), the escaped characters are unreadable. Of course, English is the main language of development, but regional specifics are worth considering. There is no need to escape the characters; all modern compilers do their job just fine without escaping.

unkish commented 4 years ago

So the issue is rather 'cosmetic'.

As the escaping is done by codemodel, I see several ways to deal with this:

  1. Ignore @JsonPropertyDescription when viewing source code (the description is also present in the Javadoc, which IDEs should have no problem displaying).
  2. Submit an issue/PR to codemodel. If it gets fixed/merged, submit an issue/PR to jsonschema2pojo to use the new codemodel. If that gets fixed/merged, use the new version of the generator.
  3. Modify Jackson2Annotator to work around the escaping (this should be no problem as the code is open source). If the feature is in high demand, maybe the author would consider merging a PR.

One possible way to customize the behavior could be the following. Create a modified version of JStringLiteral, e.g.:

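import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

import org.apache.commons.lang3.text.translate.UnicodeUnescaper;

import com.sun.codemodel.JExpr;
import com.sun.codemodel.JExpressionImpl;
import com.sun.codemodel.JFormatter;

class JEncodingAwareStringLiteral extends JExpressionImpl {

    private final UnicodeUnescaper unicodeUnescaper = new UnicodeUnescaper();

    public final String str;
    public final String encoding;

    JEncodingAwareStringLiteral(String what, String encoding) {
        this.str = what;
        this.encoding = encoding;
    }

    @Override
    public void generate(JFormatter f) {
        // quotify escapes every non-ASCII character, whatever the output encoding is
        String input = JExpr.quotify('"', str);
        // compare charsets with equals() rather than ==; forName also resolves aliases
        if (StandardCharsets.UTF_8.equals(Charset.forName(encoding))) {
            input = unicodeUnescaper.translate(input);
        }
        f.p(input);
    }
}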

Modify all applicable places so that they use JEncodingAwareStringLiteral, i.e.:

        if (propertyNode.has("description")) {
            field.annotate(JsonPropertyDescription.class).param(
                    "value",
                    new JEncodingAwareStringLiteral(propertyNode.get("description").asText(), getGenerationConfig().getOutputEncoding()));
        }
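One caveat with unescaping the quoted string generically: the unescaper can also touch sequences that the escaping step did not produce. For example, if a description literally contains a backslash followed by u and four hex digits, the escaping step doubles the backslash, and the second backslash then looks like the start of an escape. The sketch below is a dependency-free alternative that skips over other escape pairs and leaves control characters escaped. It is an illustration only, not part of jsonschema2pojo or codemodel, and the class and method names are hypothetical.

```java
public class UnicodeUnescapeSketch {

    // Turns backslash-u escapes for non-ASCII, non-control characters back into
    // the characters themselves; every other escape pair is copied unchanged.
    static String unescapeUnicode(String literal) {
        StringBuilder out = new StringBuilder(literal.length());
        int i = 0;
        while (i < literal.length()) {
            char c = literal.charAt(i);
            if (c == '\\' && i + 1 < literal.length()) {
                char next = literal.charAt(i + 1);
                if (next == 'u' && i + 6 <= literal.length() && isHex(literal, i + 2)) {
                    char decoded = (char) Integer.parseInt(literal.substring(i + 2, i + 6), 16);
                    if (decoded > 0x7E && !Character.isISOControl(decoded)) {
                        out.append(decoded);
                    } else {
                        // keep ASCII and control characters in their escaped form
                        out.append(literal, i, i + 6);
                    }
                    i += 6;
                } else {
                    // an escaped backslash, quote, etc.: copy both characters as a pair
                    out.append(c).append(next);
                    i += 2;
                }
                continue;
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    // true if the four characters starting at 'from' are all hex digits
    private static boolean isHex(String s, int from) {
        for (int k = from; k < from + 4; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // the argument is an already-quoted literal containing three escapes
        System.out.println(unescapeUnicode("\"\\u041e\\u043f\\u0438\""));
    }
}
```

Whether this extra care is worth it depends on how likely such descriptions are in real schemas; for typical text the commons-lang unescaper above should give the same result.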

antonovdmitriy commented 4 years ago

There is already an issue in the old codemodel repository: https://github.com/javaee/jaxb-codemodel/issues/30. It seems that the library was relocated to the jaxb-ri project, and jsonschema2pojo still uses the old library.

Are you planning to move to this library? https://github.com/eclipse-ee4j/jaxb-ri

The problem is still present there as well: https://github.com/eclipse-ee4j/jaxb-ri/blob/3454cff57aae61545975911d178946373ea89cd7/jaxb-ri/codemodel/codemodel/src/main/java/com/sun/codemodel/JExpr.java#L223

joelittlejohn commented 4 years ago

I quite like what @unkish has suggested re JEncodingAwareStringLiteral. It doesn't seem too intrusive and I think there's little hope of fixing this upstream.

joelittlejohn commented 3 years ago

The comment from the CodeModel code is interesting:

                // However, various tools are so broken around this area,
                // so just to be on the safe side, it's better to do
                // the escaping here (regardless of the actual file encoding)