quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0
13.82k stars 2.69k forks source link

Qute Character Escapes for Json Content #43369

Open ogomezdi opened 2 months ago

ogomezdi commented 2 months ago

Description

It'll nice to have the same behavior for Json template for character escapes like the one we have for XML content explained here https://quarkus.io/guides/qute-reference#character-escapes allowing users to escape content we want to include, for example, as string in an attribute value, for those values that contain quotes, new line, ... characters. Despite there is a workaround by creating a @TemplateExtension or @TemplateData to access encode static functions available in Apache Commons Texts and others, but I think this must be a core functionality.

Implementation ideas

Use the same approach for escaping XML content, may be including in quarkus.qute.escape-content-types property "application/json" joined to the characters escape policy. Other easy approach will be include in "Built-in Template Extensions" one for Strings, allowing encoding and unencoding options.

quarkus-bot[bot] commented 2 months ago

/cc @mkouba (qute)

mkouba commented 2 months ago

Escaping is implemented as a io.quarkus.qute.ResultMapper that is applied to all output expressions in a template before the evaluated string value is written to the output. See also io.quarkus.qute.HtmlEscaper. You can register a custom ResultMapper with the EngineBuilder#addResultMapper(ResultMapper) method.

However, if you need more context (string in an attribute value) you can't apply the escaping globally...

Despite there is a workaround by creating a @TemplateExtension or @templatedata to access encode static functions available in Apache Commons Texts and others, but I think this must be a core functionality.

I'm not quite sure this should be a core functionality.

Use the same approach for escaping XML content, may be including in quarkus.qute.escape-content-types property "application/json" joined to the characters escape policy.

If you want to apply the default escaping to json files then just add application/json to the list of the media types in the quarkus.qute.escape-content-types config property :shrug: . Keep in mind that Qute has to be able to detect the content-type of a template file (which is not always the case; depends on use case).

ogomezdi commented 2 months ago

Let me try to explain better. Nowadays Character Escapes only apply to content included in a HTML or XML template, but if we use a template to generate a Json output, and we want to assign string values to a key in the Json and the values include quotes, newlines, ... the resulting output will not create a valid json. For example you have a template like this: { "key": "{variable}" } And you bind variable which has a doble quote inside, imagine hello "world". You'll have this output: { "key": "hello "world"" } Which is not a valid json, that must be: { "key": "hello \"world\"" } I hope the explanation helps understanding the issue.

mkouba commented 2 months ago

Yes, I do understand your example ;-)

My point is that you probably can't apply a global escaping to a JSON template because you need some "context" in order to apply the correct escaping. For example, in a template like { "key": "{variable}", "key2": {anotherVar} } where variable=hello "world" and anotherVar="hello world" you can't simply escape all double quotes in all output expressions, right?

So a template extension that would escape quotes is IMO the way to go: { "key": "{variable.asJsonValue}", "key2": {anotherVar} }:

@TemplateExtension
class MyExtensions {

    static String asJsonValue(String val) { 
       // escape quotes etc.
    }
}
ogomezdi commented 2 months ago

This is the workaround I've mentioned in the issue description. But, What I'm specting is to have something similar to the approach applied for XML, I mean https://quarkus.io/guides/qute-reference#character-escapes where the variable content is automatically escaped in case we're in a XML/HTML template, so if you use it in this, portion of template <tag>{variable}</tag>, the qute engine automatically apply de escape XML rules to variable without needing to execute any method or extension explicitly. So for me translating this behavior when users try to obtain a JSON output from qute, also including the rawoption if possible :-), is the aim of this request.

mkouba commented 2 months ago

I think that I understand your use case but the question is: should we apply escaping for the whole JSON template? If so, what would be the rules? For XML/HTML it's simple. For JSON I'm not so sure...

ogomezdi commented 2 months ago

I'm not sure to understand what do you mean when talking about the "whole JSON template", if you mean in any instance of variables to render in a template, my answer will be yes, as users will be able to apply .raw in places where they need to avoid this behavior. Regarding the rules to apply may be we can rely on other implementations like https://commons.apache.org/proper/commons-text/apidocs/org/apache/commons/text/StringEscapeUtils.html#escapeJson(java.lang.String)

mkouba commented 2 months ago

I'm not sure to understand what do you mean when talking about the "whole JSON template", if you mean in any instance of variables to render in a template, my answer will be yes, as users will be able to apply .raw in places where they need to avoid this behavior. Regarding the rules to apply may be we can rely on other implementations like https://commons.apache.org/proper/commons-text/apidocs/org/apache/commons/text/StringEscapeUtils.html#escapeJson(java.lang.String)

In that case, somethig like this should work:

public class JsonEscaper implements ResultMapper {

    @Override
    public boolean appliesTo(Origin origin, Object result) {
        if (result instanceof RawString) {
            return false;
        }
        Optional<Variant> variant = origin.getVariant();
        if (variant.isPresent()) {
            return variant.getContentType().startsWith("application/json");
        }
        return false;
    }

    @Override
    public String map(Object result, Expression expression) {
        return StringEscapeUtils.html.escapeJson(result.toString());
    }

and then register the mapper:

class MyBean {

    void configureEngine(@Observes EngineBuilder builder) {
        builder.addResultMapper(new JsonEscaper());
    }
}

I don't think we want to add commons-text as a dependency. However, we could implement a simple implementation similar to io.quarkus.qute.HtmlEscaper.

Feel free to send a pull request ;-).

ogomezdi commented 2 months ago

I've found a easier implementation for encoding in other library that can fit in the way you've mentioned, may be something like this:

public static String escapeJson(String value) {
    if (value == null)
      return "";

    StringBuilder b = new StringBuilder();
    for (char c : value.toCharArray()) {
      if (c == '\r')
        b.append("\\r");
      else if (c == '\n')
        b.append("\\n");
      else if (c == '\t')
        b.append("\\t");
      else if (c == '"')
        b.append("\\\"");
      else if (c == '\\')
        b.append("\\\\");
      else if (c == ' ')
        b.append(" ");
      else if (isWhitespace(c)) {
        b.append("\\u"+JsonEscaper.padLeft(Integer.toHexString(c), '0', 4));
      } else if (((int) c) < 32)
        b.append("\\u" + JsonEscaper.padLeft(Integer.toHexString(c), '0', 4));
      else
        b.append(c);
    }
    return b.toString();
  }
  public static boolean isWhitespace(int ch) {
    return JsonEscaper.existsInList(ch, '\u0009', '\n', '\u000B','\u000C','\r','\u0020','\u0085','\u00A0',
        '\u1680','\u2000','\u2001','\u2002','\u2003','\u2004','\u2005','\u2006','\u2007','\u2008','\u2009','\u200A',
        '\u2028', '\u2029', '\u202F', '\u205F', '\u3000');
  }
  public static boolean existsInList(int value, int... array) {
    for (int i : array)
      if (value == i)
        return true;
    return false;
  }
  public static String padLeft(String src, char c, int len) {
    StringBuilder s = new StringBuilder();
    for (int i = 0; i < len - src.length(); i++)
      s.append(c);
    s.append(src);
    return s.toString();
  }

One more question, will this mapper be included in the default engine builder like io.quarkus.qute.HtmlEscaper in the io.quarkus.qute.runtime.EngineProducer constructor?

And finally, I think there is an error in the code you've proposed return variant.getContentType().startsWith("application/json") must be return variant.get().getContentType().startsWith("application/json")

mkouba commented 2 months ago

Looks good.

isWhitespace(int ch) could be probably replaced with java.lang.Character.isWhitespace(int)?

A perf tip - if possible use the initial capacity for the StringBuilder (e.g. src.lenght() + len in padLeft()).

One more question, will this mapper be included in the default engine builder like io.quarkus.qute.HtmlEscaper in the io.quarkus.qute.runtime.EngineProducer constructor?

It depends. I mean if you register it with builder.addResultMapper(new JsonEscaper()) then it will live alongside the HtmlEscaper.

And finally, I think there is an error in the code you've proposed return variant.getContentType().startsWith("application/json") must be return variant.get().getContentType().startsWith("application/json")

:+1:

ogomezdi commented 2 months ago

A perf tip - if possible use the initial capacity for the StringBuilder (e.g. src.lenght() + len in padLeft()).

May be easy for padLeft() use -> String.format("%4s", Integer.toHexString(c)).replace(' ','0') So final code should be:

public static String escapeJson(String value) {
    if (value == null)
      return "";

    StringBuilder b = new StringBuilder();
    for (char c : value.toCharArray()) {
      if (c == '\r')
        b.append("\\r");
      else if (c == '\n')
        b.append("\\n");
      else if (c == '\t')
        b.append("\\t");
      else if (c == '"')
        b.append("\\\"");
      else if (c == '\\')
        b.append("\\\\");
      else if (c == ' ')
        b.append(" ");
      else if (java.lang.Character.isWhitespace(c)) {
        b.append("\\u"+String.format("%4s", Integer.toHexString(c)).replace(' ','0'));
      } else if (((int) c) < 32)
        b.append("\\u"+String.format("%4s", Integer.toHexString(c)).replace(' ','0'));
      else
        b.append(c);
    }
    return b.toString();
  }

Avoid java.lang.Character.isWhitespace if explicitly import