stleary / JSON-java

A reference implementation of a JSON package in Java.
http://stleary.github.io/JSON-java/index.html
Other
4.54k stars 2.56k forks source link

Behavior of JSONObject constructor trimming 0-led integers and converting them to int type instead of String type #826

Open parker-mar opened 1 year ago

parker-mar commented 1 year ago

Hello there! I am curious about the new behavior introduced in 20231013 (ref #783), where the JSONObject constructor trims 0-led integers and converts them to int type instead of String type (which was the type in 20230618).

While I can understand the rationale behind trimming, the change in behavior could be problematic for things like IDs with leading 0s, which may have previously relied on the JSONObject constructor preserving the value as String type, only to have it converted to int, and thus failing a subsequent call to JSONObject getString("id") because of type mismatch.

Comparing with other Java JSON libraries, org.json seems to behave in a way that's potentially unsafe as it transforms the input and therefore "hides" the error, versus the other libraries which take a safer approach by throwing an error on input (javax.json, Jackson) or treating the input as "pseudo"-json and letting the getter decide the type (Gson).

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.google.gson.JsonParser;
import org.json.JSONObject;
import org.junit.jupiter.api.Test;

import javax.json.Json;
import java.io.StringReader;

class MyTest {
    @Test
    void myTest() throws JsonProcessingException {
        String personId = "0123";

        // org.json transforms on construction 
        JSONObject j1 = new JSONObject("{personId: " + personId + "}");
        System.out.println(j1.getString("personId")); // Throws exception

        // javax.json throws error on construction
        javax.json.JsonObject j2 = Json.createReader(new StringReader("{\"personId\": " + personId + "}")).readObject(); 
        System.out.println(j2.getString("personId"));

        // gson treats it as pseudo-json and leaves it to the getter to decide: getAsInt will trim, getAsString will not
        com.google.gson.JsonObject j3 = JsonParser.parseString("{\"personId\": " + personId + "}").getAsJsonObject();
        System.out.println(j3.get("personId").getAsString());

        // jackson throws error on construction
        JsonNode jsonNode = new ObjectMapper().readTree("{\"personId\": " + personId + "}");
        System.out.println(jsonNode.get("personId").asText());
    }
}
stleary commented 1 year ago

@parker-mar Thanks for bringing this up and for coding up the comparison of behavior, that is really helpful. I think you raise a good point, although the example is perhaps not great. Parsing a string of digits with leading zeros, no surrounding quotes, and then calling getString() without first confirming the type is just asking for trouble, in my opinion.

Do you think the behavior should be reverted, i.e. retain the leading zeros and interpret the value as a string? @johnjaylward What do you think?

johnjaylward commented 1 year ago

This library has never done unexpected conversions with the exception of unquoted strings.

Given the following sample json document:

{
  "string1" : "000123",
  "string2" : "abc123",
  "string3" : abc123,
  "number1" : 123,
  "number2" : 000123
}

Before the change, I believe all of those values would have been stored internally in JSONObject as strings except number1.

After the change I would expect that both number1 and number2 to be stored as numbers (and have the same value 123) and all the other values to be stored as strings.

If after the change, string1 is being stored internally as a number, I would say that is a bug that should be corrected. Explicit strings in the JSON document should always be stored as the exact string value.

johnjaylward commented 1 year ago

As a followup, a json document like in the example provided by @parker-mar ({"personId":0123}) is technically not valid JSON. So it would be up to the parser on how it wants to handle it. All of the handling provided in the example are "valid" including the current way we are handling it now.

My real answer would be to have whatever is generating {"personId":0123} fix their bug and output valid JSON. e.g. {"personId":"0123"}

stleary commented 1 year ago

In testing last night, it seemed to me that {"personId":0123} using 20230618 results in the string value "0123", but 20231013 yields numeric 123. Considering that #783 was only intended to sync up optLong() and getLong(), I would consider this an unintended regression. Might affect XML as well. I will dig into it more this weekend, but I think we should consider reverting the code that caused it, hopefully without impairing the original goal of the PR.

parker-mar commented 1 year ago

Hi, thanks for the prompt responses!

Parsing a string of digits with leading zeros, no surrounding quotes, and then calling getString() without first confirming the type is just asking for trouble, in my opinion.

Indeed it was not a great example, unfortunately it happened in our case that we relied on this pre-existing behavior for JSON exchange of 8-digit person IDs, which caused runtime failures for the consumer that used getString() to read the ID.

The problem may recur in the future if Java 15 text blocks become more popular (e.g. https://stackoverflow.com/a/61432423):

String personId = "00701234";
JSONObject jsonPayload = new JSONObject("""
    {
        "name": "Parker",
        "id": %s
    }""".formatted(personId));

jsonPayload.getString("id"); // Throws exception

Do you think the behavior should be reverted, i.e. retain the leading zeros and interpret the value as a string?

I understand that the parser is designed to accept as input more than the generator https://github.com/stleary/JSON-java/issues/465#issuecomment-476050297, but it wasn't entirely clear to me if the purpose of trimming the leading 0s was to handle octal parsing. If we're looking for reasonable defaults "past" the JSON spec, I think retaining the leading 0s and interpreting the value as a string is a safer/more explicit transformation than trimming for octal values. I also think it's a more common use case, besides that it would maintain the previous behavior.

Considering that https://github.com/stleary/JSON-java/pull/723 was only intended to sync up optLong() and getLong()

I think you may have meant to tag https://github.com/stleary/JSON-java/pull/783.

johnjaylward commented 1 year ago
String personId = "00701234";
JSONObject jsonPayload = new JSONObject("""
    {
        "name": "Parker",
        "id": %s
    }""".formatted(personId));

This is a very bad way to generate a JSONObject. This is NOT recommended for the same reason that it's not recommended to build SQL queries using string building. It's very easy to break the expected syntax and have things break, worst case become a security issue.

If using this library, the recommended way to build this object would be:

String personId = "00701234";
JSONObject jsonPayload = new JSONObject()
    .put("name", "Parker")
    .put("id", personId);

Any type of payload that needs to be exchanged, even internally, that has very specific syntax rules should use the OOTB builders available.

johnjaylward commented 1 year ago

I would consider this an unintended regression.

I disagree. These test cases were explicitly in the tests provided as part of the PR. I understood this to be an objective of the PR.

https://github.com/rudrajyotib/JSON-java/blob/b5b9f636ff150fbad9dce3cd3e77a2747ac73190/src/test/java/org/json/junit/JSONObjectNumberTest.java#L26-L29

parker-mar commented 1 year ago

Any type of payload that needs to be exchanged, even internally, that has very specific syntax rules should use the OOTB builders available.

Right. I also prefer the use of the builders so that we rely on the Java language type system to supply the correct value types, data payload or not.

Still, I can imagine some without the same foresight may turn to using the JSONObject constructor because of the ease of reading code snippets beyond even the simplicity of the OOTB builders, e.g. Text Blocks, which is in a way a primary motivation of that feature:

In Java, embedding a snippet of HTML, XML, SQL, or JSON in a string literal "..." usually requires significant editing with escapes and concatenation before the code containing the snippet will compile. The snippet is often difficult to read and arduous to maintain. (https://openjdk.org/jeps/355)

Make it easier for developers to: express sequences of characters in a readable form, free of Java indicators (https://openjdk.org/jeps/326)

stleary commented 1 year ago

@johnjaylward It's my bad, I missed this behavior change, should have been paying closer attention to the PRs.

rudrajyotib commented 1 year ago

@parker-mar @johnjaylward @stleary

I have tested that, if the value in JSON is a string, then the JSON object always retains a string value. If the user wants to get a number out of the string, then JSON Object helps by lifting the load of parsing string to number. If the value is parsable to a number, JSON object would do that and return.

If the JSON value is without quotes, and is an intended number, then it is expected that the value will be a legitimate number. On such values, when get or opt long/int is called, if the number is valid, that gets returned and of course, without the leading zeros.

If the value is intended to be a number, and is not a legitimate number, then JSON object returns 0 for them.

I have tested these behaviours in the following code.

String jsonWithNumberAndText = "{\"Key1\":012,\"Key2\":\"012\",\"Key3\":aa12}";

    JSONObject jsonObject1 = new JSONObject(jsonWithNumberAndText);
    String key2 = jsonObject1.getString("Key2");
    long key2OptLong = jsonObject1.optLong("Key2");
    long key2GetLong = jsonObject1.optLong("Key2");

    long key1OptLong = jsonObject1.optLong("Key1");
    long key1GetLong = jsonObject1.optLong("Key1");

    long key3OptLong = jsonObject1.optLong("Key3");
    long key3GetLong = jsonObject1.optLong("Key3");

    System.out.printf("Key2::String::%s::Get Long::%d::Opt Long::%d%n", key2, key2GetLong,key2OptLong);
    System.out.printf("Key1::Get Long::%d::Opt Long::%d%n",  key1GetLong,key1OptLong);
    System.out.printf("Key3::Get Long::%d::Opt Long::%d%n",  key3GetLong,key3OptLong);

    /* Output
    Key2::String::012::Get Long::12::Opt Long::12
    Key1::Get Long::12::Opt Long::12
    Key3::Get Long::0::Opt Long::0
     */`

For the number fields, calling getString on them throws an exception. This is where probably we have a scope if improvement. JSON Object may store the original value, and getString would return the original value (it could be a legitimate number, a number with leading zeros, or a set of non-number characters).

stleary commented 1 year ago

@rudrajyotib No worries, you did nothing wrong. It is my responsibility to provide the guardrails so that contributors don't accidentally break existing projects with a code change, so this is on me.

stleary commented 9 months ago

This issue may have turned up in #852. Investigating...

bartlaarhoven commented 8 months ago

We are seeing this behavior, or at least related behavior, as well, but only after updating from 20231013 to 20240205.

We extensively use the XML to JSON transformation offered by this library, and see the following change happening:

Given an XML:

...
    <datatypes>
        <telephone>0123456789</telephone>
        <amount>0.1230</amount>
        <boolean>true</boolean>
    </datatypes>
...

and our code:

JSONObject o = XML.toJSONObject(xml, keepStrings);

Behavior in 20231023 with keepStrings on false:

{"amount":0.123,"boolean":true,"telephone":"0123456789"}

Behavior in 20240205 with keepStrings on false:

{"amount":0.123,"boolean":true,"telephone":123456789}

As phone numbers in our country (as in most of Europe I believe) start with a 0, it was very helpful that JSON-java understood that and treated strings with leading zeroes as a string instead of an integer. This has changed and for us, functionally, this is indeed regression as suggested before.

We do not want the keepStrings on true: booleans and regular integers should actually be converted into booleans and integers. So for now we can not update JSON-java to the latest version, and that's of course not a desirable situation.

stleary commented 8 months ago

@bartlaarhoven Thanks for posting. This issue will be resolved in the next 48 hours. The new behavior will either be reverted, or changed to opt-in, depending on how the code looks. A new version will also be released.