Open parker-mar opened 1 year ago
@parker-mar Thanks for bringing this up and for coding up the comparison of behavior, that is really helpful. I think you raise a good point, although the example is perhaps not great. Parsing a string of digits with leading zeros, no surrounding quotes, and then calling getString() without first confirming the type is just asking for trouble, in my opinion.
Do you think the behavior should be reverted, i.e. retain the leading zeros and interpret the value as a string? @johnjaylward What do you think?
This library has never done unexpected conversions with the exception of unquoted strings.
Given the following sample json document:
{
"string1" : "000123",
"string2" : "abc123",
"string3" : abc123,
"number1" : 123,
"number2" : 000123
}
Before the change, I believe all of those values would have been stored internally in JSONObject as strings except number1
.
After the change I would expect that both number1
and number2
to be stored as numbers (and have the same value 123
) and all the other values to be stored as strings.
If after the change, string1
is being stored internally as a number, I would say that is a bug that should be corrected. Explicit strings in the JSON document should always be stored as the exact string value.
As a followup, a json document like in the example provided by @parker-mar ({"personId":0123}
) is technically not valid JSON. So it would be up to the parser on how it wants to handle it. All of the handling provided in the example are "valid" including the current way we are handling it now.
My real answer would be to have whatever is generating {"personId":0123}
fix their bug and output valid JSON. e.g. {"personId":"0123"}
In testing last night, it seemed to me that {"personId":0123}
using 20230618 results in the string value "0123", but 20231013 yields numeric 123. Considering that #783 was only intended to sync up optLong()
and getLong()
, I would consider this an unintended regression. Might affect XML as well. I will dig into it more this weekend, but I think we should consider reverting the code that caused it, hopefully without impairing the original goal of the PR.
Hi, thanks for the prompt responses!
Parsing a string of digits with leading zeros, no surrounding quotes, and then calling getString() without first confirming the type is just asking for trouble, in my opinion.
Indeed it was not a great example, unfortunately it happened in our case that we relied on this pre-existing behavior for JSON exchange of 8-digit person IDs, which caused runtime failures for the consumer that used getString()
to read the ID.
The problem may recur in the future if Java 15 text blocks become more popular (e.g. https://stackoverflow.com/a/61432423):
String personId = "00701234";
JSONObject jsonPayload = new JSONObject("""
{
"name": "Parker",
"id": %s
}""".formatted(personId));
jsonPayload.getString("id"); // Throws exception
Do you think the behavior should be reverted, i.e. retain the leading zeros and interpret the value as a string?
I understand that the parser is designed to accept as input more than the generator https://github.com/stleary/JSON-java/issues/465#issuecomment-476050297, but it wasn't entirely clear to me if the purpose of trimming the leading 0s was to handle octal parsing. If we're looking for reasonable defaults "past" the JSON spec, I think retaining the leading 0s and interpreting the value as a string is a safer/more explicit transformation than trimming for octal values. I also think it's a more common use case, besides that it would maintain the previous behavior.
Considering that https://github.com/stleary/JSON-java/pull/723 was only intended to sync up optLong() and getLong()
I think you may have meant to tag https://github.com/stleary/JSON-java/pull/783.
String personId = "00701234"; JSONObject jsonPayload = new JSONObject(""" { "name": "Parker", "id": %s }""".formatted(personId));
This is a very bad way to generate a JSONObject. This is NOT recommended for the same reason that it's not recommended to build SQL queries using string building. It's very easy to break the expected syntax and have things break, worst case become a security issue.
If using this library, the recommended way to build this object would be:
String personId = "00701234";
JSONObject jsonPayload = new JSONObject()
.put("name", "Parker")
.put("id", personId);
Any type of payload that needs to be exchanged, even internally, that has very specific syntax rules should use the OOTB builders available.
I would consider this an unintended regression.
I disagree. These test cases were explicitly in the tests provided as part of the PR. I understood this to be an objective of the PR.
Any type of payload that needs to be exchanged, even internally, that has very specific syntax rules should use the OOTB builders available.
Right. I also prefer the use of the builders so that we rely on the Java language type system to supply the correct value types, data payload or not.
Still, I can imagine some without the same foresight may turn to using the JSONObject constructor because of the ease of reading code snippets beyond even the simplicity of the OOTB builders, e.g. Text Blocks, which is in a way a primary motivation of that feature:
In Java, embedding a snippet of HTML, XML, SQL, or JSON in a string literal "..." usually requires significant editing with escapes and concatenation before the code containing the snippet will compile. The snippet is often difficult to read and arduous to maintain. (https://openjdk.org/jeps/355)
Make it easier for developers to: express sequences of characters in a readable form, free of Java indicators (https://openjdk.org/jeps/326)
@johnjaylward It's my bad, I missed this behavior change, should have been paying closer attention to the PRs.
@parker-mar @johnjaylward @stleary
I have tested that, if the value in JSON is a string, then the JSON object always retains a string value. If the user wants to get a number out of the string, then JSON Object helps by lifting the load of parsing string to number. If the value is parsable to a number, JSON object would do that and return.
If the JSON value is without quotes, and is an intended number, then it is expected that the value will be a legitimate number. On such values, when get or opt long/int is called, if the number is valid, that gets returned and of course, without the leading zeros.
If the value is intended to be a number, and is not a legitimate number, then JSON object returns 0 for them.
I have tested these behaviours in the following code.
String jsonWithNumberAndText = "{\"Key1\":012,\"Key2\":\"012\",\"Key3\":aa12}";
JSONObject jsonObject1 = new JSONObject(jsonWithNumberAndText);
String key2 = jsonObject1.getString("Key2");
long key2OptLong = jsonObject1.optLong("Key2");
long key2GetLong = jsonObject1.optLong("Key2");
long key1OptLong = jsonObject1.optLong("Key1");
long key1GetLong = jsonObject1.optLong("Key1");
long key3OptLong = jsonObject1.optLong("Key3");
long key3GetLong = jsonObject1.optLong("Key3");
System.out.printf("Key2::String::%s::Get Long::%d::Opt Long::%d%n", key2, key2GetLong,key2OptLong);
System.out.printf("Key1::Get Long::%d::Opt Long::%d%n", key1GetLong,key1OptLong);
System.out.printf("Key3::Get Long::%d::Opt Long::%d%n", key3GetLong,key3OptLong);
/* Output
Key2::String::012::Get Long::12::Opt Long::12
Key1::Get Long::12::Opt Long::12
Key3::Get Long::0::Opt Long::0
*/`
For the number fields, calling getString on them throws an exception. This is where probably we have a scope if improvement. JSON Object may store the original value, and getString would return the original value (it could be a legitimate number, a number with leading zeros, or a set of non-number characters).
@rudrajyotib No worries, you did nothing wrong. It is my responsibility to provide the guardrails so that contributors don't accidentally break existing projects with a code change, so this is on me.
This issue may have turned up in #852. Investigating...
We are seeing this behavior, or at least related behavior, as well, but only after updating from 20231013 to 20240205.
We extensively use the XML to JSON transformation offered by this library, and see the following change happening:
Given an XML:
...
<datatypes>
<telephone>0123456789</telephone>
<amount>0.1230</amount>
<boolean>true</boolean>
</datatypes>
...
and our code:
JSONObject o = XML.toJSONObject(xml, keepStrings);
Behavior in 20231023 with keepStrings
on false
:
{"amount":0.123,"boolean":true,"telephone":"0123456789"}
Behavior in 20240205 with keepStrings
on false
:
{"amount":0.123,"boolean":true,"telephone":123456789}
As phone numbers in our country (as in most of Europe I believe) start with a 0, it was very helpful that JSON-java understood that and treated strings with leading zeroes as a string instead of an integer. This has changed and for us, functionally, this is indeed regression as suggested before.
We do not want the keepStrings
on true
: booleans and regular integers should actually be converted into booleans and integers. So for now we can not update JSON-java to the latest version, and that's of course not a desirable situation.
@bartlaarhoven Thanks for posting. This issue will be resolved in the next 48 hours. The new behavior will either be reverted, or changed to opt-in, depending on how the code looks. A new version will also be released.
Hello there! I am curious about the new behavior introduced in 20231013 (ref #783), where the JSONObject constructor trims 0-led integers and converts them to int type instead of String type (which was the type in 20230618).
While I can understand the rationale behind trimming, the change in behavior could be problematic for things like IDs with leading 0s, which may have previously relied on the JSONObject constructor preserving the value as String type, only to have it converted to int, and thus failing a subsequent call to JSONObject
getString("id")
because of type mismatch.Comparing with other Java JSON libraries, org.json seems to behave in a way that's potentially unsafe as it transforms the input and therefore "hides" the error, versus the other libraries which take a safer approach by throwing an error on input (javax.json, Jackson) or treating the input as "pseudo"-json and letting the getter decide the type (Gson).