Closed rikkarth closed 6 months ago
@rikkarth Sorry for taking so long to complete the review. Comments added.
Fixed issue with double array breaking JSONTokener.nextValue() which now forwards jsonParserConfiguration object when creating a new JSONArray.
Changed input validation in JSONArray, removing collection of complete input in favor of leveraging x.syntaxError().
@rikkarth Please merge from master to pick up the latest changes:
@rikkarth Looks like there is a regression when parsing array elements with non-string simple values. Please add this unit test:
@Test
public void shouldHandleNumericArray() {
String expected = "[10]";
JSONArray jsonArray = new JSONArray(expected, new JSONParserConfiguration().withStrictMode(true));
assertEquals(expected, jsonArray.toString());
}
This test passes when strict mode is false, but fails with this output when strict mode is true:
Similar problems occur in strict mode when the array elements are true/false, and null.
Also boolean values are producing the same behaviour, I will fix it.
I changed how I validated numbers, boolean and string values using JSONObject.stringToValue existing logic, I've also added what I believe a few, more robust, test cases to prevent more regressions.
I hope the implementation looks cleaner now.
Sorry I didn't catch this earlier.
/**
* Parses unquoted text from the JSON input. This could be the values true, false, or null, or it can be a number.
* Non-standard forms are also accepted. Characters are accumulated until the end of the text or a formatting
* character is reached.
*
* @param c The starting character.
* @return The parsed object.
* @throws JSONException If the parsed string is empty.
*/
private Object parsedUnquotedText(char c, boolean strictMode) {
StringBuilder sb = new StringBuilder();
while (c >= ' ' && ",:]}/\\\"[{;=#".indexOf(c) < 0) {
sb.append(c);
c = this.next();
}
if (!this.eof) {
this.back();
}
String string = sb.toString().trim();
if (string.isEmpty()) {
throw this.syntaxError("Missing value");
}
Object stringToValue = JSONObject.stringToValue(string);
return strictMode ? getValidNumberOrBooleanFromObject(stringToValue) : stringToValue;
}
private Object getValidNumberOrBooleanFromObject(Object value) {
if (value instanceof Number || value instanceof Boolean) {
return value;
}
throw new JSONException(String.format("Value is not surrounded by quotes: %s", value));
}
null without quotes is a valid JSON value, just like true and false (see https://json.org). Strict mode should allow this value when it appears without quotes. Please add this test case for coverage:
@Test
public void allowNullInStrictMode() {
String expected = "[null]";
JSONArray jsonArray = new JSONArray(expected, new JSONParserConfiguration().withStrictMode(true));
assertEquals(expected, jsonArray.toString());
}
Done as requested. Now it was simple as adding it to the this method:
private Object getValidNumberBooleanOrNullFromObject(Object value) {
if (value instanceof Number || value instanceof Boolean || value.equals(JSONObject.NULL)) {
return value;
}
throw new JSONException(String.format("Value is not surrounded by quotes: %s", value));
}
@rikkarth, This string fails in strict mode because there are chars after the end of an empty array:
String s = "[[]]";
This string is allowed in strict mode because the chars at the end are not validated:
String s = "[1],";
You might be able to address the first problem by making the parser code block starting at line 160 exactly the same as the code block starting at line 134, as I mentioned sometime back. But it will probably still need to be reworked to handle values after the closing bracket, like a closing JSONObject brace. The second problem means that the parser is still not reliably detecting invalid trailing chars. If the current implementation is unable to do that, it might be necessary to revisit the design for this feature.
@rikkarth, This string fails in strict mode because there are chars after the end of an empty array:
String s = "[[]]";
This string is allowed in strict mode because the chars at the end are not validated:String s = "[1],";
You might be able to address the first problem by making the parser code block starting at line 160 exactly the same as the code block starting at line 134, as I mentioned sometime back. But it will probably still need to be reworked to handle values after the closing bracket, like a closing JSONObject brace. The second problem means that the parser is still not reliably detecting invalid trailing chars. If the current implementation is unable to do that, it might be necessary to revisit the design for this feature.
Yes you are absolutely right. I will come back to you with a more solid, robust solution and design.
Thank you for your patience.
I'm near a solution for this I will share my results and implementation soon.
Mr @stleary,
After a few days of serious research of the codebase and work I believe I came up to a nice solution.
Ideally I'd like to present you with the changes I performed over a call, but I'll try to do my best through here.
Will present it in the next comment.
I transformed the "endless loop" into a more predictive controlled recursive call.
This was highly necessary for this feature since the implementation of a stable, working strict mode required touching a lot of key points in the JSONArray and JSONTokener validation process.
before and after of JSONArray main constructor
I did my best to maintain the codebase without regressions, during this process it was imperative that I made some "naming" changes in order to achieve my vision for this implementation.
example of variable naming change from nextChar
to cursor
Strict Mode validations in JSONArray
were improved. Two new key implementations were added to the JSONTokener
:
arrayLevel
smallCharMemory
.new fields in JSONTokener
Small Char Memory is a small List of Characters which is designed to only contain 2 characters, this was extremely necessary because I required a wider "predictive" tool to perform certain validations and achieve a stronger and more stable "strict mode".
I decided to use the List implementation because it provides the exact API I needed to enforce the smallest memory footprint in this implementation without losing the tools the List implementation provides.
By adopting the List we can also make this "memory" larger in the future, meaning we can broaden the "predictive" power of this tool with very little change.
Because char
holds no reference since it's primitive.
I had to make sure I was comparing the same reference of a character without reinventing the wheel in order to update the smallCharMemory
with the right reference (not character).
Knowing if I was at "level 0" was crucial to understand. I had to make sure if I was evaluating a (1.) double(or deeper) array case or (2.) a single/base level array case.
Every time nextValue()
is called the arrayLevel
goes up. We can replace this with AtomicInteger
in the future if there is a need, but at the moment I didn't find a use-case that warranted using it.
The array level can only be accessed with JSONTokener.getArraylevel
This is used to find rogue characters after an array at base level ( arrayLevel == 0 )
.
Example_1: [],
Example_2: []asd
Example_3: []]
parseTokener
, higher predictability and control over JSONArray construction.arrayLevel
and smallCharMemory
adding more predict ability to the implementation and future use-cases.smallCharMemory
. (Around 4 bytes for two Characters + ArrayList overhead)Since we are in "Maintenance Mode" it's important to keep things maintainable, so I'd like to propose some recommendations for the future. I can create the tickets for each point to keep the PRs focused and small for easier review. I only made extremely necessary changes in this PR with hopes that I can apply more "cosmetic" changes in future PRs with the goal to improve the maintainability of this project.
Semantic refactor in JSONArray, which doesn't aim to change any of the existing functionality but only to clean the implementation and make it more maintainable.
Increase JSONArray internal immutability by removing all unjustified mutability from the class.
JSONArray Helper class to decrease the cognitive complexity of the implementation.
Increase the performance of JSONArray through increasing the eligibility of reference garbage collection by improving the scope of certain references.
@rikkarth Please see #891. Let's pause this review until the unit tests have been updated to give a better view of strict mode behavior.
Branch strict-mode-unit-tests contains all of your code from this branch. In the JSONParserConfiguration default constructor, I forced strictMode to be true, and then modified the broken unit tests or replaced @Test with @Ignore.
You can fix these issues on your current branch, or you can take a new fork off of the strict-mode-unit-tests branch.
I will review all this points and will come back to you with new feedback.
At first glance, this should be relatively easy to achieve/fix.
Thank you for this analysis.
@stleary
Update: I fixed all the test cases in JSONTokener apart from the last which I require more feedback:
The issue was, I didn't perform the most simple of validations which checks for end of file after finding ']' at 2nd, which I included now. After doing it, all tests pass accordingly.
As for this test case
@Test
public void testValid() {
// ...
checkValid("1 2", String.class);
}
throws JSONException
which is indeed valid, since you manually set strictMode
true in JSONParserConfiguration
.
The test case value is not inside an array; The test case value is not inside quotes; The test case value is not valid according to strictMode guidelines (or is it?)
What should I do with this case?
org.json.JSONException: Value '1 2' is not surrounded by quotes at 3 [character 4 line 1]
This test is failing because the expected testCase is actually malformed, strictMode working accordingly. Not a regression.
Test name: toJSONObjectToJSONArray
inline
I'm not sure if you'd like me to keep the tests like they currently are (even if malformed).
In the mean time I will isolate the tests as requested. I will correct and externalize the test cases into JSON files which I believe will be more compliant since modern IDEs check for syntax errors for us in cases like this.
Branch strict-mode-unit-tests contains all of your code from this branch. In the JSONParserConfiguration default constructor, I forced strictMode to be true, and then modified the broken unit tests or replaced @test with @ignore.
All of the points you've highlighted were fixed. There are a few cases (e.g checkValid("1 2", String.class);
) where I don't exactly know what I should do there, and will require some more feedback on this.
I removed the TODO's, comments, etc that you provided in strict-mode-unit-tests as I fixed each issue.
I left some of the "important" TODO's the one's you wrote "TBD later" or "will revert later" so you could easily navigate back to them later.
There is a regression in single quote handling within double-quote delimited strings that must be fixed in strictMode.
Fixed.
There are also several errors in JSONTokenerTest that suggest a regression.
Fixed.
You can fix these issues on your current branch, or you can take a new fork off of the strict-mode-unit-tests branch.
I merged them into this branch to keep a linear history of what was affected in this PR.
I added this TODO's because I strongly believe the test cases here are super bloated and could be hugely simplified.
I will make my case in a future PR, leaving this here for reference.
@rikkarth Nice work.
The 1 2
test result signals a regression - in this case, a change in the behavior of a public API, which is always a concern. Might be good to see if the existing behavior can be retained.
Will review the code, and add some comments.
@rikkarth Nice work. The
1 2
test result signals a regression - in this case, a change in the behavior of a public API, which is always a concern. Might be good to see if the existing behavior can be retained. Will review the code, and add some comments.
The test for this use-case passes in strictMode=false
just not in strictMode=true
which follows JSON rules.
The tokener is trying to understand if the value is enclosed within an array or object.
And then the tokener in strictMode sees this value is not a Number, null or boolean ... it's a string without quotes.
The question is: Why should this test pass in strictMode? The only answer I could get was - it shouldn't.
The return when strictMode=false
.
checkValid("1 2", String.class); //strictMode false
output
1 2
Could the label for this PR be updated? It's no longer pending redesign but pending review.
Thank you.
Could the label for this PR be updated? It's no longer pending redesign but pending review.
I think the label still applies. The goal of issue #887 was just to catch trailing chars at the end of the JSON doc. There might be a simpler solution that hasn't been found yet. For example, adding properties to the tokener does not seem like the right direction, or the way that JSONArray() was rewritten.
Fair enough.
Did you have time to read my previous comment?
It is the last point I have to complete the points you've defined previously.
I can make this last test pass, I'm just wondering why should this test pass in strict mode?
Do you have an opinion on it?
Did you have time to read my previous comment? It is the last point I have to complete the points you've defined previously. I can make this last test pass, I'm just wondering why should this test pass in strict mode? Do you have an opinion on it?
Do you mean the "1 2" validity test? Do you think the result difference is due to code changes to enforce quote-delimited strings, or to detecting trailing chars? In the former case, no additional changes are needed. In the latter case, it should be handled by the redesign. In either case, you don't need to make it work now.
Did you have time to read my previous comment? It is the last point I have to complete the points you've defined previously. I can make this last test pass, I'm just wondering why should this test pass in strict mode? Do you have an opinion on it?
Do you mean the "1 2" validity test? Do you think the result difference is due to code changes to enforce quote-delimited strings, or to detecting trailing chars? In the former case, no additional changes are needed. In the latter case, it should be handled by the redesign. In either case, you don't need to make it work now.
Yes, the error is exactly that. Quote-delimited strings, which is coherent with the design of strictMode.
If you change the test and put quotes around the string, while strictMode is true, then it passes the test.
Please check screenshots bellow.
What problem does this code solve? Complete the implementation of strict mode for JSONArray
Does the code still compile with Java6? Yes
Risks Low
Changes to the API? No
Will this require a new release? No
Should the documentation be updated? No
Does it break the unit tests? No, new unit tests were added
Was any code refactored in this commit? No
Review status APPROVED
Starting 3-day comment window
Cleaned up strictMode true flag as instructed.
Description
If in strict mode, and having reached the end of the JSONArray construction life-cycle with no errors, we then perform an input validation.
In this validation we explicitly check if (1.) the last bracket was found and (2.) if more characters exist after that.
If the next character is not the end of the array (0), then it means we have more characters after the final closing square bracket, so we throw an error message informing the array is not compliant and we give back the complete input to the user for easy debugging.
To ensure this logic doesn't affect 2D arrays I also created a new test for it.
From the examples provided in the issue created: https://github.com/stleary/JSON-java/issues/887
Additional Note
Since the logic is fully encapsulated inside
validateInput()
which is only run if strictMode is true, it is guaranteed that previous implementations are not affected when strictMode is false and that this implementation is modular and can be maintained, improved or removed by simply removingvalidateInput()
.Test cases added