fix(#887): complete strictMode for JSONArray

rikkarth commented 7 months ago

Description

If in strict mode, and having reached the end of the JSONArray construction life-cycle with no errors, we then perform an input validation.

In this validation we explicitly check if (1.) the last bracket was found and (2.) if more characters exist after that.

If the next character is not the end of the array (0), then it means we have more characters after the final closing square bracket, so we throw an error message informing the array is not compliant and we give back the complete input to the user for easy debugging.

To ensure this logic doesn't affect 2D arrays I also created a new test for it.

From the examples provided in the issue created: https://github.com/stleary/JSON-java/issues/887

Additional Note

Since the logic is fully encapsulated inside validateInput() which is only run if strictMode is true, it is guaranteed that previous implementations are not affected when strictMode is false and that this implementation is modular and can be maintained, improved or removed by simply removing validateInput().

Test cases added

stleary commented 7 months ago

@rikkarth Sorry for taking so long to complete the review. Comments added.

rikkarth commented 7 months ago

Fixed issue with double array breaking JSONTokener.nextValue() which now forwards jsonParserConfiguration object when creating a new JSONArray.
Changed input validation in JSONArray, removing collection of complete input in favor of leveraging x.syntaxError().

stleary commented 7 months ago

@rikkarth Please merge from master to pick up the latest changes:

stleary commented 7 months ago

@rikkarth Looks like there is a regression when parsing array elements with non-string simple values. Please add this unit test:

    @Test
    public void shouldHandleNumericArray() {
        String expected = "[10]";
        JSONArray jsonArray = new JSONArray(expected, new JSONParserConfiguration().withStrictMode(true));
        assertEquals(expected, jsonArray.toString());
    }

This test passes when strict mode is false, but fails with this output when strict mode is true:

Similar problems occur in strict mode when the array elements are true/false, and null.

rikkarth commented 7 months ago

Also boolean values are producing the same behaviour, I will fix it.

rikkarth commented 7 months ago

I changed how I validated numbers, boolean and string values using JSONObject.stringToValue existing logic, I've also added what I believe a few, more robust, test cases to prevent more regressions.

I hope the implementation looks cleaner now.

Sorry I didn't catch this earlier.

Correction - regression when parsing array elements with non-string simple values.

    /**
     * Parses unquoted text from the JSON input. This could be the values true, false, or null, or it can be a number.
     * Non-standard forms are also accepted. Characters are accumulated until the end of the text or a formatting
     * character is reached.
     *
     * @param c The starting character.
     * @return The parsed object.
     * @throws JSONException If the parsed string is empty.
     */
    private Object parsedUnquotedText(char c, boolean strictMode) {
        StringBuilder sb = new StringBuilder();
        while (c >= ' ' && ",:]}/\\\"[{;=#".indexOf(c) < 0) {
            sb.append(c);
            c = this.next();
        }
        if (!this.eof) {
            this.back();
        }

        String string = sb.toString().trim();

        if (string.isEmpty()) {
            throw this.syntaxError("Missing value");
        }

        Object stringToValue = JSONObject.stringToValue(string);

        return strictMode ? getValidNumberOrBooleanFromObject(stringToValue) : stringToValue;
    }

    private Object getValidNumberOrBooleanFromObject(Object value) {
        if (value instanceof Number || value instanceof Boolean) {
            return value;
        }

        throw new JSONException(String.format("Value is not surrounded by quotes: %s", value));
    }

stleary commented 7 months ago

null without quotes is a valid JSON value, just like true and false (see https://json.org). Strict mode should allow this value when it appears without quotes. Please add this test case for coverage:

    @Test
    public void allowNullInStrictMode() {
        String expected = "[null]";
        JSONArray jsonArray = new JSONArray(expected, new JSONParserConfiguration().withStrictMode(true));
        assertEquals(expected, jsonArray.toString());
    }

rikkarth commented 7 months ago

Done as requested. Now it was simple as adding it to the this method:

    private Object getValidNumberBooleanOrNullFromObject(Object value) {
        if (value instanceof Number || value instanceof Boolean || value.equals(JSONObject.NULL)) {
            return value;
        }

        throw new JSONException(String.format("Value is not surrounded by quotes: %s", value));
    }

stleary commented 7 months ago

@rikkarth, This string fails in strict mode because there are chars after the end of an empty array: String s = "[[]]"; This string is allowed in strict mode because the chars at the end are not validated: String s = "[1],"; You might be able to address the first problem by making the parser code block starting at line 160 exactly the same as the code block starting at line 134, as I mentioned sometime back. But it will probably still need to be reworked to handle values after the closing bracket, like a closing JSONObject brace. The second problem means that the parser is still not reliably detecting invalid trailing chars. If the current implementation is unable to do that, it might be necessary to revisit the design for this feature.

rikkarth commented 7 months ago

@rikkarth, This string fails in strict mode because there are chars after the end of an empty array: String s = "[[]]"; This string is allowed in strict mode because the chars at the end are not validated: String s = "[1],"; You might be able to address the first problem by making the parser code block starting at line 160 exactly the same as the code block starting at line 134, as I mentioned sometime back. But it will probably still need to be reworked to handle values after the closing bracket, like a closing JSONObject brace. The second problem means that the parser is still not reliably detecting invalid trailing chars. If the current implementation is unable to do that, it might be necessary to revisit the design for this feature.

Yes you are absolutely right. I will come back to you with a more solid, robust solution and design.

Thank you for your patience.

rikkarth commented 7 months ago

I'm near a solution for this I will share my results and implementation soon.

rikkarth commented 7 months ago

Mr @stleary,

After a few days of serious research of the codebase and work I believe I came up to a nice solution.

Ideally I'd like to present you with the changes I performed over a call, but I'll try to do my best through here.

Will present it in the next comment.

rikkarth commented 7 months ago

Switch from "endless for loop" into predictive recursive call

I transformed the "endless loop" into a more predictive controlled recursive call.

This was highly necessary for this feature since the implementation of a stable, working strict mode required touching a lot of key points in the JSONArray and JSONTokener validation process.

before and after of JSONArray main constructor

I did my best to maintain the codebase without regressions, during this process it was imperative that I made some "naming" changes in order to achieve my vision for this implementation.

example of variable naming change from nextChar to cursor

Strict Mode validations in JSONArray were improved. Two new key implementations were added to the JSONTokener:

arrayLevel
smallCharMemory.

Small Char Memory and ArrayLevel

new fields in JSONTokener

Small Char Memory

Small Char Memory is a small List of Characters which is designed to only contain 2 characters, this was extremely necessary because I required a wider "predictive" tool to perform certain validations and achieve a stronger and more stable "strict mode".

Why List?

I decided to use the List implementation because it provides the exact API I needed to enforce the smallest memory footprint in this implementation without losing the tools the List implementation provides.

By adopting the List we can also make this "memory" larger in the future, meaning we can broaden the "predictive" power of this tool with very little change.

Why Character and not primitive char ?

Because char holds no reference since it's primitive.

I had to make sure I was comparing the same reference of a character without reinventing the wheel in order to update the smallCharMemory with the right reference (not character).

Array Level

Knowing if I was at "level 0" was crucial to understand. I had to make sure if I was evaluating a (1.) double(or deeper) array case or (2.) a single/base level array case.

Every time nextValue() is called the arrayLevel goes up. We can replace this with AtomicInteger in the future if there is a need, but at the moment I didn't find a use-case that warranted using it.

The array level can only be accessed with JSONTokener.getArraylevel

Array Level Use Case

This is used to find rogue characters after an array at base level ( arrayLevel == 0 ).

Example_1: [], Example_2: []asd Example_3: []]

PROS & CONS

PROS

Increased performance due to recursive nature of parseTokener, higher predictability and control over JSONArray construction.
Cleaner implementation, some steps defined more clearly.
New Tools: arrayLevel and smallCharMemory adding more predict ability to the implementation and future use-cases.
Improved Strict Mode Implementation
More Unit Tests

CONS

Smaller but higher memory footprint due to the nature of smallCharMemory. (Around 4 bytes for two Characters + ArrayList overhead)
Small Adjustments to existing codebase, could lead to potential regression.

Future Recommendations

Since we are in "Maintenance Mode" it's important to keep things maintainable, so I'd like to propose some recommendations for the future. I can create the tickets for each point to keep the PRs focused and small for easier review. I only made extremely necessary changes in this PR with hopes that I can apply more "cosmetic" changes in future PRs with the goal to improve the maintainability of this project.

JSONArray refactor and cleanup

Semantic Refactor

Semantic refactor in JSONArray, which doesn't aim to change any of the existing functionality but only to clean the implementation and make it more maintainable.

Decrease Mutability / Increase Immutability && Decrease Side-Effects

Increase JSONArray internal immutability by removing all unjustified mutability from the class.

Decrease Cognitive Complexity

JSONArray Helper class to decrease the cognitive complexity of the implementation.

Increase Performance and Memory Footprint

Increase the performance of JSONArray through increasing the eligibility of reference garbage collection by improving the scope of certain references.

stleary commented 6 months ago

@rikkarth Please see #891. Let's pause this review until the unit tests have been updated to give a better view of strict mode behavior.

stleary commented 6 months ago

Branch strict-mode-unit-tests contains all of your code from this branch. In the JSONParserConfiguration default constructor, I forced strictMode to be true, and then modified the broken unit tests or replaced @Test with @Ignore.

Most of the broken tests were due to inadvertently forgetting to use double quotes in otherwise valid JSON docs. You can ignore those changes.
A few tests were set to @Ignore because they intentionally used unquoted text. They will be restored later, and are not a concern.
There were a few grey areas, e.g. when an invalid numeric value is supposed to be converted to string, but instead throws an exception. It could be argued either way - should strictMode allow the conversion or not? No decision has been made yet, no action needed at this time.
There is a regression in single quote handling within double-quote delimited strings that must be fixed in strictMode.
There are also several errors in JSONTokenerTest that suggest a regression.

You can fix these issues on your current branch, or you can take a new fork off of the strict-mode-unit-tests branch.

rikkarth commented 6 months ago

I will review all this points and will come back to you with new feedback.

At first glance, this should be relatively easy to achieve/fix.

Thank you for this analysis.

rikkarth commented 6 months ago

@stleary

Update: I fixed all the test cases in JSONTokener apart from the last which I require more feedback:

The Bug

The issue was, I didn't perform the most simple of validations which checks for end of file after finding ']' at 2nd, which I included now. After doing it, all tests pass accordingly.

As for this test case

@Test
public void testValid() {
    // ...
    checkValid("1 2", String.class);
}

throws JSONException which is indeed valid, since you manually set strictMode true in JSONParserConfiguration.

The test case value is not inside an array; The test case value is not inside quotes; The test case value is not valid according to strictMode guidelines (or is it?)

What should I do with this case?

org.json.JSONException: Value '1 2' is not surrounded by quotes at 3 [character 4 line 1]

rikkarth commented 6 months ago

This test is failing because the expected testCase is actually malformed, strictMode working accordingly. Not a regression. Test name: toJSONObjectToJSONArray

inline

I'm not sure if you'd like me to keep the tests like they currently are (even if malformed).

In the mean time I will isolate the tests as requested. I will correct and externalize the test cases into JSON files which I believe will be more compliant since modern IDEs check for syntax errors for us in cases like this.

rikkarth commented 6 months ago

Branch strict-mode-unit-tests contains all of your code from this branch. In the JSONParserConfiguration default constructor, I forced strictMode to be true, and then modified the broken unit tests or replaced @test with @ignore.

All of the points you've highlighted were fixed. There are a few cases (e.g checkValid("1 2", String.class);) where I don't exactly know what I should do there, and will require some more feedback on this.

I removed the TODO's, comments, etc that you provided in strict-mode-unit-tests as I fixed each issue.

I left some of the "important" TODO's the one's you wrote "TBD later" or "will revert later" so you could easily navigate back to them later.

There is a regression in single quote handling within double-quote delimited strings that must be fixed in strictMode.

Fixed.

There are also several errors in JSONTokenerTest that suggest a regression.

Fixed.

You can fix these issues on your current branch, or you can take a new fork off of the strict-mode-unit-tests branch.

I merged them into this branch to keep a linear history of what was affected in this PR.

rikkarth commented 6 months ago

I added this TODO's because I strongly believe the test cases here are super bloated and could be hugely simplified.

I will make my case in a future PR, leaving this here for reference.

stleary commented 6 months ago

@rikkarth Nice work. The 1 2 test result signals a regression - in this case, a change in the behavior of a public API, which is always a concern. Might be good to see if the existing behavior can be retained. Will review the code, and add some comments.

rikkarth commented 6 months ago

@rikkarth Nice work. The 1 2 test result signals a regression - in this case, a change in the behavior of a public API, which is always a concern. Might be good to see if the existing behavior can be retained. Will review the code, and add some comments.

The test for this use-case passes in strictMode=false just not in strictMode=true which follows JSON rules.

The tokener is trying to understand if the value is enclosed within an array or object.

And then the tokener in strictMode sees this value is not a Number, null or boolean ... it's a string without quotes.

The question is: Why should this test pass in strictMode? The only answer I could get was - it shouldn't.

The return when strictMode=false.

checkValid("1 2", String.class); //strictMode false

output

1 2

rikkarth commented 6 months ago

Could the label for this PR be updated? It's no longer pending redesign but pending review.

Thank you.

stleary commented 6 months ago

Could the label for this PR be updated? It's no longer pending redesign but pending review.

I think the label still applies. The goal of issue #887 was just to catch trailing chars at the end of the JSON doc. There might be a simpler solution that hasn't been found yet. For example, adding properties to the tokener does not seem like the right direction, or the way that JSONArray() was rewritten.

rikkarth commented 6 months ago

Fair enough.

Did you have time to read my previous comment?

It is the last point I have to complete the points you've defined previously.

I can make this last test pass, I'm just wondering why should this test pass in strict mode?

Do you have an opinion on it?

stleary commented 6 months ago

Did you have time to read my previous comment? It is the last point I have to complete the points you've defined previously. I can make this last test pass, I'm just wondering why should this test pass in strict mode? Do you have an opinion on it?

Do you mean the "1 2" validity test? Do you think the result difference is due to code changes to enforce quote-delimited strings, or to detecting trailing chars? In the former case, no additional changes are needed. In the latter case, it should be handled by the redesign. In either case, you don't need to make it work now.

rikkarth commented 6 months ago

Did you have time to read my previous comment? It is the last point I have to complete the points you've defined previously. I can make this last test pass, I'm just wondering why should this test pass in strict mode? Do you have an opinion on it?

Do you mean the "1 2" validity test? Do you think the result difference is due to code changes to enforce quote-delimited strings, or to detecting trailing chars? In the former case, no additional changes are needed. In the latter case, it should be handled by the redesign. In either case, you don't need to make it work now.

Yes, the error is exactly that. Quote-delimited strings, which is coherent with the design of strictMode.

If you change the test and put quotes around the string, while strictMode is true, then it passes the test.

Please check screenshots bellow.

stleary commented 6 months ago

What problem does this code solve? Complete the implementation of strict mode for JSONArray

Does the code still compile with Java6? Yes

Risks Low

Changes to the API? No

Will this require a new release? No

Should the documentation be updated? No

Does it break the unit tests? No, new unit tests were added

Was any code refactored in this commit? No

Review status APPROVED

Starting 3-day comment window

rikkarth commented 6 months ago

Cleaned up strictMode true flag as instructed.

stleary / JSON-java