stleary / JSON-java

A reference implementation of a JSON package in Java.
http://stleary.github.io/JSON-java/index.html
Other
4.54k stars 2.56k forks source link

884: Strict Mode for JSONObject #903

Closed rikkarth closed 2 weeks ago

rikkarth commented 3 months ago

Issue description https://github.com/stleary/JSON-java/issues/884:

Strict mode now works for JSONArray (see #877), but not for JSONObject or for JSONArrays embedded in JSONObjects. For example, the following code parses without error:

        String str = "{\"a\": [b]}";
        JSONObject jsonObject = new JSONObject(str, new JSONParserConfiguration().withStrictMode(true));

Several additional changes should be included with the PR:

* The indentation in nextString(char, boolean) is incorrect.

* ~There is a compiler warning in JSONTokener that should be fixed:~ (fixed in [add javadoc for strictmode #886](https://github.com/stleary/JSON-java/pull/886))
Warning:  Javadoc Warnings
Warning:  /home/runner/work/JSON-java/JSON-java/src/main/java/org/json/JSONTokener.java:294: warning: no @param for strictMode
Warning:  public String nextString(char quote, boolean strictMode) throws JSONException {
Warning:  ^
Warning: [WARNING] 1 warning

Analysis and Implementation

Strict mode now works for JSONArray (see #877), but not for JSONObject or for JSONArrays embedded in JSONObjects

Fixed.

* The indentation in nextString(char, boolean) is incorrect.
rikkarth commented 3 months ago

Currently in unit-test phase.

If you wish to help me by providing test cases, I would appreciate it.

Ideally, examples where strictMode should fail but doesn't, or vice-versa (shouldn't fail but fails).

I will continue to add different test-cases and submit this PR for review on the 11th of August so it can be reviewed next week.

Currently Testing

Strict Mode: TRUE

Additionally this would be a good time to give a superficial review, just to help me understand if I'm going in the right direction.

Thank you,

cc @stleary

stleary commented 3 months ago

@rikkarth Sure, here is some feedback to help you get started. My concerns were noted in #894, where it is not very visible, so I am copying the relevant section here:

The purpose of strict mode is, I think, actually fairly simple: to enforce 
double quotes around strings and disallow invalid trailing chars at the end 
of the parsed document. Also, during parsing, we have to make sure 
that the JSONParserConfiguration object is passed or default initialized 
wherever it might be needed. If I missed anything else, please remind me.

For string checking, one would think that would be done in JSONTokener.nextString(), but it was not. 
Instead, a new method, parseUnquotedText() is added. This does not seem like the simplest and 
most direct implementation to me. For example, by doing the work in nextString(), 
we would not need JSONTokener.smallCharMemory, or the methods that manipulate it.

For trailing chars, we need to separate the top-level array or object from nested instances. 
Recall, we are only concerned with raw text parsing. Wouldn't the String and Tokener 
constructors be the best place to do this? Then we would not need JSONTokener.arrayLevel.

Regarding the tests, I still think that it's fair to limp along for now with manual strict mode tests, 
but as mentioned above, the missing JSONObject tests must be filled in.

In summary, some missed work in JSONObject, many missing JSONObject unit tests, 
and some concerns that the implementation could be much simpler and more direct.

Let's not worry about unit tests until we have an acceptable working implementation. Here are the requirements for the code:

You may find that after implementing the above, there are still regressions and/or outstanding issues. If so, let's address them one at a time until everything is resolved.

stleary commented 3 months ago

@rikkarth Hope all is going well. We need to make some progress on resolving the remaining issues. If you are stuck, please reach out, post your questions, and I will help. Or if you just need a few more days, let me know.

rikkarth commented 3 months ago

Basically I have a working version with smallCharMemory.

Removing it is the final piece, but I haven't been able to create an alternative solution (yet) for what smallCharMemory solves.

I will take another look at it, and come back with more feedback - today.

rikkarth commented 3 months ago

I would ask someone that can help me to run the current unit tests with this implementation and see if there is a use case not covered.

If all is good, I or someone that can support me on this task can help me remove the smallCharMemory part and adjust the implementation.

Unfortunately I haven't had the time to complete this part as every time I tried to solve this I got stuck on trying to come up with a new implementation.

stleary commented 3 months ago

Can you commit your latest code to this PR? I can take a look.

rikkarth commented 2 months ago

For string checking, one would think that would be done in JSONTokener.nextString(), but it was not. Instead, a new method, parseUnquotedText() is added. This does not seem like the simplest and most direct implementation to me. For example, by doing the work in nextString(), we would not need JSONTokener.smallCharMemory, or the methods that manipulate it.

parseUnquotedText() returns Object because it may return a Number, a Boolean or NULL.

private Object getValidNumberBooleanOrNullFromObject(Object value) {
    if (value instanceof Number || value instanceof Boolean || value.equals(JSONObject.NULL)) {
        return value;
    }

    throw this.syntaxError(String.format("Value '%s' is not surrounded by quotes", value));
}

I moved it inside nextString() even though I think parseUnquotedText() should be left out of nextString()

So now nextString() is responsible for all quotes validation, but it doesn't really return just the "nextString" but instead "nextObject" so maybe the method should be renamed.

This parseUnquotedText() already existed in previous implementations, only wasn't encapsulated under its own method.

Now it's more modular, you can either make it apart of nextString (which should be renamed to nextObject if it ends that way), or you can just do an if else statement and leave it outside of nextString, eitherways I believe the end result is sort of the same - at least for now.

image

rikkarth commented 2 months ago

At the moment this is how I see my TODO list.

I would really appreciate some help with this last part, unless there are other modifications on the previous points you'd like me to do.

Thank you for your patience so far.

stleary commented 2 months ago

@rikkarth Thanks for letting me know. Sounds like you have taken this feature most of the way, I will look into completing the last item.

stleary commented 2 weeks ago

Closed due to reverting incomplete strict mode so that other commits can be released.