halaxa / json-machine

Efficient, easy-to-use, and fast PHP JSON stream parser
Apache License 2.0

Let JSON pointer iterate through values in nested arrays #47

Closed cerbero90 closed 3 years ago

cerbero90 commented 3 years ago

Hi @halaxa and thanks for this brilliant package :)

Per RFC 6901, the character - is a valid reference token within a JSON array, and applications that use JSON Pointer may specify how that character should be handled. In particular:

If the currently referenced value is a JSON array, the reference token MUST contain either:

  • characters comprised of digits [...], or

  • exactly the single character "-", making the new referenced value the (nonexistent) member after the last array element.

and

Note that the use of the "-" character to index an array will always result in such an error condition because by definition it refers to a nonexistent array element. Thus, applications of JSON Pointer need to specify how that character is to be handled, if it is to be useful.

Any error condition for which a specific action is not defined by the JSON Pointer application results in termination of evaluation.

This PR aims to take advantage of this spec to let the JSON pointer iterate through values in nested arrays.

For example, given the following JSON:

{
    "data": [
        {
            "name": "Team 1",
            "users": [
                {
                    "id": 1
                },
                {
                    "id": 2
                }
            ]
        },
        {
            "name": "Team 2",
            "users": [
                {
                    "id": 3
                }
            ]
        }
    ]
}

the JSON pointer /data/-/users/-/id will iterate through all user IDs.
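The behavior described above can be illustrated with a minimal sketch. It is not JSON Machine's implementation: it is written in Python, works on an already parsed document rather than on a stream, and the names `unescape` and `iterate_pointer` are hypothetical. It only shows the intended semantics, namely that a `-` token applied to an array matches every element instead of raising an error.

```python
import json
from typing import Any, Iterator


def unescape(token: str) -> str:
    # RFC 6901 escaping: "~1" decodes to "/" first, then "~0" decodes to "~".
    return token.replace("~1", "/").replace("~0", "~")


def iterate_pointer(document: Any, pointer: str) -> Iterator[Any]:
    """Yield every value matched by `pointer`; "-" on an array means "any index".

    Hypothetical helper for illustration only, not part of JSON Machine.
    """
    tokens = [unescape(t) for t in pointer.split("/")[1:]] if pointer else []

    def walk(node: Any, depth: int) -> Iterator[Any]:
        if depth == len(tokens):
            yield node
            return
        token = tokens[depth]
        if isinstance(node, list):
            if token == "-":
                # Wildcard: descend into every element of the array.
                for item in node:
                    yield from walk(item, depth + 1)
            elif token.isascii() and token.isdigit() and int(token) < len(node):
                yield from walk(node[int(token)], depth + 1)
        elif isinstance(node, dict) and token in node:
            # On objects, "-" is just an ordinary key.
            yield from walk(node[token], depth + 1)

    yield from walk(document, 0)


document = json.loads("""
{
    "data": [
        {"name": "Team 1", "users": [{"id": 1}, {"id": 2}]},
        {"name": "Team 2", "users": [{"id": 3}]}
    ]
}
""")

print(list(iterate_pointer(document, "/data/-/users/-/id")))  # [1, 2, 3]
```

A streaming parser would make the same decision token by token while reading, without building the whole document in memory.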

This PR also tests the feature and updates the README.

halaxa commented 3 years ago

Thank you very much for being interested in this project and for your work.

Firstly, why do you need this? Are there so many items under the users key that the object containing them does not fit in memory? If so, have you tried this: https://github.com/halaxa/json-machine#that-didnt-help ?

I'm not sure that the - symbol is supposed to be used like this. If I get it right, the spec says it means a next nonexistent element in an array. This usage would change the meaning, wouldn't it? Another thing is there are function calls in a loop. Have you looked into the impact on performance? If something like this should be included, I would probably go the pattern-matching way using built-in regular expressions. What do you think?

cerbero90 commented 3 years ago

Thanks for your reply @halaxa :)

Firstly, why do you need this?

I'm building an open-source Laravel package that uses JSON Machine to load large JSON documents into lazy collections, and I would like to let developers optionally specify the subtree they are interested in.

In Laravel it is common to use dot notation to access arrays and nested arrays, e.g. data.*.users.*.id, so I'm translating dot notation into JSON pointers when calling JSON Machine. However, pointers that iterate through nested arrays are not supported at the moment, so this PR proposes an implementation.
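As a rough illustration of that bridging step, the sketch below converts a dot-notation path into a JSON pointer that uses `-` as the wildcard. It is written in Python and is not code from either package; the function name `dot_to_pointer` is hypothetical, and it assumes that keys contain no literal dots.

```python
def dot_to_pointer(path: str) -> str:
    """Translate dot notation (e.g. "data.*.users.*.id") into a JSON pointer.

    Hypothetical helper for illustration; assumes keys contain no literal ".".
    """
    if path == "":
        return ""  # The empty pointer refers to the whole document.
    tokens = []
    for segment in path.split("."):
        if segment == "*":
            # Dot-notation wildcard becomes the "-" token proposed in this PR.
            tokens.append("-")
        else:
            # RFC 6901 escaping: "~" must be encoded before "/".
            tokens.append(segment.replace("~", "~0").replace("/", "~1"))
    return "/" + "/".join(tokens)


print(dot_to_pointer("data.*.users.*.id"))  # /data/-/users/-/id
```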

I'm not sure that the - symbol is supposed to be used like this. If I get it right, the spec says it means a next nonexistent element in an array.

Correct, and it is considered an error condition by default, unless the application implementing the JSON pointer finds a way to make it useful:

Note that the use of the "-" character to index an array will always result in such an error condition because by definition it refers to a nonexistent array element. Thus, applications of JSON Pointer need to specify how that character is to be handled, if it is to be useful.

If something like this should be included, I would probably go the pattern-matching way using built-in regular expressions. What do you think?

Totally agree with you, regular expressions would perform better :)
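For reference, the pattern-matching idea could look roughly like the sketch below. It is a Python illustration under stated assumptions, not what the PR ended up doing: the pointer is compiled once into a regular expression in which every `-` token matches an array index, so checking the parser's current path is a single match call. The name `pointer_to_regex` is hypothetical, and this simplified version cannot distinguish a `-` wildcard from an object key that is literally "-".

```python
import re


def pointer_to_regex(pointer: str) -> re.Pattern:
    """Compile a JSON pointer into a regex; "-" matches any array index.

    Hypothetical helper for illustration only.
    """
    parts = []
    for token in pointer.split("/")[1:]:
        if token == "-":
            # Array indices per RFC 6901: "0" or digits without leading zeros.
            parts.append(r"(?:0|[1-9][0-9]*)")
        else:
            parts.append(re.escape(token))
    return re.compile("".join("/" + part for part in parts))


pattern = pointer_to_regex("/data/-/users/-/id")

print(bool(pattern.fullmatch("/data/0/users/1/id")))  # True
print(bool(pattern.fullmatch("/data/0/name")))        # False
```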

cerbero90 commented 3 years ago

Had a second look at the code and realized that the implementation is actually much simpler than expected. We can leverage the existing variables without adding complexity or function calls within loops.

The PR is updated, let me know what you think.

halaxa commented 3 years ago

I think the simplicity of this solution is brilliant :)

Just let me think about that one more time to decide if this is the way to go.

halaxa commented 3 years ago

Can you please check the English in the README edit?

cerbero90 commented 3 years ago

Sure, README updated with minor changes 👍