ohler55 / ojg

Optimized JSON for Go
MIT License

The JSON hierarchy is too deep, resulting in no match #159

Closed mmungdong closed 8 months ago

mmungdong commented 9 months ago

First of all, thank you for providing this tool; it has helped me a lot during development.

I have run into some confusing behavior and need your help. The case in point is as follows:

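package main

import (
    "fmt"

    "github.com/ohler55/ojg/jp"
    "github.com/ohler55/ojg/oj"
)

func main() {
    // Errors are ignored to keep the demo short.
    obj, _ := oj.ParseString(`{
        "a": [
            {
                "b": {
                    "c": {
                        "d": [
                            {"e": "e1"},
                            {"e": "e2"},
                            {"e": "e3"},
                            {"e": "e4"}
                        ]
                    }
                }
            }
        ]
    }`)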
    a1, _ := jp.ParseString("$.a[0].b.c.d[?(@.e == 'e1')]") // ok
    a2, _ := jp.ParseString("$.a[0].b.c.d[?(@.e == 'e2')]") // ok
    a3, _ := jp.ParseString("$.a[0].b.c.d[?(@.e == 'e3')]") // ok
    a4, _ := jp.ParseString("$.a[0].b.c.d[?(@.e == 'e4')]") // ok

    fmt.Println(a1.Get(obj))
    fmt.Println(a2.Get(obj))
    fmt.Println(a3.Get(obj))
    fmt.Println(a4.Get(obj))

    // output:
    // [map[e:e1]]
    // [map[e:e2]]
    // [map[e:e3]]
    // [map[e:e4]]

    fmt.Println("---------------")

    b1, _ := jp.ParseString("a[?(@.b.c.d[*].e == 'e1')]") // ok
    b2, _ := jp.ParseString("a[?(@.b.c.d[*].e == 'e2')]") // fail
    b3, _ := jp.ParseString("a[?(@.b.c.d[*].e == 'e3')]") // fail
    b4, _ := jp.ParseString("a[?(@.b.c.d[*].e == 'e4')]") // fail

    fmt.Println(b1.Get(obj))
    fmt.Println(b2.Get(obj))
    fmt.Println(b3.Get(obj))
    fmt.Println(b4.Get(obj))

    // output:
    // [map[b:map[c:map[d:[map[e:e1] map[e:e2] map[e:e3] map[e:e4]]]]]]
    // []
    // []
    // []
}

In the above demo, I think b2, b3 and b4 should return the same result as b1.

THX

ohler55 commented 9 months ago

Hmm, I would have expected the second set to work as well, although I doubt the failure is due to how deeply nested the data is. I'm reworking the JSONPath parser right now (for the next couple of days, after work) and will get to this one right after that.

mmungdong commented 8 months ago

OK, I'm looking forward to seeing how this bug is fixed.

ohler55 commented 8 months ago

I've started on the fix, or rather I have identified the issue. It has nothing to do with nesting depth. The underlying issue is that the filter checks only the first path it selects instead of checking each one. Now I just need to come up with the most elegant fix.
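To illustrate why only the 'e1' query matched, here is a minimal sketch in plain Go. It is not the ojg implementation: `firstOnly` and `anyOf` are hypothetical helpers, and the slice stands in for the values that `@.b.c.d[*].e` selects in the reported document.

```go
package main

import "fmt"

// firstOnly models the reported symptom (an assumption, not ojg code):
// only the first selected value is compared against the target.
func firstOnly(selected []string, want string) bool {
	return len(selected) > 0 && selected[0] == want
}

// anyOf compares every selected value and reports whether at least
// one of them equals the target.
func anyOf(selected []string, want string) bool {
	for _, v := range selected {
		if v == want {
			return true
		}
	}
	return false
}

func main() {
	// Stand-in for the values selected by @.b.c.d[*].e in the issue's data.
	selected := []string{"e1", "e2", "e3", "e4"}
	for _, want := range selected {
		fmt.Printf("%s firstOnly=%t anyOf=%t\n",
			want, firstOnly(selected, want), anyOf(selected, want))
	}
}
```

Under this model `firstOnly` is true only for 'e1', which mirrors the b1 versus b2, b3, b4 results in the report, while `anyOf` is true for all four values.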

ohler55 commented 8 months ago

There is a dilemma to deal with. A filter returns either true or false, yet with the wildcard (*) selector multiple values (e1, e2, e3, e4) are selected. Of the 4, 3 will not match and yield false; one will match and yield true for the filter. So should the filter result be a logical OR of all the selections or a logical AND? The JSONPath spec does not specify which behavior is correct.

Anyway, that is the dilemma. I am inclined to assume a logical OR and document that behavior. Do you have an opinion?

thadguidry commented 8 months ago

I'd expect that functionally it would operate almost in a reduce fashion, as in the map/reduce algorithm. So wouldn't that mean logical AND? Because the notion of reduce is that there is a forEach() and a summation? And summation, semantically at least, being a logical AND? It feels like that to me anyway.

If we break it down closely... d[*].e == 'e1' <-- doesn't that really mean "for each e element in d ... filter and reduce to only those having a value of e1"?

I'm wrong; it actually means... for each d, get all child e elements... that match a value of e1.

ohler55 commented 8 months ago

Map reduce is more a manner of processing, of course, but I think you answered that the summation step is a logical AND. So in your example above the expected result would be an empty set in all 4 cases, since at least one comparison (in fact 3 out of 4) returns false for a match.

mmungdong commented 8 months ago

My understanding is that the filter should return true as long as any one of e1, e2, e3 or e4 matches, rather than returning true only when e1 matches and false for the others. The current behavior feels strange to users: why can e1 be matched while e2, e3 and e4 cannot? It can mislead people into thinking their data does not contain e2, e3 or e4. So I prefer a logical OR in practice.

thadguidry commented 8 months ago

Fully this JSONPath expression a[?(@.b.c.d[*].e == 'e1')]

thus means

for each a array element,

ohler55 commented 8 months ago

I think you are arguing for what I suggested: a logical OR across the results. Let me give an example to maybe clarify.

(@.b.c.d[*].e == 'e1') becomes 4 paths:

(@.b.c.d[0].e == 'e1') => true
(@.b.c.d[1].e == 'e1') => false
(@.b.c.d[2].e == 'e1') => false
(@.b.c.d[3].e == 'e1') => false

Summarizing the results of the filter with a logical OR gives true, so a[0] would be returned. With a logical AND summary the filter would be false and the empty set would be returned.
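The expansion above can be sketched with only the Go standard library. This is an illustrative model rather than how ojg evaluates filters: `elem`, `load`, `expand` and `summarize` are hypothetical names, and treating the AND of an empty selection as false is an assumption made here, not something stated in the thread.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// doc is the document from the issue, compacted onto one line.
const doc = `{"a":[{"b":{"c":{"d":[{"e":"e1"},{"e":"e2"},{"e":"e3"},{"e":"e4"}]}}}]}`

// elem mirrors one element of the "a" array.
type elem struct {
	B struct {
		C struct {
			D []struct {
				E string `json:"e"`
			} `json:"d"`
		} `json:"c"`
	} `json:"b"`
}

// load decodes the document and returns the elements of "a".
func load(src string) ([]elem, error) {
	var root struct {
		A []elem `json:"a"`
	}
	err := json.Unmarshal([]byte(src), &root)
	return root.A, err
}

// expand evaluates @.b.c.d[i].e == want for every index i,
// i.e. one result per concrete path the wildcard stands for.
func expand(el elem, want string) []bool {
	results := make([]bool, 0, len(el.B.C.D))
	for _, d := range el.B.C.D {
		results = append(results, d.E == want)
	}
	return results
}

// summarize folds the per-path results with logical OR and logical AND.
// Assumption: AND over an empty selection is treated as false.
func summarize(results []bool) (or, and bool) {
	and = len(results) > 0
	for _, r := range results {
		or = or || r
		and = and && r
	}
	return or, and
}

func main() {
	els, err := load(doc)
	if err != nil {
		panic(err)
	}
	for _, want := range []string{"e1", "e2", "e5"} {
		results := expand(els[0], want)
		or, and := summarize(results)
		fmt.Println(want, results, "OR:", or, "AND:", and)
	}
}
```

With the OR summary, 'e1' and 'e2' both select a[0] and the absent 'e5' does not. With the AND summary none of them does, because every query has at least one non-matching element.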

ohler55 commented 8 months ago

Please give the "get-lost" branch a try. It adds support for wildcards and other fragments in filter expressions that could match more than one value.

ohler55 commented 8 months ago

Released v1.21.2 with the fix.

mmungdong commented 8 months ago

Ok, I'll close this issue then, thanks again for the tools!