wanglingsong / JsonSurfer

A streaming JsonPath processor in Java
MIT License

Enhancement: Ability to listen to `startObject`/`endObject` tokens #111

Open itsjustbrian opened 8 months ago

itsjustbrian commented 8 months ago

I'll start with my use case since I may be missing a simpler way to do it.

I'm parsing a newline-delimited JSON file as a non-blocking stream, and each top-level object contains a massive array. A path listener fires for every object in that array and saves it to a db. However, each saved array object also needs some values from its top-level object. To get them, I bind a listener for each top-level key I need and save its value on the context. Once a top-level object has been fully parsed, I take those values from the context and update the previously saved objects in the db.

The problem is detecting when the current top-level object ends so I can clear the context in preparation for the next one. Right now I'm relying on key order: if the listener fires for what I know is the last key, I know I can clear the context. But this is fragile, since I have no control over the key order. I've also tried listening for the path "$", but this appears to load each whole object into memory, which is not viable for the size of objects I'm dealing with.

It would be nice to be able to hook into context callbacks like startObject/endObject (https://github.com/wanglingsong/JsonSurfer/blob/874d47cdccc8de36f87d00df52249189758548f6/jsurfer-core/src/main/java/org/jsfr/json/SurfingContext.java#L287) by binding a new type of listener that fires for these tokens and exposes the current depth (though I suppose the depth could be derived from currentPosition).

Maybe something like:

        // Proposed API: bindTokenListener and TokenListener do not exist in JsonSurfer yet
        surfer.configBuilder()
                .bindTokenListener(new TokenListener() {
                    @Override
                    public void onStartObject(ParsingContext context) {
                        System.out.println(context.depth); // 0...n
                    }

                    @Override
                    public void onEndObject(ParsingContext context) {
                        System.out.println(context.depth); // 0...n
                    }
                })

This would also be applicable to any token handled by SurfingContext.
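
To make the intended semantics concrete, here is a rough, library-independent sketch of the depth bookkeeping such a listener would expose. It is not JsonSurfer code: `TokenListener` (here taking a plain `int depth` instead of a `ParsingContext`), `DepthTracker`, and `events` are hypothetical names made up for illustration. It assumes well-formed UTF-8 input and counts arrays as nesting levels, so an object inside a top-level object's array reports depth 2. Input is fed in arbitrary chunks to mimic a non-blocking stream.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class TokenDepthSketch {

    /** Hypothetical listener shape mirroring the proposal; not part of JsonSurfer. */
    interface TokenListener {
        void onStartObject(int depth);
        void onEndObject(int depth);
    }

    /** Tracks nesting across arbitrarily split chunks of UTF-8 JSON text. */
    static final class DepthTracker {
        private final TokenListener listener;
        private int depth = 0;            // current nesting level (objects and arrays)
        private boolean inString = false; // true while inside a JSON string literal
        private boolean escaped = false;  // true right after a backslash in a string

        DepthTracker(TokenListener listener) {
            this.listener = listener;
        }

        void feed(byte[] chunk, int offset, int length) {
            for (int i = offset; i < offset + length; i++) {
                byte b = chunk[i];
                if (inString) {
                    // Braces and brackets inside strings must not affect depth.
                    if (escaped) {
                        escaped = false;
                    } else if (b == '\\') {
                        escaped = true;
                    } else if (b == '"') {
                        inString = false;
                    }
                } else if (b == '"') {
                    inString = true;
                } else if (b == '{') {
                    listener.onStartObject(depth); // report depth before descending
                    depth++;
                } else if (b == '[') {
                    depth++;
                } else if (b == ']') {
                    depth--;
                } else if (b == '}') {
                    depth--;
                    listener.onEndObject(depth);   // same depth its start reported
                }
            }
        }
    }

    /** Feeds newline-delimited JSON in small chunks and returns the events seen. */
    static List<String> events(String ndjson, int chunkSize) {
        List<String> seen = new ArrayList<>();
        DepthTracker tracker = new DepthTracker(new TokenListener() {
            @Override
            public void onStartObject(int depth) {
                seen.add("start@" + depth);
            }

            @Override
            public void onEndObject(int depth) {
                seen.add("end@" + depth); // end@0 is where per-record state could be cleared
            }
        });
        byte[] bytes = ndjson.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < bytes.length; i += chunkSize) {
            tracker.feed(bytes, i, Math.min(chunkSize, bytes.length - i));
        }
        return seen;
    }

    public static void main(String[] args) {
        String ndjson = "{\"id\":1,\"items\":[{\"v\":\"a}\"}]}\n{\"id\":2,\"items\":[]}\n";
        // prints [start@0, start@2, end@2, end@0, start@0, end@0]
        System.out.println(events(ndjson, 3));
    }
}
```

In this sketch, `end@0` marks the end of each newline-delimited record regardless of key order, which is the point where saved context values could be applied and cleared. A real implementation inside the library would presumably reuse the parser's own tokens rather than rescanning bytes.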

Awesome library by the way!

wanglingsong commented 8 months ago

Maybe you can check getCurrentArrayIndex in the ParsingContext to detect when the parser starts a new array element.

itsjustbrian commented 8 months ago

@wanglingsong That would work if the file I was dealing with were a top-level array, e.g.:

[
  {...},
  {...},
  {...}
]

But I'm dealing with newline-delimited objects, e.g.:

{...}
{...}
{...}

So getCurrentArrayIndex is always -1.