Closed mitar closed 4 weeks ago
Have you looked at the Tokenizer? It requires a bit more work to deal with the callbacks but it does allow processing without loading the whole JSON into memory.
You mean this Tokenize? But I could not find any existing TokenHandler implementation; in particular, the jp package does not seem to provide a TokenHandler that could be used in a streaming manner. I would like to offer users of my tool the option to write a query/path of what to extract in standard JSONPath, but it seems I would then have to implement the conversion from JSONPath to a TokenHandler myself to be able to do this in a streaming manner?
The TokenHandler is an interface meant to be implemented by the caller. An example is in the tokenizer_test.go file.
The request as you described it wants to look at effectively arbitrary elements of a JSON document. OjG provides a means to look at all the elements of a JSON document using the Tokenizer. By creating their own TokenHandler the caller can decide which elements to keep and which to discard. A path can be built using the Key method of the TokenHandler.
BTW, I do like your idea. If you were not planning on making the handler I might implement it myself. Happy to let you make the offering though.
Yeah, I think I get the idea of how this could work, but sadly I do not currently have time to implement this myself. So feel free to go for it. If I am able to circle back to this, I will notify you here.
It rained this morning so I implemented the Match functions along with a jp.MatchHandler. There are tests in the oj, sen, and jp directories if you want to see how they are used. Please give them a try and let me know what you think.
All in the "path-handler" branch.
I've also updated the oj command (cmd/oj/oj) to handle the path handler with the "dig" option. I'll release tomorrow unless you have some feedback before then.
v1.24.0 released with the additional feature.
I never responded here, but this really looks awesome! Thanks!
I am building a tool which would extract data from a potentially large JSON file. If the data is ndjson, then it is easy to read it line by line and extract data from each separate object. But if the data is in a large JSON array, or even worse, a large JSON array nested under one field of a JSON object (an example of such a file is brandedDownload.json), then it seems I have to first load the whole file into memory before I can extract the data using the JSONPath support provided by this package. It would be nice if I could lazily construct the path and then iterate over the nested array, loading into memory just the amount needed to advance to the next array element.