Closed Tpt closed 1 year ago
The biggest problem with this one is the fact that there's no license on the gist, which means that we can't just copy-paste it and package it ourselves. But if you think this one is promising, we could ask the author if he agrees with us reusing it under a certain license.
Make stream-json compatible with web browsers.
This also sounds like a good solution if that SAX variant of jsonparse is impossible. We could discuss with the author if he's open to a PR to make it web-compatible.
The biggest problem with this one is the fact that there's no license on the gist, which means that we can't just copy-paste it and package it ourselves. But if you think this one is promising, we could ask the author if he agrees with us reusing it under a certain license.
That's a good point. An even easier approach: I use as base jsonparse
and I simplify it to avoid the memory problem. It should not be too hard.
If it's fine for you, I move in this direction.
I use as base jsonparse and I simplify it to avoid the memory problem. It should not be too hard.
Not sure that will be very easy to do. But if you think that's possible, this would indeed be the best way forward.
If it's fine for you, I move in this direction.
👍
I have started the fork here: https://github.com/Tpt/json-event-parser.js
I have modernized the codebase by converting it to TypeScript. Next step: avoid keeping all data into memory.
I have now a working version without in-memory buffering. I ran again the JSON performance benchmark (code here):
Library | Execution time | Memory |
---|---|---|
stream-json |
4.770s | 89MB |
jsonparse |
3.381s | 410MB |
SAX variant of jsonparse |
1.973s | 100MB |
clarinet |
2.425s | 94MB |
json-event-parser (new) |
2.152s | 94MB |
So json-event-parser
is slightly slower than the fastest SAX variant of jsonparse
(0.2s slower) and uses slightly more memory than the least memory hungry stream-json
(5MB).
@rubensworks How do you want us to proceed? I might first do a WIP PR on jsonld-streaming-parser
using json-event-parser
instead of jsonparse
.
@Tpt Going forward with the new json-event-parser
lib sounds perfect to me! Great that you've managed to reduce exec time and memory usage there.
Is it already capable of true streaming (which jsonparse
could not)? E.g., if you attach a data listener for a large file, and remove it again after 100 data events, the file is not processed anymore?
Would you agree with moving the json-event-parser
repo (not urgent) under the comunica namespace? Just so that we're certain that we can still maintain it in case you would not be available anymore in the future for whatever reason. You can still keep full push access and be listed as the author of course.
Is it already capable of true streaming (which
jsonparse
could not)? E.g., if you attach a data listener for a large file, and remove it again after 100 data events, the file is not processed anymore?
Yes, with the restriction that the current json-event-parser
implementation can not be stopped after any event. It will keep reading the chunks that were already given to it up to their end. But it's not a big deal if the written chunks are small.
I am considering making the JsonEventParser
implement the Transform
stream interface to make its API cleaner and avoid this limitation. However, I am not sure how to wrap a Transform
inside of an other Transform
so I am not sure if it would be possible to keep the JsonLdParser
API unchanged.
Would you agree with moving the json-event-parser repo (not urgent) under the comunica namespace?
With pleasure. It would be much better this way.
It will keep reading the chunks that were already given to it up to their end. But it's not a big deal if the written chunks are small.
That sounds ok.
However, I am not sure how to wrap a Transform inside of an other Transform so I am not sure if it would be possible to keep the JsonLdParser API unchanged.
This would be a major version bump anyways, so it's ok if we need to do other breaking changes.
However, I am not sure how to wrap a Transform inside of an other Transform so I am not sure if it would be possible to keep the JsonLdParser API unchanged. This would be a major version bump anyways, so it's ok if we need to do other breaking changes.
This would lead API user to either use JsonLdParser.import(myStream)
or to have to write myStream.pipe(new JsonEventParser()).pipe(new JsonLdParser())
. I am not sure it is very convenient. I am looking for a clean way to hide JsonEventParser
inside of JsonLdParser
while keeping backpressure but it does not sound so easy. If you know how to do it, it would be much welcomed.
Edit: I have found stream.compose
which does exactly what I want.
We went for another temp workaround in 448c8fe47fcd8d7f2f9124edbf2631913eb9b460.
But the new json-event-parser
will still be needed for #76.
This is a subtask of #76
The
jsonparse
library keeps the full JSON content into memory to fill the"value"
attribute. This leads to big memery consumption as reported in #65.Some alternative are described in rubensworks/test-performance-json-parse-stream.
stream-json
but with less adoption and documentation seems to have the most adoption and the best documentation. However compilation with WebPack 5 fails even if it claims compatibility with web browers.Some other ones:
jsonparse
.Possible way forward.
If we want to squeeze as much performance as possible, the SAX variant of jsonparse seems the way to go but it means maintaining yet an other library. The two other options requires either forking or managing to get a change merged and deployed and that seems not trivial.