Open rubensworks opened 3 years ago
I just had a quick look at it. I see two places where there might be a significant problems with backpressure:
JsonLdParser.import
does not care about backpressure at all. A quick way to add backpressure there would be to use the pipe
method on the input stream when available.If no context is provided by the caller and not in streaming mode, the JSON-LD parser buffers all data before starting parsing only when the JSON is fully read. It will push all data when the last piece of JSON is read. I am not sure how much ignoring back pressure is bad knowing we already buffered the complete JSON into memory.
Indeed, unless the streaming profile is enabled, the full JSON-LD document is stored in memory before processing can start.
This is because without the streaming profile (which mandates that certain JSON keys come in specific orders), the following situation could occur:
{
... very long JSON-LD document
"@context": "..."
}
Since the context in the example above comes at the end, the meaning of all preceding entries could be different. So that is why buffering must take place.
With the streaming profile on the other hand, the @context
MUST occur first (or after @type
).
Just pinging @wouterbeek and @LaurensRietveld on this as well, as this will have an impact on the bounty's resolution. I.e., if the streaming profile is not explicitly enabled, full buffering will take place, and memory issues can still arise.
Hi @rubensworks and @Tpt , having to explicitly enable the streaming profile in order to properly (without buffering fully) stream through the jsonld is fine from our end
There may be some places where we are not adhering to Node's backpressure conventions with regards to streams.
Concretely, we seem to be creating new streams (such as
Readable
,PassThrough
andTransform
), andpush
-ing into them (via a'data'
handler on another stream). Node handles backpressuring via the return value ofpush
, which we are ignoring in this manner.A better solution would be to simply
pipe
instead of callingpush
on each data element.Related to https://github.com/rubensworks/rdf-parse.js/commit/269c757935c54b388e1bde076dc29c2afc2e8e7b
Related to #66