Open rescribet opened 5 years ago
I'd also note Nodejs guarantees that callbacks will be called exactly once, but as far as I can tell, there's no equivalent finalization callback in our spec; the only option is to subscribe to every event.
I think some Promise interface would be beneficial for this.
On Jan 28, 2019 09:55, "Thom van Kalkeren" notifications@github.com wrote:
Is there any particular reason the current streams API is based on the event-based nodejs model, rather than the promise-based whatwg spec https://streams.spec.whatwg.org/?
The latter has already been deployed in most browsers, and allows any type https://streams.spec.whatwg.org/#model, so passing Quad instances as chunks would seem to be compliant, allowing full deference to that spec and making this spec a lot simpler and cross-compatible.
The guys over at nodejs also seem to have started on implementing the whatwg spec https://github.com/nodejs/whatwg-stream, so switching might prevent premature obsolescence.
Is there any particular reason the current streams API is based on the event-based nodejs model
It's not; it's based on the subset of it that is fast, e.g., no backpressure handling etc.
rather than the promise-based whatwg spec?
It is still terribly slow for the moment.
Source: had to roll my own iterator library for an RDF query engine that uses Terms internally, because Node.js streams (one magnitude) and promise-based streams (two magnitudes) are slower.
e.g., no backpressure handling etc.
C code without bounds checking would be significantly faster as well, but there are reasons why the features are there, of course. One or two orders of magnitude is a pretty large hit anyway, but I fear the features left out will find their way back into the spec over time (e.g. the calling guarantees @awwright mentioned).
promise-based streams (two magnitudes) are slower.
The streams interface is relatively new and will probably gain performance improvements over time, especially once native support has been written into the engine (rather than building something custom). But it's already fast enough for the guys over at facebook to use it to stream video.
Any promise based solution would have the overhead of instantiating, scheduling, and calling those promises; doing that for every statement separately would severely degrade performance. The current rdfjs spec emits one quad at a time; was the Promise-based implementation you tested also doing that?
I fear the features left out will find their way back into the spec over time
The low-level spec provides everything a higher-level spec needs to build in such features as back-pressure handling and promises.
Any promise based solution would have the overhead of instantiating, scheduling, and calling those promises
Exactly.
doing that for every statement separately would severely degrade performance.
But that's the way you do (async) streams.
Besides the Quad streams our RDFJS packages will produce, they need to be fed by streams from other packages and need to write via streams to other packages. If you look at available, used and maintained packages, Node.js streams are the current standard. They are also available as a separate package. In the latest version they can be used with Symbol.asyncIterator. As async loops are a JS language feature and not just a spec on top of JS, I think we are on the safe side. This was also already discussed in #129.
We also discussed it in one of the calls and we decided to use the subset of Node.js streams. As this topic was already closed some time ago, I will close this issue. If there are new strong arguments for WHATWG streams, we can open the issue anytime.
Transferred to the stream-spec repo of @rdfjs and reopened to clarify a few things.
I tried adding parser-n3 (based on n3.js) to a @solid-related project, and the webpack bundle includes ~70k from the readable-stream repo, which adds quite a lot to recommended performance budgets: https://medium.com/@addyosmani/the-cost-of-javascript-in-2018-7d8950fbb5d4
Source: had to roll my own iterator library for an RDF query engine that uses Terms internally, because Node.js streams (one magnitude) and promise-based streams (two magnitudes) are slower.
@RubenVerborgh do you refer to https://github.com/RubenVerborgh/AsyncIterator? N3.js doesn't seem to use it. I see a lot of packages from @comunica, node-quadstore, and some other modules in https://www.npmjs.com/browse/depended/asynciterator
I see that @mcollina, besides readable-stream, also works on https://github.com/nodejs/whatwg-stream/; maybe he would have some input on defining the RDF/JS Stream interfaces in a way that will align with the future direction of node and whatwg streams.
Nope, N3.js should work out of the box without any stream dependencies, depending on what exactly you use.
I think the best way is to add support for async iterators, when they are available. Both stream implementations offer them, they are fast, and they are more ergonomic and intuitive. They do the right thing with error handling, which is always tricky.
Thanks @mcollina, but it's more nuanced than that:
ES6 AsyncIterator is still at least a magnitude slower. This matters if you commonly have millions of elements.
We only use a subset of the Stream interface; that subset is covered by simpler and more performant libs like https://github.com/RubenVerborgh/AsyncIterator
We are keeping an eye on ES6 AsyncIterator performance (I wish we had live tests in a browser, @joachimvh), but it's not looking good. That is logical, because the protocol does more than the subset of stream we use.
Thanks @mcollina and my bad with splitting conversation between here and twitter: https://twitter.com/elfpavlik/status/1104500178432806912
We had some prior conversation where more than one person expressed :+1: for ES AsyncIterator https://github.com/rdfjs/dataset-spec/issues/42#issuecomment-456155536
^ On the high level. Do this on the low level, and be prepared to handle streams one magnitude slower.
Can we find a way to provide both:

We only use a subset of the Stream interface; that subset is covered by simpler and more performant libs like https://github.com/RubenVerborgh/AsyncIterator

and avoid sending readable-stream to the browser? Not sure if we could use dependency injection, where the application would pass to the parser (or other lib) one of the WHATWG ReadableStream, readable-stream, or asynciterator modules. The last two AFAIK have a common interface, but not the WHATWG ReadableStream.
I currently don't see a way. The different interfaces are key to performance; change them and you change performance.
Agreed. You can easily add async iterator support to your module and achieve better usability. It also does not remove your fast API when there is a need for it.
Sounds like:

- if one has processing speed as priority, one would use the asynciterator module and send extra ~55k to the browser
- if one wants to have read(), which currently this spec requires on Streams, but also has priority to use Symbol.asyncIterator over speed, one would use the readable-stream module and send extra ~70k to the browser
- if one has priority to minimize code sent to the browser and wants to use Symbol.asyncIterator provided by WHATWG ReadableStream, one is out of luck since this spec requires read()?

You can provide an implementation of Symbol.asyncIterator without using any stream library.
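That claim can be illustrated with a sketch (a hypothetical adapter, not from any spec): buffer the 'data'/'end'/'error' events behind a hand-written Symbol.asyncIterator, with no stream library involved:

```javascript
// Hypothetical adapter: expose any 'data'/'end'/'error' event emitter
// (such as an RDF/JS stream) as an async iterable, without a stream library.
function asyncIterate(stream) {
  const buffer = [];
  let done = false, failure = null, wake = null;
  stream.on('data', quad => { buffer.push(quad); if (wake) wake(); });
  stream.on('end', () => { done = true; if (wake) wake(); });
  stream.on('error', err => { failure = err; if (wake) wake(); });
  return {
    [Symbol.asyncIterator]() {
      return {
        async next() {
          for (;;) {
            if (buffer.length) return { value: buffer.shift(), done: false };
            if (failure) throw failure;
            if (done) return { value: undefined, done: true };
            // Wait until the next event arrives, then re-check the buffer
            await new Promise(resolve => { wake = resolve; });
            wake = null;
          }
        },
      };
    },
  };
}
```

A production version would also need to handle backpressure and listener cleanup, which is exactly the extra work the thread is debating.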
I probably need to reformulate the problem. How could we implement those two parsers so that one could use them in a web browser, parse response.body (a WHATWG ReadableStream), and add the quads to the store? Currently both parsing and adding to the store rely on the Sink#import interface, which expects a Stream with a node-style read() method on it. Besides the parsers, the quadstore module also implements Sink#import: https://github.com/beautifulinteractions/node-quadstore#rdfstoreprototypeimport
Would all three examples above, and other similar modules, provide an alternative interface which would expect an ES AsyncIterator and return one? The application could choose which interface to import, and ES modules could allow granular fetching or tree shaking.
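One conceivable bridge (a sketch; the helper name is made up, assuming a standard WHATWG ReadableStream such as response.body) is to pull chunks through getReader() and re-expose them as an async iterable that an AsyncIterator-based parser interface could accept:

```javascript
// Hypothetical bridge: turn a WHATWG ReadableStream (e.g. response.body)
// into an async iterable by pulling from its reader until it is done.
async function* iterateReadable(readable) {
  const reader = readable.getReader();
  try {
    for (;;) {
      const { value, done } = await reader.read();
      if (done) return;
      yield value;
    }
  } finally {
    reader.releaseLock(); // release even if the consumer bails out early
  }
}

// Usage sketch in a browser:
//   for await (const chunk of iterateReadable(response.body)) { /* parse */ }
```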
Still, currently @rdfjs modules seem to directly import node stream, e.g. https://github.com/rubensworks/jsonld-streaming-parser.js/blob/master/lib/JsonLdParser.ts#L5
BTW, it also seems that even if an application wanted to use the fast asynciterator module, the libraries above would bundle in readable-stream anyway, since it gets imported there.
We also have this issue about distributing those libraries as ES modules, rather than depending on CommonJS and bundling: https://github.com/rdfjs/parser-n3/issues/4
That might also have benefits in cases where an application uses dynamic import() to lazy-load functionality.
- if one has processing speed as priority, one would use the asynciterator module and send extra ~55k to the browser

4k gzipped

- if one wants to have read(), which currently this spec requires on Streams, but also has priority to use Symbol.asyncIterator over speed, one would use the readable-stream module and send extra ~70k to the browser

asynciterator also has read().

- if one has priority to minimize code sent to the browser and wants to use Symbol.asyncIterator provided by WHATWG ReadableStream, one is out of luck since this spec requires read()?

yes
Would all three examples above, and other similar modules, provide an alternative interface which would expect an ES AsyncIterator and return one?
We probably want an opt-in spec for ES6 AsyncIterator. Although it wouldn't be much of a spec; it's basically just ES6 AsyncIterator. We might want to add that to stream-spec itself (but might be good to have it orthogonal).