rdf-ext-archive / discussions

This repo is for discussions about the rdf-ext project

Parser API #1

Closed bergos closed 8 years ago

bergos commented 8 years ago

The JSON-LD parser module, based on AbstractParser, implements a new API. If we can agree on the new API, I will create modules for all the other parsers currently bundled with RDF-Ext.

Promise process (any toparse, optional ProcessorCallback callback, optional DOMString base, optional TripleFilter filter, optional SuccessCallback done)

.process now returns a Promise object. For asynchronous parsers the previous boolean return value was useless anyway. Both done and the Promise's resolve function are called; done can be null. toparse can be a Stream; the parser detects whether the toparse value is a Stream object. The rest works as defined in the RDF-Interfaces/RDF-Ext spec.
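The dual callback/Promise behavior described above can be sketched as follows. This is a minimal stand-in, not the actual rdf-ext implementation: the parser object, the uppercase "parsing", and the result value are all placeholders.

```javascript
// Sketch of the "call both the callback and resolve the Promise" pattern.
// The real .process parses RDF; here the parse step is stubbed out.
var parser = {
  process: function (toparse, done) {
    return new Promise(function (resolve) {
      var result = String(toparse).toUpperCase() // stand-in for real parsing
      if (typeof done === 'function') {
        done(result) // optional callback, may be null
      }
      resolve(result) // Promise consumers get the same result
    })
  }
}

// both consumption styles work against the same call:
parser.process('spo', function (viaCallback) {
  console.log('callback: ' + viaCallback)
}).then(function (viaPromise) {
  console.log('promise: ' + viaPromise)
})
```

Callers who prefer plain callbacks can ignore the returned Promise, and Promise users can pass null for done.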

Promise parse (any toparse, optional ParserCallback? callback, optional DOMString base, optional TripleFilter filter, optional Graph graph)

Like .process, .parse now also returns a Promise object. Both callback and the Promise's resolve function are called; callback can be null. toparse can be a Stream; the parser detects whether the toparse value is a Stream object. The rest works as defined in the RDF-Interfaces/RDF-Ext spec.

ReadableStream stream (Stream inputStream, optional DOMString base, optional TripleFilter filter)

The .stream method is new. It uses the data, end and error events. N3.js also emits a prefix event, which is currently not implemented here. Do we need it? Everything else works as usual.

elf-pavlik commented 8 years ago

I know that @mcollina prefers streams and callbacks, while @nathan7 has tons of experience with wrapping streams and callbacks in promises. Could you guys imagine helping us review this proposal? :bow:

mcollina commented 8 years ago

@elf-pavlik It's not exactly clear what the API of AbstractParser is. Can you please fill in the READMEs/docs of those, so I can understand better? Moreover, what is the question?

Also, this is wrong: https://github.com/rdf-ext/rdf-parser-abstract/blob/master/index.js#L55-L81, the 'data' event should not be used, and definitely not accumulated in that way. Use concat-stream, or bl.

bergos commented 8 years ago

@mcollina Thanks for your feedback. I have already changed the code to use concat-stream, as you proposed.

The described API will be used in more parser implementations, so we want to be sure it's designed the right way. I tried to offer all the common ways to handle async calls and streaming data. The README now also contains some more details. What do you think about it?

The method implementations in AbstractParser are just fallbacks for parsers that don't support streaming, so please ignore that it's not nice to concat all the data of the stream...

mcollina commented 8 years ago

I would move the base, filter and graph properties to the constructor. They are not likely to change between parses, and they could be changed via properties anyway.
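The constructor-based shape being suggested might look like the following sketch. The class name, option names and the stubbed parse body are illustrative, not actual rdf-ext code:

```javascript
// Sketch: base/filter taken once at construction time instead of being
// passed on every parse call.
class Parser {
  constructor (options) {
    options = options || {}
    this.base = options.base
    // default filter accepts everything
    this.filter = options.filter || function () { return true }
  }

  parse (toparse) {
    // stub: split the input into "triples" and apply the instance filter
    var triples = String(toparse).split(' ').filter(this.filter)
    return Promise.resolve(triples)
  }
}

var parser = new Parser({
  base: 'http://example.org/',
  filter: function (t) { return t !== 'skip' }
})

parser.parse('a skip b').then(function (triples) {
  console.log(triples.join(','))
})
```

The trade-off bergos raises below still applies: if base or filter genuinely changes on every call, per-call arguments stay more convenient than mutating instance properties.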

Regarding streaming, the big issue I see is that this is not a "streaming parser": you need to read the whole document before importing it. Streaming is an important use case you should support for large datasets.

See @RubenVerborgh https://github.com/RubenVerborgh/N3.js for an example of a streaming parser.

bergos commented 8 years ago

I think base will change nearly every time the parser is called. The same graph object will only be used to merge multiple sources into one graph, so I expect that property will also change most of the time. I expect filter will be used only in cases with a lot of triples, but even then the filter could change between calls to the parse methods.

I know it's not a real streaming parser at the moment, but future parser implementations should replace the AbstractParser methods with code that does real streaming.

Is the combination of callbacks and Promises OK from your point of view?

bergos commented 8 years ago

This must be defined in the RDFJS Representation Task Force spec.