rdfjs / N3.js

Lightning fast, spec-compatible, streaming RDF for JavaScript
http://rdf.js.org/N3.js/

Merging rdf-dataset-indexed back into N3.js? #163

Open vhf opened 5 years ago

vhf commented 5 years ago

Let's discuss if and how we can merge https://github.com/rdf-ext/rdf-dataset-indexed back into N3.js, following @RubenVerborgh's suggestion on Twitter!

I'll explain why this was necessary for us and what the changes are.

Why: N3Store is very powerful thanks to its indices and a bunch of (micro-)optimizations. To the best of my knowledge, N3Store was the best existing code to build a fast dataset implementation on. We needed to work client-side with datasets of thousands of quads, and naive dataset implementations were much too slow. We wanted a drop-in replacement for https://github.com/rdf-ext/rdf-dataset-simple in https://github.com/rdf-ext/rdf-ext that we could later adapt to the upcoming dataset spec.

What we didn't need in this dataset package: N3 parsing and N3 writing.

What N3.js didn't have but we wanted: keeping and retrieving the original quads intact. For performance and memory reasons, N3Store doesn't store quads; it consists only of indices from which all inserted quads can be reconstructed. We were willing to take a small performance and memory hit to get the following:

https://github.com/rdf-ext/rdf-dataset-indexed/blob/57cc9ab09831c9620c3cb4d766a48d44b191b557/test/dataset.test.js#L89-L104
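To make the difference concrete, here is a minimal TypeScript sketch of an index whose leaves keep the quad objects passed to `add()`, so that `match()` hands back the very same instances, extra properties included. The names `IndexedDataset`, `termKey` and `select` are hypothetical, the term shapes are simplified (literal datatypes and languages are ignored), and only a single graph/subject/predicate/object index is kept. This is not N3Store's code, which, as described above, rebuilds quads from its indices instead of storing them.

```typescript
// Simplified RDF/JS-style shapes (not the full RDF/JS interfaces).
interface Term {
  termType: string;
  value: string;
}

interface Quad {
  subject: Term;
  predicate: Term;
  object: Term;
  graph: Term;
}

// String key for Map lookups. Simplification: ignores literal datatype/language.
function termKey(t: Term): string {
  return `${t.termType}:${t.value}`;
}

// Narrow one level of the index: a bound term is a direct lookup,
// an unbound (undefined) term selects every entry at that level.
function select<V>(level: Map<string, V>, t?: Term): V[] {
  if (t === undefined) return Array.from(level.values());
  const hit = level.get(termKey(t));
  return hit === undefined ? [] : [hit];
}

// Hypothetical dataset: one graph -> subject -> predicate -> object index
// whose leaves hold the quad objects that were passed to add().
class IndexedDataset {
  private index = new Map<string, Map<string, Map<string, Map<string, Quad>>>>();

  add(q: Quad): void {
    const g = termKey(q.graph), s = termKey(q.subject);
    const p = termKey(q.predicate), o = termKey(q.object);
    let bySubject = this.index.get(g);
    if (!bySubject) this.index.set(g, (bySubject = new Map()));
    let byPredicate = bySubject.get(s);
    if (!byPredicate) bySubject.set(s, (byPredicate = new Map()));
    let byObject = byPredicate.get(p);
    if (!byObject) byPredicate.set(p, (byObject = new Map()));
    // Set semantics: keep the first instance if an equal quad already exists.
    if (!byObject.has(o)) byObject.set(o, q);
  }

  // Undefined arguments act as wildcards.
  match(s?: Term, p?: Term, o?: Term, g?: Term): Quad[] {
    const results: Quad[] = [];
    for (const bySubject of select(this.index, g))
      for (const byPredicate of select(bySubject, s))
        for (const byObject of select(byPredicate, p))
          results.push(...select(byObject, o));
    return results;
  }
}

// Usage: a quad carrying an extra (hypothetical) `origin` property.
const named = (value: string): Term => ({ termType: "NamedNode", value });
const defaultGraph: Term = { termType: "DefaultGraph", value: "" };
const q = { subject: named("ex:s"), predicate: named("ex:p"), object: named("ex:o"), graph: defaultGraph, origin: { line: 3 } };

const ds = new IndexedDataset();
ds.add(q);
const found = ds.match(named("ex:s"));
console.log(found[0] === q); // true: the very object that was added
```

The cost is the one mentioned above: every leaf holds a reference to a quad object, which uses more memory than reconstructing quads from the index keys, but object identity and any extra properties survive the round trip. A real dataset would also keep several index orders so that patterns with an unbound subject do not have to scan.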

Another requirement: to be able to use a dataset client-side in stores based on the Flux pattern, where stored objects/values should only be modified through actions, we removed the cached dataset size. This prevents a getter from changing the store content:

https://github.com/rdfjs/N3.js/blob/de9fd5e9eb2c4c3cfb39a66c03bdd714d8a7d077/lib/N3Store.js#L40-L54
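The point about the cached size can be illustrated with a simplified sketch. The `CachingStore` and `PureStore` classes below are hypothetical and are not N3Store's actual code: a getter that lazily caches the size writes to the object during a read, while one that recomputes the size leaves the object untouched, which is what a Flux-style store expects.

```typescript
// Hypothetical two-level index: subject -> set of "predicate object" keys.
type Index = Map<string, Set<string>>;

function addToIndex(index: Index, s: string, po: string): void {
  let set = index.get(s);
  if (!set) index.set(s, (set = new Set<string>()));
  set.add(po);
}

// Walk the index and count entries at the deepest level.
function countIndex(index: Index): number {
  let n = 0;
  index.forEach((set) => {
    n += set.size;
  });
  return n;
}

// Lazily cached size: a *read* of `size` can write `cachedSize`.
class CachingStore {
  cachedSize: number | null = 0;
  readonly index: Index = new Map();

  add(s: string, po: string): void {
    addToIndex(this.index, s, po);
    this.cachedSize = null; // invalidate on every write
  }

  get size(): number {
    if (this.cachedSize === null) this.cachedSize = countIndex(this.index); // side effect in a getter
    return this.cachedSize;
  }
}

// Size recomputed on every read: reading never modifies the object.
class PureStore {
  readonly index: Index = new Map();

  add(s: string, po: string): void {
    addToIndex(this.index, s, po);
  }

  get size(): number {
    return countIndex(this.index);
  }
}

const caching = new CachingStore();
caching.add("ex:s", "ex:p ex:o1");
caching.add("ex:s", "ex:p ex:o2");
const before = caching.cachedSize; // null: invalidated by add()
const n = caching.size;            // 2
const after = caching.cachedSize;  // 2: the read wrote to the object

const pure = new PureStore();
pure.add("ex:s", "ex:p ex:o1");
pure.add("ex:s", "ex:p ex:o2");
Object.freeze(pure); // a frozen store can still be read
console.log(before, n, after, pure.size); // null 2 2 2
```

Recomputing costs a walk over the index on every read; that is the price of keeping reads free of side effects.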

With these changes (and some cosmetic ones, such as different linting and a different testing tool), implementing the rdf-ext features was as simple as this file: https://github.com/rdf-ext/rdf-dataset-indexed/blob/master/dataset.js

bergos commented 5 years ago

Background on why to keep Quads and Terms: the idea would be to attach additional properties about the origin of a Quad or Term. This could be the line number and offset for formats like Turtle or N-Triples, or the DOM attribute/node for RDFa. Right now this isn't supported by parsers, but it's on the wish list as an optional feature. It would be very useful for debugging, and especially help beginners better understand the data flow.
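As a rough illustration of the idea, here is a toy TypeScript parser that attaches the source line number to every quad it emits. It is hypothetical throughout: it only accepts N-Triples lines made of three IRIs (no literals, blank nodes or escapes), and `origin` is an invented property name, not part of any spec. A dataset that keeps the original quad objects would preserve this information; one that reconstructs quads from indices would drop it.

```typescript
// Simplified term/quad shapes; `origin` is a hypothetical extra property.
interface Term {
  termType: string;
  value: string;
}

interface Origin {
  line: number; // 1-based line number in the source text
}

interface Quad {
  subject: Term;
  predicate: Term;
  object: Term;
  graph: Term;
  origin?: Origin;
}

// Toy pattern: only "<iri> <iri> <iri> ." lines.
const IRI_TRIPLE = /^\s*<([^>]*)>\s*<([^>]*)>\s*<([^>]*)>\s*\.\s*$/;

const named = (value: string): Term => ({ termType: "NamedNode", value });
const defaultGraph: Term = { termType: "DefaultGraph", value: "" };

// Hypothetical parser that records, for each quad, the line it came from.
function parseWithOrigin(text: string): Quad[] {
  const quads: Quad[] = [];
  text.split("\n").forEach((lineText, i) => {
    const trimmed = lineText.trim();
    if (trimmed === "" || trimmed.charAt(0) === "#") return; // skip blanks and comments
    const m = IRI_TRIPLE.exec(lineText);
    if (!m) throw new Error(`line ${i + 1}: unsupported syntax`);
    quads.push({
      subject: named(m[1]),
      predicate: named(m[2]),
      object: named(m[3]),
      graph: defaultGraph,
      origin: { line: i + 1 },
    });
  });
  return quads;
}

const doc = [
  "# comment",
  "<http://ex.org/a> <http://ex.org/p> <http://ex.org/b> .",
  "",
  "<http://ex.org/b> <http://ex.org/p> <http://ex.org/c> .",
].join("\n");

const quads = parseWithOrigin(doc);
quads.forEach((q) => console.log(q.origin?.line, q.subject.value));
// 2 http://ex.org/a
// 4 http://ex.org/b
```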

For the upcoming Dataset Spec I have already started a basic implementation. The idea of that code is to have a very simple Dataset implementation to test the ideas of the spec in actual code. The focus is simple, understandable, readable code, not performance. The Data Model and Store/Dataset code of N3.js would be a good candidate for a high-performance implementation of a combined Data Model and Dataset factory. Creating a separate package would be useful for those who just want that part and not the complete N3.js parser and serializer package. Also, it can be confusing, or even hard to find, if one is searching for a Data Model or Dataset and ends up in a package named N3.js.

rubensworks commented 5 years ago

Creating a separate package would be useful for those who just want that part and not the complete N3.js parser and serializer package.

Perhaps it would make sense to split N3.js up into separate packages for the parsers, serializers and store?

jimsmart commented 5 years ago

Excuse me for asking; I know this isn't really the right place (though arguably it might be the best place), but how can I get involved with, or join a mailing list for, information and discussion surrounding these new RDF API specs?

I'm interested partly because I contributed to N3.js's store, but also because I am working on an RDF library for Go, which I intend to open source sometime in the near(ish) future.

ktk commented 5 years ago

@jimsmart This is done in the RDFJS Community Group. See the GitHub org; everything is linked there.