Functional-style datamodel & performance

rescribet commented 6 years ago

In using this library in the browser building our new front-end, I've come across some performance problems regarding to the OO-based types which are used in this library. I'm not sure whether to post this here, or in the task force repo, but since our implementation uses rdflib.js directly I'll opt for this issue tracker.

In some scenarios, when the data source is to be trusted, incoming data can't just be loaded into the store directly, but it needs to convert all plain objects to instances of Node, which can mean some significant performance penalties for slower devices (e.g. mobile browsers).

I ran into this problem when using web workers and when restoring a graph from indexedDB. The incoming objects already have memory allocation, but the following conversion into Node objects resulted in a lot of garbage collection.

So I was wondering whether there'd be enthusiasm to approach the inner implementations in a more functional (maybe even immutable) style implementation. If my expectations are correct, it would enable the store to directly reuse incoming objects from response bodies, re-use nodes with the same values, and allow IRI references to be single instance (saving memory). In addition, computational heavy operations (e.g. calculating the hashString) could be handed off to a worker thread (or even the server) and stored in-value (if they are immutable).

From my dives into the code of this library, the changes seem rather straightforward, since the nodes already have a discriminating property which indicates their type (termType). But I'm probably missing some important insights. Changing functions like Node.equals(other: Node) to (Node.equals(one, other)) where the arguments are of the form { termType: x, value: y} would at least allow objects from web workers, indexedDB, and fetch to be reused.

timbl commented 6 years ago

Interesting. If it easy and speeds things up then I thinkj we woul dbe open to it so long as it is well documented

timbl commented 6 years ago

It would mean though that x.uri would no longer work for named nodes which would be a pain. We have the conflict between a clean logical design and a hack to allow parser to save time. A pity the difference can't be compiled out!

rescribet commented 6 years ago

In a follow-up, the work needed for this change were quite large. Applying a singleton-instance pattern for all (BN/NN/Literal) terms in the store gave a large performance boost in terms of both CPU and memory.

It enabled strict equality checking which is very fast, while skipping toN3 which is used in sameTerm, that uses string concatenation (in toN3; '<' + value + '>'), so skipping that also reduced garbage collection. In addition, there is at most one instance of NamedNode/BlankNode/Literal for each term, resulting in a memory reduction.

ericprud commented 5 years ago

Can you share these changes, perhaps in the form of a PR? It would also be great if you could share your benchmarks so we can acquire those skills.

rescribet commented 5 years ago

Yeah sure, I'll be glad to do that. Might be a few weeks though

linkeddata / rdflib.js

Functional-style datamodel & performance #219