Open rescribet opened 6 years ago
Interesting. If it easy and speeds things up then I thinkj we woul dbe open to it so long as it is well documented
It would mean though that x.uri would no longer work for named nodes which would be a pain. We have the conflict between a clean logical design and a hack to allow parser to save time. A pity the difference can't be compiled out!
In a follow-up, the work needed for this change were quite large. Applying a singleton-instance pattern for all (BN/NN/Literal) terms in the store gave a large performance boost in terms of both CPU and memory.
It enabled strict equality checking which is very fast, while skipping toN3
which is used in sameTerm
, that uses string concatenation (in toN3; '<' + value + '>'
), so skipping that also reduced garbage collection.
In addition, there is at most one instance of NamedNode/BlankNode/Literal for each term, resulting in a memory reduction.
Can you share these changes, perhaps in the form of a PR? It would also be great if you could share your benchmarks so we can acquire those skills.
Yeah sure, I'll be glad to do that. Might be a few weeks though
In using this library in the browser building our new front-end, I've come across some performance problems regarding to the OO-based types which are used in this library. I'm not sure whether to post this here, or in the task force repo, but since our implementation uses rdflib.js directly I'll opt for this issue tracker.
In some scenarios, when the data source is to be trusted, incoming data can't just be loaded into the store directly, but it needs to convert all plain objects to instances of
Node
, which can mean some significant performance penalties for slower devices (e.g. mobile browsers).I ran into this problem when using web workers and when restoring a graph from indexedDB. The incoming objects already have memory allocation, but the following conversion into Node objects resulted in a lot of garbage collection.
So I was wondering whether there'd be enthusiasm to approach the inner implementations in a more functional (maybe even immutable) style implementation. If my expectations are correct, it would enable the store to directly reuse incoming objects from response bodies, re-use nodes with the same values, and allow IRI references to be single instance (saving memory). In addition, computational heavy operations (e.g. calculating the hashString) could be handed off to a worker thread (or even the server) and stored in-value (if they are immutable).
From my dives into the code of this library, the changes seem rather straightforward, since the nodes already have a discriminating property which indicates their type (
termType
). But I'm probably missing some important insights. Changing functions likeNode.equals(other: Node)
to(Node.equals(one, other))
where the arguments are of the form{ termType: x, value: y}
would at least allow objects from web workers, indexedDB, and fetch to be reused.