Closed: jacoscaz closed this issue 2 years ago
Tagging Oxigraph's creator @Tpt just in case he's interested.
@jacoscaz Thank you for tagging me! Oxigraph JS is currently compiled to WASM, making it hard to use the disk for storage. I am currently considering rewriting it as a native Node.js extension to allow easy and fast disk access.
About performance comparison, here is a paper that compared Oxigraph with other JS SPARQL implementations: https://openreview.net/pdf?id=CXLmXMb2TJ (see section 4.3).
@Tpt thank you for that link! I am very curious as to the extent of the performance gap between Oxigraph and Quadstore in the following realms:
There's no way for Quadstore to match Oxigraph, of course, but the Node.js bindings make the latter a nice reference point for an apples-to-apples comparison (as opposed to comparing an embedded store to an external one), particularly given work ongoing on #115 .
About performance comparison, here is a paper that compared Oxigraph with other JS SPARQL implementations: https://openreview.net/pdf?id=CXLmXMb2TJ (see section 4.3).
Would be interesting to redo this evaluation using Comunica 2.x (since a lot has changed internally regarding join query planning). And perhaps, some kind of easily re-runnable pipeline would be valuable as well (e.g. via https://github.com/rubensworks/jbr.js).
There's no way for Quadstore to match Oxigraph.
I would not be so sure:
particularly given work ongoing on https://github.com/belayeng/quadstore/issues/115 .
To further the comparison, one might even want to convert RDF/JS expressions to SPARQL queries and run them using Oxigraph, so as to have a "Comunica + Oxigraph" system in the benchmark. This might give a rough idea of how much of the possible speed difference between plain Oxigraph and "Comunica + Quadstore" is due to the SPARQL evaluator and how much is due to the storage layer.
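As a rough illustration of what such a bridge could look like, here is a minimal sketch that turns an RDF/JS quad pattern into a SPARQL SELECT query string (the helper names and the simplified term serialization are my own assumptions, not Oxigraph or Comunica APIs):

```javascript
// Hypothetical sketch: serialize an RDF/JS term (or undefined) into
// SPARQL syntax, mapping missing terms to variables.
const termToSparql = (term, varName) => {
  if (!term) return `?${varName}`; // missing term -> variable
  switch (term.termType) {
    case 'NamedNode':
      return `<${term.value}>`;
    case 'Literal':
      return JSON.stringify(term.value); // datatypes and language tags omitted
    case 'Variable':
      return `?${term.value}`;
    default:
      throw new Error(`unsupported term type: ${term.termType}`);
  }
};

// Build a SELECT query equivalent to an RDF/JS match(s, p, o) pattern;
// the resulting string could then be handed to an Oxigraph store.
const patternToSparql = (s, p, o) =>
  `SELECT * WHERE { ${termToSparql(s, 's')} ${termToSparql(p, 'p')} ${termToSparql(o, 'o')} }`;

console.log(patternToSparql(undefined, undefined, undefined));
// SELECT * WHERE { ?s ?p ?o }
```

A real bridge would also need to handle blank nodes, typed literals, and the graph term, but this conveys the shape of the conversion.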
Would be interesting to redo this evaluation using Comunica 2.x (since a lot has changed internally regarding join query planning). And perhaps, some kind of easily re-runnable pipeline would be valuable as well (e.g. via https://github.com/rubensworks/jbr.js).
It would be amazing!
Preliminary performance comparison:
const { strictEqual } = require('assert');
const oxigraph = require('oxigraph');
const { Engine } = require('quadstore-comunica');
const { Quadstore } = require('quadstore');
const { DataFactory } = require('rdf-data-factory');
const { ClassicLevel } = require('classic-level');

const QTY = 1e5;

const dataFactory = new DataFactory();

// Times a (possibly async) function and logs the elapsed milliseconds.
const time = async (fn, name) => {
  const before = Date.now();
  await Promise.resolve(fn());
  const after = Date.now();
  console.log(`${name}: ${after - before} ms`);
};

const main = (fn) => {
  Promise.resolve(fn()).catch((err) => {
    console.error(err);
    process.exit(1);
  });
};

main(async () => {
  const oxistore = new oxigraph.Store();
  const quadstore = new Quadstore({
    dataFactory,
    backend: new ClassicLevel('./.quadstore.leveldb'),
  });
  await quadstore.open();
  await quadstore.clear();
  const engine = new Engine(quadstore);

  await time(async () => {
    for (let i = 0; i < QTY; i += 1) {
      oxistore.add(oxigraph.triple(
        oxigraph.namedNode('http://ex/s'),
        oxigraph.namedNode('http://ex/p'),
        oxigraph.literal(`${i}`),
      ));
    }
  }, 'oxigraph - write');

  await time(async () => {
    let count = 0;
    for (const binding of oxistore.query('SELECT * WHERE { ?s ?p ?o }')) {
      count += 1;
    }
    strictEqual(count, QTY, 'bad count');
  }, 'oxigraph - sequential read');

  await time(async () => {
    for (let i = 0; i < QTY; i += 1) {
      await quadstore.put(dataFactory.quad(
        dataFactory.namedNode('http://ex/s'),
        dataFactory.namedNode('http://ex/p'),
        dataFactory.literal(`${i}`),
      ));
    }
  }, 'quadstore - write');

  await time(async () => {
    let count = 0;
    await engine.queryBindings('SELECT * WHERE { ?s ?p ?o }').then((iterator) => {
      return new Promise((resolve, reject) => {
        iterator
          .on('data', () => { count += 1; })
          .once('error', reject)
          .once('end', resolve);
      });
    });
    strictEqual(count, QTY, 'bad count');
  }, 'quadstore - sequential read');
});
yields:
oxigraph - write: 5332 ms
oxigraph - sequential read: 1981 ms
quadstore - write: 2670 ms
quadstore - sequential read: 585 ms
@Tpt am I reading from oxigraph correctly?
As requested by @jacoscaz - running on a Dell XPS 15 9520 with 16 GB of RAM:
$ node dist/oxigraph.js
oxigraph - write: 14231 ms
oxigraph - sequential read: 3895 ms
quadstore - write: 12584 ms
quadstore - sequential read: 1682 ms
@Tpt am I reading from oxigraph correctly?
Hi! Yes!
The results are not surprising to me. JS <-> WASM conversions are very slow, so Oxigraph compiled to WASM is only competitive when a lot of computation happens inside the WASM code, which is not the case with this benchmark.
Added a couple of tests that should bypass the SPARQL layer in both quadstore and oxigraph (although the latter doesn't seem to support streaming, so we're getting all quads in one invocation of match()).
oxigraph - write: 6640 ms
oxigraph - sequential read: 1966 ms
oxigraph - sequential read w/o SPARQL (no streaming): 147 ms
quadstore - write: 2402 ms
quadstore - sequential read: 537 ms
quadstore - sequential read w/o SPARQL: 116 ms
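For a rough sense of scale, the match()-based read timings above translate into quads per second with some simple arithmetic (QTY = 1e5 quads per run, as in the benchmark script):

```javascript
// Back-of-the-envelope conversion of elapsed milliseconds into quads
// per second, given that each run moves QTY = 1e5 quads.
const QTY = 1e5;
const quadsPerSecond = (elapsedMs) => Math.round(QTY / (elapsedMs / 1000));

console.log(quadsPerSecond(147)); // oxigraph, read w/o SPARQL: 680272
console.log(quadsPerSecond(116)); // quadstore, read w/o SPARQL: 862069
```

Both stores move well over half a million quads per second once the SPARQL layer is bypassed, which suggests most of the cost in the SPARQL runs sits in query evaluation and term conversion rather than in raw storage access.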
@Tpt interesting!
The results are not surprising to me. JS <-> WASM conversions are very slow, so Oxigraph compiled to WASM is only competitive when a lot of computation happens inside the WASM code, which is not the case with this benchmark.
Do you happen to have some rough quad/sec numbers at hand when it comes to doing what follows while using Oxigraph from Rust with the RocksDB backend?
SELECT * WHERE { ?s ?p ?o }
Do you happen to have some rough quad/sec numbers at hand when it comes to doing what follows while using Oxigraph from Rust with the RocksDB backend?
Sure, here it is on my laptop (min, median, max):
oxigraph native - write: [1.0725 s 1.0747 s 1.0771 s]
oxigraph native - sequential read without SPARQL: [74.356 ms 74.703 ms 75.216 ms]
oxigraph native - read with SPARQL: [100.54 ms 101.31 ms 102.19 ms]
Here is the bench source code: https://gist.github.com/Tpt/1805ff8cdca00baa3ddb941c84a21894
Even more interesting! Oxigraph seems to write roughly 2x faster than quadstore, read roughly 1.5x faster, and evaluate SPARQL roughly 5x faster. This is already better than I had hoped for. We should definitely trade notes on our (de)serialization strategies!
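Those rough multipliers can be reproduced from the figures above, pairing quadstore's timings with Oxigraph native's median timings (only indicative, since the two sets of numbers come from different machines):

```javascript
// Speed ratios behind the rough "2x / 1.5x / 5x" estimates, computed
// from quadstore's elapsed times and Oxigraph native's medians (ms).
const ratio = (quadstoreMs, oxigraphMs) => (quadstoreMs / oxigraphMs).toFixed(1);

console.log(ratio(2402, 1074.7)); // write: 2.2
console.log(ratio(116, 74.703)); // read w/o SPARQL: 1.6
console.log(ratio(537, 101.31)); // read with SPARQL: 5.3
```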
Mh... Actually, that is assuming our machines are roughly equivalent. Could you run the JS-side comparison between quadstore and oxigraph on your own machine, just to make sure we get an apples-to-apples comparison? The bench lives at https://github.com/belayeng/quadstore-perf and can be run via:
git clone https://github.com/belayeng/quadstore-perf
cd quadstore-perf
npm install
npm run build
node dist/oxigraph.js
Also, the rocks-level package isn't ready yet, so we're actually comparing quadstore on LevelDB against oxigraph on RocksDB. Last time I checked with the previous generation of level packages, though, write performance was within 10% of each other.
Sure. Here are my results:
oxigraph - write: 11651 ms
oxigraph - sequential read: 2188 ms
oxigraph - sequential read w/o SPARQL (no streaming): 377 ms
quadstore - write: 6655 ms
quadstore - sequential read: 1005 ms
quadstore - sequential read w/o SPARQL: 320 ms
Your machine seems much faster than mine.
There is also the discrepancy that Oxigraph in WASM is fully in memory while quadstore is backed by disk. Native Oxigraph provides both modes. Here are the results on the same machine (writes are nearly twice as slow on disk, but reads are fairly similar; the workload is likely small enough for RocksDB to keep everything in its in-memory cache):
On disk (SSD):
oxigraph native disk write: [1.9115 s 1.9184 s 1.9257 s]
oxigraph native disk - sequential read without SPARQL: [78.330 ms 78.748 ms 79.235 ms]
oxigraph native disk - read with SPARQL: [103.25 ms 103.91 ms 104.67 ms]
In memory:
oxigraph native memory - write: [1.0725 s 1.0747 s 1.0771 s]
oxigraph native memory - sequential read without SPARQL: [74.356 ms 74.703 ms 75.216 ms]
oxigraph native memory - read with SPARQL: [100.54 ms 101.31 ms 102.19 ms]
My wet-finger guesses for the speed difference are:
The comparison I am most interested here, actually, is native Oxigraph on disk (Rust, RocksDB) vs. Quadstore on disk (Node, LevelDB). Quoting from your comments above with slight modifications for clarity, the numbers on your machine should be:
oxigraph native disk write: [1.9115 s 1.9184 s 1.9257 s]
oxigraph native disk - sequential read without SPARQL: [78.330 ms 78.748 ms 79.235 ms]
oxigraph native disk - read with SPARQL: [103.25 ms 103.91 ms 104.67 ms]
and
quadstore - write: 6655 ms
quadstore - sequential read without SPARQL: 320 ms
quadstore - read with SPARQL: 1005 ms
This is extremely helpful, as it gives me a reference point to aim for in terms of what can be achieved with a LevelDB-ish backend and an in-memory SPARQL evaluation pipeline. Due to the higher-level nature of the JS runtime, it would be futile to try to match Oxigraph's native performance. However, in the absence of dramatic performance jumps due to major internal changes, I should at least stay within 3x of Oxigraph native for writes and sequential reads, and within 10x for SPARQL evaluation, narrowing the gap as much as possible.
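Taking the native-on-disk medians above as the baseline, the 3x / 10x targets translate into rough time budgets for the same workload (my own arithmetic, not figures from the thread):

```javascript
// Rough performance budgets implied by the 3x / 10x targets, using
// Oxigraph native disk median timings (ms) as the baseline.
const budget = (oxigraphMs, factor) => Math.round(oxigraphMs * factor);

console.log(budget(1918.4, 3)); // write budget: 5755 ms
console.log(budget(78.748, 3)); // sequential read budget: 236 ms
console.log(budget(103.91, 10)); // SPARQL read budget: 1039 ms
```

By these budgets, quadstore's current sequential read (320 ms) is the closest to target, while writes (6655 ms) and SPARQL evaluation (1005 ms) sit just outside and just inside their respective budgets.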
Closing this as we now have a dedicated test in the quadstore-perf repo and some ideas as to where we stand in comparison to Oxigraph native. Thanks @Tpt !
@Tpt interesting what happens when running dist/oxigraph.js in quadstore-perf with both Node.js and Bun:
Seems like the cost of the Rust -> JS value conversion is significantly lower in Bun.
Thank you! This is an interesting finding.
Oxigraph (https://github.com/oxigraph/oxigraph) is a graph database implementing the SPARQL standard, built in Rust and using RocksDB as its storage backend. It comes with JavaScript bindings that make it usable in Node.js too (https://www.npmjs.com/package/oxigraph), although storage is limited to in-memory. Given the use of a lower-level language with an in-memory backend, I expect it to be significantly faster than quadstore itself.