Compare to Oxigraph - Githubissues

jacoscaz commented 2 years ago

Oxigraph (https://github.com/oxigraph/oxigraph) is a graph database implementing the SPARQL standard built in Rust and using RocksDB as its storage backend. It comes with JavaScript bindings, which make it usable in Node.js too (https://www.npmjs.com/package/oxigraph) although storage is limited to in-memory. Given the use of a lower-level language with an in-memory backend, I expect it to be significantly faster than quadstore itself.

jacoscaz commented 2 years ago

Tagging Oxigraph's creator @Tpt just in case he's interested.

Tpt commented 2 years ago

@jacoscaz Thank you for tagging me! Oxigraph JS is currently compiled to WASM making hard to use the disk for storage. I am currently considering rewriting it as a native NodeJS extension to allow easy and fast disk access.

About performance comparison here is a paper who compared Oxigraph with other JS SPARQL implementations: https://openreview.net/pdf?id=CXLmXMb2TJ (see section 4.3).

jacoscaz commented 2 years ago

@Tpt thank you for that link! I am very curious as to the extent of the performance gap between Oxigraph and Quadstore in the following realms:

pure quad ingestion speed (the RDF/JS Sink#import method method)
SPARQL query evaluation

There's no way for Quadstore to match Oxigraph, of course, but the Node.js bindings make the latter a nice reference point for an apples-to-apples comparison (as opposed to comparing an embedded store to an external one), particularly given work ongoing on #115 .

rubensworks commented 2 years ago

About performance comparison here is a paper who compared Oxigraph with other JS SPARQL implementations: https://openreview.net/pdf?id=CXLmXMb2TJ (see section 4.3).

Would be interesting to redo this evaluation using Comunica 2.x (since a lot has changed internally regarding join query planning). And perhaps, some kind of easily re-runnable pipeline would be valuable as well (e.g. via https://github.com/rubensworks/jbr.js).

Tpt commented 2 years ago

There's no way for Quadstore to match Oxigraph.

I would not be so sure:

on the ingestion side Oxigraph needs to convert JS/V8 values to Rust values and it has a significant overhead (c.f. the paper).
on the query side, there is no "efficient" range-based query support yet in Oxigraph so Quadstore might have an edge there.

particularly given work ongoing on https://github.com/belayeng/quadstore/issues/115 .

To further the comparison, one might even want to convert RDF/JS expression to SPARQL queries and run them using Oxigraph to have a "Communica + Oxigraph" system in the benchmark. This might allow to get a rough idea of how much of the possible speed difference between plain Oxigraph and "Communica + Quadstore" is linked to the SPARQL evaluator part and how much is linked to the storage part.

Would be interesting to redo this evaluation using Comunica 2.x (since a lot has changed internally regarding join query planning). And perhaps, some kind of easily re-runnable pipeline would be valuable as well (e.g. via https://github.com/rubensworks/jbr.js).

It would be amazing!

jacoscaz commented 2 years ago

Preliminary performance comparison:

const { strictEqual } = require('assert');
const oxigraph = require('oxigraph');
const { Engine } = require('quadstore-comunica');
const { Quadstore } = require('quadstore');
const { DataFactory } = require('rdf-data-factory');
const { ClassicLevel } = require('classic-level');

const QTY = 1e5;

const dataFactory = new DataFactory();

const time = async (fn, name) => {
  const before = Date.now();
  await Promise.resolve(fn());
  const after = Date.now();
  console.log(`${name}: ${after - before} ms`);
};

const main = (fn) => {
  Promise.resolve(fn()).catch((err) => {
    console.error(err);
    process.exit(1);
  });
};

main(async () => {

  const oxistore = new oxigraph.Store();

  const quadstore = new Quadstore({
    dataFactory,
    backend: new ClassicLevel('./.quadstore.leveldb'),
  });

  await quadstore.open();
  await quadstore.clear();

  const engine = new Engine(quadstore);

  await time(async () => {
    for (let i = 0; i < QTY; i += 1) {
      oxistore.add(oxigraph.triple(
        oxigraph.namedNode('http://ex/s'), 
        oxigraph.namedNode('http://ex/p'),
        oxigraph.literal(`${i}`),
      ));
    }
  }, 'oxigraph - write');

  await time(async () => {
    let count = 0;
    for (const binding of oxistore.query('SELECT * WHERE { ?s ?p ?o }')) {
        count += 1;
    }
    strictEqual(count, QTY, 'bad count');
  }, 'oxigraph - sequential read');

  await time(async () => {
    for (let i = 0; i < QTY; i += 1) {
      await quadstore.put(dataFactory.quad(
        dataFactory.namedNode('http://ex/s'), 
        dataFactory.namedNode('http://ex/p'),
        dataFactory.literal(`${i}`),
      ));
    }
  }, 'quadstore - write');

  await time(async () => {
    let count = 0;
    await engine.queryBindings('SELECT * WHERE { ?s ?p ?o }').then((iterator) => {
      return new Promise((resolve, reject) => {
        iterator.on('data', (binding) => {
          count += 1;
        }).once('end', resolve);
      });
    });
    strictEqual(count, QTY, 'bad count');
  }, 'quadstore - sequential read');

});

yields:

oxigraph - write: 5332 ms
oxigraph - sequential read: 1981 ms
quadstore - write: 2670 ms
quadstore - sequential read: 585 ms

@Tpt am I reading from oxigraph correctly?

jeswr commented 2 years ago

As requested from @jacoscaz - running on a dell XPS 15 9520 with 16GB of ram

$ node dist/oxigraph.js
oxigraph - write: 14231 ms
oxigraph - sequential read: 3895 ms
quadstore - write: 12584 ms
quadstore - sequential read: 1682 ms

Tpt commented 2 years ago

@Tpt am I reading from oxigraph correctly?

Hi! Yes!

The results are not surprising to me. JS <-> Oxigraph WASM conversions are very slow so Oxigraph compiled to WASM is only competitive if a lot of computations happen inside of the WASM code. It is not the case with this benchmark.

jacoscaz commented 2 years ago

Added a couple of tests that should bypass the SPARQL layer in both quadstore and oxigraph (although the latter doesn't seem to support streaming, so we're getting all quads in one invocation of match()).

oxigraph - write: 6640 ms
oxigraph - sequential read: 1966 ms
oxigraph - sequential read w/o SPARQL (no streaming): 147 ms
quadstore - write: 2402 ms
quadstore - sequential read: 537 ms
quadstore - sequential read w/o SPARQL: 116 ms

jacoscaz commented 2 years ago

@Tpt interesting!

The results are not surprising to me. JS <-> Oxigraph WASM conversions are very slow so Oxigraph compiled to WASM is only competitive if a lot of computations happen inside of the WASM code. It is not the case with this benchmark.

Do you happen to have some rough quad/sec numbers at hand when it comes to doing what follows while using Oxigraph from Rust with the RocksDB backend?

sequential reads from one index, without SPARQL
sequential reads for the query SELECT * FROM { ?s ?p ?o }

Tpt commented 2 years ago

Do you happen to have some rough quad/sec numbers at hand when it comes to doing what follows while using Oxigraph from Rust with the RocksDB backend?

Sure, here it is on my laptop (min, median, max):

oxigraph native - write: [1.0725 s 1.0747 s 1.0771 s]                     
oxigraph native - sequential read without SPARQL: [74.356 ms 74.703 ms 75.216 ms]                                   
oxigraph native - read with SPARQL: [100.54 ms 101.31 ms 102.19 ms]

Here is the bench source code: https://gist.github.com/Tpt/1805ff8cdca00baa3ddb941c84a21894

jacoscaz commented 2 years ago

Even more interesting! Oxigraph seems to write roughly 2x faster than quadstore, read 1.5x faster and evaluate SPARQL 5x faster. This is a lot better than I had hoped for already. We should def. trade notes on our (de)serialization strategies!

Mh... Actually, that is assuming our machines are roughly equivalent. Could you run the JS-side comparison between quadstore and oxigraph on your own machine, just to make sure we get an apples-to-apples comparison? The bench lives at https://github.com/belayeng/quadstore-perf and can be run via:

git clone https://github.com/belayeng/quadstore-perf
cd quadstore-perf
npm install
npm run build
node dist/oxigraph.js

Also, the rocks-level package isn't ready, yet, so we're actually comparing quadstore on LevelDB against oxigraph on RocksDB. Last time I checked with the previous generation of level packages, though, write performance was within 10% of one another.

Tpt commented 2 years ago

Sure. Here are my results:

oxigraph - write: 11651 ms
oxigraph - sequential read: 2188 ms
oxigraph - sequential read w/o SPARQL (no streaming): 377 ms
quadstore - write: 6655 ms
quadstore - sequential read: 1005 ms
quadstore - sequential read w/o SPARQL: 320 ms

Your machine seems much faster than mine.

There is also the discrepency that Oxigraph in WASM is fully in memory while quadstore is backed on Disk. Native Oxigraph provides the two modes. Here are the results on the same machine (writes are nearly twice slower on disk but reads are fairly similar. The workload is likely small enough to make RocksDB keep everything inside of the in-memory cache):

On disk (SSD):

oxigraph native disk write: [1.9115 s 1.9184 s 1.9257 s]
oxigraph native disk - sequential read without SPARQL: [78.330 ms 78.748 ms 79.235 ms]
oxigraph native disk - read with SPARQL: [103.25 ms 103.91 ms 104.67 ms]

In memory:

oxigraph native memory - write: [1.0725 s 1.0747 s 1.0771 s]                     
oxigraph native memory - sequential read without SPARQL: [74.356 ms 74.703 ms 75.216 ms]                                   
oxigraph native memory - read with SPARQL: [100.54 ms 101.31 ms 102.19 ms]

My wet finger guess of the speed difference are:

Rust is a bit faster than JS or WASM on V8 so it gives Oxigraph an edge especially on compute-heavy operations like SPARQL parsing even.
JS <-> Rust on WASM conversions are slow (UTF-8 <-> UTF-16 conversions...).

jacoscaz commented 2 years ago

The comparison I am most interested here, actually, is native Oxigraph on disk (Rust, RocksDB) vs. Quadstore on disk (Node, LevelDB). Quoting from your comments above with slight modifications for clarity, the numbers on your machine should be:

oxigraph native disk write: [1.9115 s 1.9184 s 1.9257 s]
oxigraph native memory - sequential read without SPARQL: [78.330 ms 78.748 ms 79.235 ms]
oxigraph native memory - read with SPARQL: [103.25 ms 103.91 ms 104.67 ms]

and

quadstore - write: 6655 ms
quadstore - sequential read without SPARQL: 320 ms
quadstore - read with SPARQL: 1005 ms

This is extremely helpful as it gives me a reference point to aim for when it comes to what can be achieved with a LevelDB-ish backend and an in-memory SPARQL evaluation pipeline. Due to the higher-level nature of the JS runtime it would be futile to try and match oxigraph's native performance but, in the absence of dramatic performance jumps due to major internal changes, I should at least keep within 3x of Oxigraph native for writes and seq. reads and 10x for SPARQL eval., lowering the gap as much as possible.

jacoscaz commented 2 years ago

Closing this as we now have a dedicated test in the quadstore-perf repo and some ideas as to where we stand in comparison to Oxigraph native. Thanks @Tpt !

jacoscaz commented 8 months ago

@Tpt interesting what happens when running the dist/oxigraph.js in quadstore-perf with both node.js and bun:

Screenshot 2024-02-14 at 10 26 51

Seems like the cost of the rust -> js value conversion is significantly lower in bun.

Tpt commented 8 months ago

Thank you! This is an interesting finding.

quadstorejs / quadstore

Compare to Oxigraph #147