rubensworks / rdf-isomorphic.js

Determines if two RDF graphs are isomorphic
MIT License
8 stars 2 forks source link

Performance: Isomorphism check is slow on 30_000 triples #38

Open jeswr opened 1 year ago

jeswr commented 1 year ago

To check that 30_000 triples are isomorphic to itself takes 12min even though parsing the file only takes 30ms (found by running https://github.com/jeswr/rdf-iso-test/blob/master/test.ts).

I appreciate that the algorithm for checking isomorphism is probably in the realm of squared to exponential complexity, but I think that there are optimizations that could be made here to reduce the complexity class such as:

  1. Loading the graphs into a store so that we can do index lookups rather than iterating through all of the quads in places like hashTerm.

  2. All the string and term comparisons could then also be replaced by equality checks between ids.

  3. The signature can then be in terms of ids of the form s_p_o_g and the sha1hex can probably then be removed.

cc @josd - this is what was causing what seemed like a halt in the eye-js test suite.

rubensworks commented 1 year ago

Indeed, the computational complexity for isomorphism checking is pretty high. Some optimizations may be possible indeed!

RDF canonicalization may perform better for very large graphs. (not fully sure what the computational complexity of it is)