iherman / canonical_rdf

Proof-of-concept implementation of Aidan Hogan's RDF canonicalization algorithm in node.js

Run tests against RDF normalization test suite #2

Open dlongley opened 6 years ago

dlongley commented 6 years ago

Hi @iherman! Thanks for doing this PoC!

Do you think you could run this implementation against some of the tests from the RDF normalization test suite? The expected output will of course differ from that of the existing algorithms, but a number of the tests relabel blank nodes in isomorphic datasets, such that the same output should be generated.

It would be especially interesting to see how it does with the "evil" tests:

https://github.com/json-ld/normalization/blob/gh-pages/tests/test044-in.nq
https://github.com/json-ld/normalization/blob/gh-pages/tests/test045-in.nq
https://github.com/json-ld/normalization/blob/gh-pages/tests/test046-in.nq

Each of these should produce the same output.
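
For concreteness, a minimal sketch of such a check (in Go; the canonicalize function is a hypothetical stand-in for whichever implementation is under test, and the file names are the test-suite files above assumed to be local):

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

// canonicalize stands in for the implementation under test: it takes an
// N-Quads document and returns its canonical serialization.
func canonicalize(nquads []byte) []byte {
	// ... call into the actual implementation here ...
	return nquads
}

func main() {
	files := []string{"test044-in.nq", "test045-in.nq", "test046-in.nq"}
	var want []byte
	for i, f := range files {
		in, err := os.ReadFile(f)
		if err != nil {
			panic(err)
		}
		got := canonicalize(in)
		if i == 0 {
			want = got // the first file sets the expected canonical form
			continue
		}
		if !bytes.Equal(got, want) {
			fmt.Printf("%s: canonical form differs from %s\n", f, files[0])
		}
	}
}
```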

iherman commented 6 years ago

@dlongley this is mostly an ack of your message; I am currently on a trip, i.e., I cannot do anything more serious. I had actually copied your full test suite to my machine already, but I have not yet had the time to systematically run my version on all of them.

That being said: these mean and evil examples seem to break my implementation, that much is sure. I do not know whether the problem is with the algorithm or with my implementation; most probably the latter...

Thanks.

iherman commented 6 years ago

Hey @dlongley: it took some time after my return. Your evil tests have actually revealed two very nasty bugs, one related to a fix I made while I was on my trip (never a good idea!!). But I found them and updated the repository.

The result is: yes, those three pass and produce the same canonical graph. It obviously goes through a longer search, so it takes 4-5 seconds on my machine. Aidan's paper has an optimization step that does not change the mathematics but speeds things up significantly; I did not implement it (so far?).

B.t.w., Aidan also ran these tests on his (Java) implementation, with proper results; his implementation includes the optimization, of course, and the tests take (per his email) about 100ms.

dlongley commented 6 years ago

@iherman,

That's good. We should, at some point, benchmark the various implementations (of this algorithm and the others) against the test suite to see what the landscape looks like. I think the current JavaScript implementation (of URDNA2015) handles the evil tests in ~40ms or less -- but we don't know the hardware characteristics of the various machines involved, so we can't yet make a real comparison or determine what's going on. I'm also aware that some significant optimizations could be made to the JavaScript implementation.

I think it's more important that the implementations do a good job with the common cases, of course. It sounds like we could start producing some useful data here though!

aidhog commented 6 years ago

Hey all, just to mention that if needed, I could contribute some more difficult synthetic test cases (e.g., where the blabel algorithm I implemented fails). I'm not sure how important such cases are in practice or for the definition of a standard, but let me know if such tests would be of interest. They are essentially some of the difficult cases from the graph isomorphism community. (And yes, they are very weird.)

dlongley commented 6 years ago

@aidhog,

Yes, those would be of interest. If you could, please contribute them to the test suite so we can at least evaluate them and see how the implementations react when they are encountered.

aidhog commented 6 years ago

@dlongley, okay! So I have 216 synthetic test cases for varying sizes of the following classes of graph:

How would we proceed? I would be reluctant to just dump them into the current test suite since I guess the current examples are selected with use-cases in mind? Perhaps they should be uploaded separately elsewhere? (Also they are currently .nt rather than .nq.)

aidhog commented 6 years ago

They are not so large, so I can attach them here I guess.

eval-rdf.zip

These were selected and "converted" to RDF from benchmarks for Bliss. Credit to Petteri Kaski for creating the original undirected graphs.

iherman commented 6 years ago

@dlongley

> That's good. We should, at some point, benchmark the various implementations (of this algorithm and the others) against the test suite to see what the landscape looks like. I think the current JavaScript implementation (of URDNA2015) handles the evil tests in ~40ms or less -- but we don't know the hardware characteristics of the various machines involved, so we can't yet make a real comparison or determine what's going on. I'm also aware that some significant optimizations could be made to the JavaScript implementation.

I agree. Just to make it clear: as I said in my readme, I do not consider my implementation to be "production quality": I am sure it would need some profiling (which I have not done) to find the points where it could be sped up. Add to that the fact that I do not consider myself an expert JavaScript programmer, i.e., I am sure I fell into some pitfalls in this respect...

iherman commented 6 years ago

@aidhog @dlongley I am not sure where to put these tests; it would be good to gather them in one place. I think that the test suite of @dlongley is, obviously, much more extensive and complete (I used some of those tests, not all of which are reflected in the test suite of this repository). I could see merging the tests of @aidhog into those of @dlongley or, alternatively, I am happy to set up a separate repository where we can collect them. I am neutral...

davidlehn commented 6 years ago

Comments on the "bliss" files:

aidhog commented 6 years ago

@davidlehn, I will respond in order with some thoughts.

Just to emphasise: I do not think these graphs should necessarily be merged with the existing test cases; I think they could also be kept separate. Like @iherman, I am neutral and happy to help with whatever the group decides.

kortschak commented 3 years ago

I have implemented this in Go, with both the simple approach and the pre-splitting approach using Algorithm 2 in the paper.
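
For illustration, here is a sketch of the splitting step as I read Algorithm 2: group statements into components that share blank nodes, so that each component can be hashed independently. The types and helper names here are hypothetical, not the actual code:

```go
package canon

import "strings"

// A Statement is a parsed N-Quad; terms beginning with "_:" are blank.
type Statement struct{ S, P, O, G string }

func isBlank(term string) bool { return strings.HasPrefix(term, "_:") }

// splitByBlank partitions statements into ground statements and components
// of statements connected through shared blank nodes, using union-find
// over the blank node labels. (Predicates cannot be blank in RDF, so only
// subject, object and graph terms are inspected.)
func splitByBlank(statements []Statement) (ground []Statement, components [][]Statement) {
	parent := make(map[string]string)
	var find func(string) string
	find = func(x string) string {
		if parent[x] == x {
			return x
		}
		parent[x] = find(parent[x])
		return parent[x]
	}
	union := func(a, b string) { parent[find(a)] = find(b) }
	add := func(x string) {
		if _, ok := parent[x]; !ok {
			parent[x] = x
		}
	}
	// Union all blank nodes that co-occur in a statement.
	for _, s := range statements {
		var blanks []string
		for _, t := range []string{s.S, s.O, s.G} {
			if isBlank(t) {
				add(t)
				blanks = append(blanks, t)
			}
		}
		for i := 1; i < len(blanks); i++ {
			union(blanks[0], blanks[i])
		}
	}
	// Group each statement by the root of any blank node it contains.
	group := make(map[string][]Statement)
	for _, s := range statements {
		switch {
		case isBlank(s.S):
			group[find(s.S)] = append(group[find(s.S)], s)
		case isBlank(s.O):
			group[find(s.O)] = append(group[find(s.O)], s)
		case isBlank(s.G):
			group[find(s.G)] = append(group[find(s.G)], s)
		default:
			ground = append(ground, s) // no blanks: needs no relabeling
		}
	}
	for _, c := range group {
		components = append(components, c)
	}
	return ground, components
}
```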

I have adopted the suggestion in the readme here for allowing hashing of quad labels. Thanks to @iherman for the excellent notes in that document, and to @aidhog for help dealing with some issues I had initially.
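
The quad-label extension amounts to folding the graph term into the per-statement hash. A minimal sketch of that idea, with hypothetical helper names (the real code differs):

```go
package canon

import "crypto/sha256"

// hashTerm returns the current colour of a blank node from the colouring
// table, or the hash of a ground term's text.
func hashTerm(term string, color map[string][]byte) []byte {
	if c, ok := color[term]; ok {
		return c
	}
	h := sha256.Sum256([]byte(term))
	return h[:]
}

// hashQuad hashes a statement including its graph label, so that quads
// differing only in graph placement receive different hashes. The fixed
// order of the Write calls encodes each term's position in the tuple.
func hashQuad(s, p, o, g string, color map[string][]byte) []byte {
	h := sha256.New()
	h.Write(hashTerm(s, color))
	h.Write(hashTerm(p, color))
	h.Write(hashTerm(o, color))
	h.Write(hashTerm(g, color)) // the extra component for the graph label
	return h.Sum(nil)
}
```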

I've implemented a test harness that does a number of things using the test cases in https://github.com/json-ld/normalization/blob/gh-pages/tests (these can be licensed BSD 3-clause). It computes the hash of the original and then compares that with the hash of the original with:

  1. reversed statement order
  2. randomly permuted statement order
  3. randomly permuted statement order
  4. blank labels replaced with a hash of the label text
  5. 1+4
  6. 2+4

Then tests to ensure non-collision of different graphs are made, checking that the original does not match a derivative that (both sets of checks are sketched after this list):

  1. has the last statement removed
  2. has the first non-blank term found replaced with a literal of the hash of its text throughout the graph
  3. has the first two lexically ordered blank nodes fused into a single blank node
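
A sketch of both sets of checks, with hypothetical types and a placeholder Hash (the real harness calls the iso-canonical hashing there):

```go
package canon

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"math/rand"
	"strings"
)

// A Statement is a parsed N-Quad.
type Statement struct{ S, P, O, G string }

// Hash stands in for the iso-canonical hash of a dataset.
func Hash(statements []Statement) []byte {
	h := sha256.New()
	for _, s := range statements {
		fmt.Fprintln(h, s) // placeholder only; not order-invariant
	}
	return h.Sum(nil)
}

// reversed returns the statements in reverse order.
func reversed(s []Statement) []Statement {
	r := make([]Statement, len(s))
	for i, v := range s {
		r[len(s)-1-i] = v
	}
	return r
}

// shuffled returns the statements in a random order.
func shuffled(s []Statement) []Statement {
	r := append([]Statement(nil), s...)
	rand.Shuffle(len(r), func(i, j int) { r[i], r[j] = r[j], r[i] })
	return r
}

// relabel replaces a blank node label with a hash of the label text.
func relabel(term string) string {
	if !strings.HasPrefix(term, "_:") {
		return term
	}
	return fmt.Sprintf("_:b%x", sha256.Sum256([]byte(term[2:])))
}

// relabeled applies relabel to every blank term in the dataset.
func relabeled(s []Statement) []Statement {
	r := make([]Statement, len(s))
	for i, v := range s {
		r[i] = Statement{relabel(v.S), v.P, relabel(v.O), relabel(v.G)}
	}
	return r
}

// Invariant checks that the hash is stable under isomorphism-preserving
// derivatives of the dataset.
func Invariant(s []Statement) error {
	want := Hash(s)
	for name, d := range map[string][]Statement{
		"reversed": reversed(s), "shuffled": shuffled(s), "relabeled": relabeled(s),
	} {
		if !bytes.Equal(Hash(d), want) {
			return fmt.Errorf("hash not invariant under %s", name)
		}
	}
	return nil
}

// NonCollision checks that a genuinely different graph (here, one with the
// last statement dropped) hashes differently.
func NonCollision(s []Statement) error {
	if bytes.Equal(Hash(s[:len(s)-1]), Hash(s)) {
		return fmt.Errorf("hash collides with truncated graph")
	}
	return nil
}
```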

The algorithm as written works for all of these with no pre-splitting, but failed for 44, 45 and 46 (the graphs in the OP) when pre-splitting is done. This turns out to be because the two splits that arise have the same hash but lead to different final hashes. Adding a loop over co-equal hashes in the body of the distinguish function fixes this (a sketch follows). 44-46 are good graphs to have in the test suite since they have this characteristic.
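
A sketch of that fix, with hypothetical types and stubbed helpers; the point is the outer loop over every part whose hash ties the lowest, rather than descending into only one of them:

```go
package canon

import "bytes"

// A part groups blank nodes that are still mutually indistinguishable
// after colour refinement, together with their shared hash.
type part struct {
	hash  string
	nodes []string
}

// distinguish recursively breaks ties between blank nodes, keeping the
// lexically lowest resulting canonical form. refine, serialize and
// nontrivialParts stand in for the real implementation.
func distinguish(parts []part, color map[string]string) []byte {
	if len(parts) == 0 {
		return serialize(color) // every blank node is now distinguished
	}
	low := parts[0].hash
	for _, p := range parts[1:] {
		if p.hash < low {
			low = p.hash
		}
	}
	var best []byte
	for _, p := range parts {
		if p.hash != low {
			continue // the fix: consider all parts tied at the lowest hash
		}
		for _, b := range p.nodes {
			next := refine(color, b) // mark b distinct, then re-refine
			cand := distinguish(nontrivialParts(next), next)
			if best == nil || bytes.Compare(cand, best) < 0 {
				best = cand
			}
		}
	}
	return best
}

// Stubs standing in for the real colour refinement, canonical
// serialization, and partition extraction.
func refine(color map[string]string, distinct string) map[string]string {
	next := make(map[string]string, len(color))
	for k, v := range color {
		next[k] = v
	}
	next[distinct] += "!" // placeholder for marking distinct + re-refining
	return next
}

func serialize(color map[string]string) []byte        { return nil }
func nontrivialParts(color map[string]string) []part  { return nil }
```

Exploring every tied part and keeping the lowest result restores determinism where co-equal splits would otherwise lead to different final hashes.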

If there are other graphs that should be included, please suggest them.

I have also added benchmarking for a couple of cases (one plain and one evil). Some non-obvious optimisation for reduction of allocation/GC pressure has been done, and this is where the greatest performance improvements are likely to be made.

name                                             time/op
IsoCanonicalHashes/test019-in.nq/decomp=false-8  26.4µs ± 5%
IsoCanonicalHashes/test019-in.nq/decomp=true-8   29.8µs ± 4%
IsoCanonicalHashes/test044-in.nq/decomp=false-8   429ms ± 2%
IsoCanonicalHashes/test044-in.nq/decomp=true-8    441ms ± 2%

name                                             alloc/op
IsoCanonicalHashes/test019-in.nq/decomp=false-8  16.5kB ± 0%
IsoCanonicalHashes/test019-in.nq/decomp=true-8   19.5kB ± 0%
IsoCanonicalHashes/test044-in.nq/decomp=false-8  82.1MB ± 0%
IsoCanonicalHashes/test044-in.nq/decomp=true-8   81.3MB ± 0%

name                                             allocs/op
IsoCanonicalHashes/test019-in.nq/decomp=false-8     215 ± 0%
IsoCanonicalHashes/test019-in.nq/decomp=true-8      249 ± 0%
IsoCanonicalHashes/test044-in.nq/decomp=false-8   1.51M ± 0%
IsoCanonicalHashes/test044-in.nq/decomp=true-8    1.51M ± 0%

For comparison, here is the same benchmark on freshly minted URDNA2015/URGNA2012 implementations written with the same level of care.

name                                             time/op
URNA/test019-in.nq/URDNA2015-8                   8.52µs ± 6%
URNA/test019-in.nq/URGNA2012-8                   8.06µs ± 8%
URNA/test044-in.nq/URDNA2015-8                   8.13ms ± 5%
URNA/test044-in.nq/URGNA2012-8                   6.46ms ± 2%

name                                             alloc/op
URNA/test019-in.nq/URDNA2015-8                   4.43kB ± 0%
URNA/test019-in.nq/URGNA2012-8                   4.29kB ± 0%
URNA/test044-in.nq/URDNA2015-8                   3.43MB ± 0%
URNA/test044-in.nq/URGNA2012-8                   3.17MB ± 0%

name                                             allocs/op
URNA/test019-in.nq/URDNA2015-8                     80.0 ± 0%
URNA/test019-in.nq/URGNA2012-8                     80.0 ± 0%
URNA/test044-in.nq/URDNA2015-8                    55.3k ± 0%
URNA/test044-in.nq/URGNA2012-8                    55.2k ± 0%

I suspect the difference in performance between the Java and Go implementations is down to the nature of the GC; AFAIU, Java has a generational GC, which makes recursive algorithms like this work better there. Go's GC is not generational.

This work is a PR in the Gonum project, gonum/gonum#1524 (currently failing due to bad linting of generated code, but otherwise good).