jacoscaz / quadstore

A LevelDB-backed graph database for JS runtimes (Node.js, Deno, browsers, ...) supporting SPARQL queries and the RDF/JS interface.
https://github.com/jacoscaz/quadstore
MIT License
197 stars 14 forks source link

base64Binary literals may not deserialise correctly with classic-level #152

Closed gsvarovsky closed 1 year ago

gsvarovsky commented 2 years ago

e.g.:

dataFactory.literal('MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDFi56F11nqql9rT2OBNrYaEfBXEALnXVPjxjcwQYQjixt4Cakq37+vEzBdL8s3nMEFsPu6L59cqq6Obt3Oh1qnVZL+hY/mC9lZ33rXQsfCd7eQYm+aHKj/A8wqLK5BTtX6FRFDXaH+YdEwI8KS7eOFuwZjQ+LPphK8e4ZpfmVsZQIDAQAB', dataFactory.namedNode('http://www.w3.org/2001/XMLSchema#base64Binary'))

does not deserialise correctly when retrieved in classic-level.

Test case (patch to existing quadstore.prototype.get.js test):

Index: test/quadstore.prototype.get.js
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/test/quadstore.prototype.get.js b/test/quadstore.prototype.get.js
--- a/test/quadstore.prototype.get.js   (revision 78fd616f7b55d70e8c89733fd7e6c223d5b1cc66)
+++ b/test/quadstore.prototype.get.js   (date 1663081452226)
@@ -54,6 +54,12 @@
           dataFactory.literal('Hello, World', 'en-us'),
           dataFactory.namedNode('ex://c4'),
         ),
+        dataFactory.quad(
+          dataFactory.namedNode('ex://s5'),
+          dataFactory.namedNode('ex://p5'),
+          dataFactory.literal('MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDFi56F11nqql9rT2OBNrYaEfBXEALnXVPjxjcwQYQjixt4Cakq37+vEzBdL8s3nMEFsPu6L59cqq6Obt3Oh1qnVZL+hY/mC9lZ33rXQsfCd7eQYm+aHKj/A8wqLK5BTtX6FRFDXaH+YdEwI8KS7eOFuwZjQ+LPphK8e4ZpfmVsZQIDAQAB', dataFactory.namedNode('http://www.w3.org/2001/XMLSchema#base64Binary')),
+          dataFactory.namedNode('ex://c5'),
+        ),
       ];
       await this.store.multiPut(this.quads);
     });
@@ -287,6 +293,14 @@
       should(quads).have.length(0);
     });

+    it('should retrieve base64 object', async function () {
+      const { dataFactory, store } = this;
+      const { items: quads } = await store.get({
+        subject: dataFactory.namedNode('ex://s5'),
+      });
+      should(quads).be.equalToQuadArray([this.quads[7]]);
+    });
+
   });

   describe('Quadstore.prototype.get() w/ order', () => {
gsvarovsky commented 1 year ago

I have a suite of "compliance" tests, like system tests, in which one sub-suite uses asymmetric cryptography to sign and verify messages between clones. That involves storing public keys in the data. All those tests fail because the public keys are corrupted when retrieved. That's what led me to write the quadstore test included in the ticket. No other tests fail – they whizz through!

In more raw terms, I write this:

Literal {
  termType: 'Literal',
  value: 'MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDFi56F11nqql9rT2OBNrYaEfBXEALnXVPjxjcwQYQjixt4Cakq37+vEzBdL8s3nMEFsPu6L59cqq6Obt3Oh1qnVZL+hY/mC9lZ33rXQsfCd7eQYm+aHKj/A8wqLK5BTtX6FRFDXaH+YdEwI8KS7eOFuwZjQ+LPphK8e4ZpfmVsZQIDAQAB',
  language: '',
  datatype: NamedNode {
    termType: 'NamedNode',
    value: 'http://www.w3.org/2001/XMLSchema#base64Binary'
  }
}

but get back this:

Literal {
  termType: 'Literal',
  value: 'd7eQYm+aHKj/A8wqLK5BTtX6FRFDXaH+YdEwI8KS7eOFuwZjQ+LPphK8e4ZpfmVsZQIDAQAB\x00\x00',
  language: '',
  datatype: NamedNode {
    termType: 'NamedNode',
    value: 'http://www.w3.org/2001/XMLSchema#base64Binary\x00\x00MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDFi56F11nqql9rT2OBNrYaEfBXEALnXVPjxjcwQYQjixt4Cakq37+vEzBdL8s3nMEFsPu6L59cqq6Obt3Oh1qnVZL+hY/mC9lZ33rXQs'
  }
}
gsvarovsky commented 1 year ago

I'd be more than happy to look into this myself, if you like!

jacoscaz commented 1 year ago

Hello! Apologies for the latency, got married this weekend :). I'm still looking into this regression, for which we need specific tests. Before my break I did manage to discover that this seems to be related to any term that serializes to a string longer than a specific amount of chars. The type of the term itself appears to make no difference.

jacoscaz commented 1 year ago

Here's two value arrays: the first one related to a breaking test case in which I'm serializing a literal having length of 128 chars, the second one related to a passing test case in which I'm serializing the same literal minus one character.

Uint16Array(17) [0, 15, 0, 15, 3, 49135, 189,  0,  15, 0, 0, 0, 0, 0, 0, 0, 0]
Uint16Array(16) [0, 15, 0, 15, 3,   127,   0, 15,   0, 0, 0, 0, 0, 0, 0, 0]

The array even changes in length (!!!). Weird.

jacoscaz commented 1 year ago

This doesn't happen with the memory-level backend. Also, tests reproducing the serialization steps for a single term seem to pass with flying colors:

it('Should serialize and deserialize a single generic literal term having length of 2048', async function () {
      const { store: { dataFactory: factory }, prefixes } = this;
      const term = factory.literal(''.padStart(2048, 'abab'));
      const value = new Uint16Array(4);
      const serialized = termWriter.write(value, 0, term, prefixes);
      const value_8 = viewUint16ArrayAsUint8Array(value, 0, value.length);
      // I/O to/from backend here
      const value_16 = viewUint8ArrayAsUint16Array(value_8);
      const deserialized = termReader.read(serialized, 0, value_16, 0, factory, prefixes);
      should(deserialized.equals(term)).be.true();
    });
jacoscaz commented 1 year ago

This also doesn't happen when reproducing the steps for (de)serializing a single quad:

it('Should serialize and deserialize a quad having terms longer than 127 chars', async function () {
      const { store: { dataFactory: factory, separator }, prefixes } = this;
      const quad = factory.quad(
        factory.namedNode('http://ex.com/s'),
        factory.namedNode('http://ex.com/p'),
        factory.literal(''.padStart(128, 'abab')),
        factory.namedNode('http://ex.com/g'),
      );
      const value = new Uint16Array(16);
      const prefix = `SPOG${separator}`;
      const termNames = ['subject', 'predicate', 'object', 'graph'];
      const serialized = quadWriter.write(prefix, value, 0, quad, termNames, prefixes);
      const value_8 = viewUint16ArrayAsUint8Array(value, 0, value.length);
      const value_16 = viewUint8ArrayAsUint16Array(value_8);
      const deserialized = quadReader.read(serialized, prefix.length, value_16, 0, termNames, factory, prefixes);
      console.log(deserialized);
      should(deserialized.equals(quad)).be.true();
    });
jacoscaz commented 1 year ago

@gsvarovsky could you please test again with version 11.0.3-beta.1 which I've just published to NPM?

gsvarovsky commented 1 year ago

m-ld-js Unit tests all pass m-ld-js Compliance tests all pass (and in record time)

awesome work. 🫡

jacoscaz commented 1 year ago

All right! Will add some more tests and publish a patch version. Thank you for flagging this, it was a major regression.

jacoscaz commented 1 year ago

Version 11.0.3 is now on NPM!