gs1 / GS1DigitalLinkCompressionPrototype

Experimental prototype for reversible lossless compression of GS1 Digital Link URIs
Apache License 2.0

Compression/Decompression and custom Path #2

Open clementh59 opened 4 years ago

clementh59 commented 4 years ago

Hi Mark,

I tried to compress and then decompress a Digital Link URI that has a custom path, and the round trip does not return the original URI.

Here is what I'm doing:

const input = 'https://example.com/some/other/path/info/01/09780345418913/21/12345';
const compress = Utils.compressWebUri(input);
const uncompress = Utils.decompressWebUri(compress);
expect(input).to.equal(uncompress);

But uncompress equals https://example.com/01/09780345418913/21/12345, so the custom path /some/other/path/info has been dropped.

Here are my compress and decompress functions:

const decompressWebUri = (uri, useShortText = false) =>
  toolkit.decompressGS1DigitalLink(uri, useShortText, getUriStem(uri));

const compressWebUri = (uri, useOptimisations = true, compressOtherKeyValuePairs = true) => {
  const uncompressedPrimary = false;
  const useShortText = false;

  return toolkit.compressGS1DigitalLink(
    uri,
    useShortText, // Not used
    getUriStem(uri),
    uncompressedPrimary, // Not used
    useOptimisations,
    compressOtherKeyValuePairs,
  );
};

Do you know where the issue comes from?

Thanks

mgh128 commented 4 years ago

Hi Clément,

I think the problem is that the compression algorithm is not preserving the URI path info before the part that is specific to GS1 Digital Link, so /some/other/path/info is being lost in the round-trip. You can verify this at https://gs1.github.io/GS1DigitalLinkCompressionPrototype/ where you'll see that

https://example.com/some/other/path/info/01/09780345418913/21/12345

is compressed to

http://example.org/ARHKVAdpQkIKMDk

rather than

http://example.org/some/other/path/info/ARHKVAdpQkIKMDk

The good news is that the decompression algorithm will currently decompress

http://example.org/some/other/path/info/ARHKVAdpQkIKMDk to

http://example.org/gtin/09780345418913/ser/12345

so with some relatively minor adjustment of the decompression algorithm, we can arrange for it to decompress to

https://example.com/some/other/path/info/gtin/09780345418913/ser/12345 or https://example.com/some/other/path/info/01/09780345418913/21/12345

The GS1 Digital Link compression algorithm currently starts with an associative array of GS1 Application Identifiers and their values - it does not currently start with an uncompressed GS1 Digital Link URI.
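To illustrate why the path info has nowhere to go, here is a minimal sketch of reducing an uncompressed URI to an associative array of AIs and values. It is not the toolkit's own code: `extractAiMap` and `PRIMARY_KEYS` are hypothetical names, and the sketch assumes the GS1-specific part begins at the first path segment that is a known primary key (only `01` and `gtin` are listed, for illustration).

```javascript
// Hypothetical helper, not part of the prototype toolkit.
// Assumption: the GS1-specific part of the path starts at the first segment
// that is a known primary key. Only '01' / 'gtin' are listed for illustration.
const PRIMARY_KEYS = new Set(['01', 'gtin']);

function extractAiMap(uri) {
  // Split the path into non-empty segments.
  const segments = new URL(uri).pathname.split('/').filter(Boolean);
  const start = segments.findIndex((s) => PRIMARY_KEYS.has(s));
  if (start === -1) {
    throw new Error('No primary key found in URI path');
  }

  // Read key/value pairs from the GS1-specific part of the path.
  const aiMap = {};
  for (let i = start; i + 1 < segments.length; i += 2) {
    aiMap[segments[i]] = decodeURIComponent(segments[i + 1]);
  }

  // Everything before `start` is custom path info. The AI map has no slot
  // for it, so it has to be carried separately to survive a round trip.
  const customPath = segments.slice(0, start).join('/');
  return { aiMap, customPath };
}
```

For the URI in this issue, the map would hold `01` and `21` with their values, while `some/other/path/info` ends up only in `customPath`. An algorithm that takes just the map as input cannot reproduce it.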

If you feel that this is a major problem, please raise it in the GS1 Digital Link v1.2 work group. Fixing it would take some work, though: the initial compression flowchart would need to start from an uncompressed GS1 Digital Link URI, and the decompression flowcharts would need a corresponding adjustment.

clementh59 commented 4 years ago

Thanks for your answer.

I'll work around this issue in digital-link.js by re-adding the custom path to the result directly after compression and after decompression.
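A workaround along these lines could be sketched as follows. This is not the actual digital-link.js change: `preserveCustomPath` is a hypothetical name, `transform` stands in for whichever compress or decompress call is used, and the sketch assumes that call returns a URI with the custom path removed.

```javascript
// Hypothetical wrapper, not the actual digital-link.js fix.
// `transform` is any function that maps a URI string to a URI string and is
// assumed to drop the custom path (as the toolkit does in this issue).
// `isStart` is a findIndex-style predicate (segment, index, allSegments)
// that marks where the GS1-specific part of the INPUT path begins.
function preserveCustomPath(inputUri, transform, isStart) {
  const input = new URL(inputUri);
  const segments = input.pathname.split('/').filter(Boolean);
  const start = segments.findIndex(isStart);

  // Segments before the GS1-specific part form the custom path.
  const prefix = start > 0 ? segments.slice(0, start) : [];

  // Run the real transformation, then put the custom path back in front.
  const output = new URL(transform(inputUri));
  const rest = output.pathname.split('/').filter(Boolean);
  output.pathname = '/' + [...prefix, ...rest].join('/');
  return output.toString();
}
```

For compression, `isStart` could be `(s) => s === '01' || s === 'gtin'`. For decompression, where the compressed string is the final path component, it could be `(s, i, all) => i === all.length - 1`. If the underlying call ever starts preserving the path itself, this wrapper would duplicate the prefix, so it is only suitable as a stopgap.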

domguinard commented 4 years ago

I think this should indeed be discussed in the group, @mgh128, as it might be something the current description of the algorithm is missing. The issue is not so much that the custom path is not compressed (I do not think compressing it would make sense) but that the compression algorithm's output drops it, which makes the round trip lossy.

mgh128 commented 4 years ago

Yes, we should have a work group discussion on this, possibly even today.

The current starting point for the compression algorithm is an associative array / object / map of AIs and their values, rather than the uncompressed GS1 DL URI, so it looks like we'd need one extra flowchart or a modification to existing flowcharts to ensure that any preceding path info is preserved without compression.

We'd also need to check the grammar for a compressed GS1 DL URI to ensure that it can tolerate path info before the compressed string as the final component of URI path information.
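As a rough illustration of what such a check involves, the sketch below splits a URI into the preceding path info and a final component that could be a compressed string. It is not taken from the GS1 grammar: `splitCompressedUri` is a hypothetical name, and the sketch assumes the compressed string uses only the URL-safe base64 alphabet (A-Z, a-z, 0-9, `-` and `_`).

```javascript
// Hypothetical check, not derived from the official grammar.
// Assumption: a compressed string contains only URL-safe base64 characters.
const COMPRESSED_SEGMENT = /^[A-Za-z0-9_-]+$/;

function splitCompressedUri(uri) {
  const segments = new URL(uri).pathname.split('/').filter(Boolean);
  if (segments.length === 0) {
    return null; // no path components at all
  }

  // Treat the final path component as the candidate compressed string.
  const compressed = segments[segments.length - 1];
  if (!COMPRESSED_SEGMENT.test(compressed)) {
    return null; // contains characters outside the assumed alphabet
  }

  // Whatever precedes it is path info that should be kept uncompressed.
  return {
    precedingPath: segments.slice(0, -1).join('/'),
    compressed,
  };
}
```

This test is necessary but not sufficient: a final component such as `12345` from an uncompressed URI also matches the alphabet, so a real grammar would need further rules to tell the two forms apart.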
