mtth / avsc

Avro for JavaScript :zap:

`BlobDecoder` does not seem to respect logical types #489

Open joscha opened 5 days ago

joscha commented 5 days ago

Given this test:

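    const assert = require("assert");
    const avro = require("avsc");
    const { Type, streams, types: { LogicalType } } = avro;

    class SanitizedEnumLogicalType extends LogicalType {
      static NAME = "sanitized-enum";

      // Stored (underlying) value -> application value.
      _fromValue(val) {
        return val.replaceAll("_", "-");
      }
      // Application value -> stored (underlying) value.
      _toValue(val) {
        return val.replaceAll("-", "_");
      }
      _resolve(type) {
        if (
          Type.isType(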
            type,
            "string",
            `logical:${SanitizedEnumLogicalType.NAME}`,
          )
        ) {
          return this._fromValue;
        }
      }
    }

    test.only('resolve', (cb) => {
      const schema = {
        type: "string",
        logicalType: SanitizedEnumLogicalType.NAME,
      };

      const opts = {
        logicalTypes: {
          [SanitizedEnumLogicalType.NAME]: SanitizedEnumLogicalType,
        }
      }

      const t = Type.forSchema(schema, opts);

      const string = "a-b-c";

      const b = t.toBuffer(string);
      const b_ = t.fromBuffer(b);
      assert.equal(b_, string); // this passes

      const strings = [];
      let encoder = new streams.BlockEncoder(t);
      let decoder = new streams.BlockDecoder(opts)
        .on('data', (s) => { strings.push(s); })
        .on('end', () => {
          assert.deepEqual(strings, [string]); // this fails
          cb();
        });
      encoder.pipe(decoder);
      encoder.end(string);
    });

It shows that the first assertion passes (i.e. b_ is "a-b-c"). You can test the case where the decoder doesn't know anything about the logical types via:

    const b__ = Type.forSchema(schema).fromBuffer(b);
    console.log(b__);

In this case b__ is "a_b_c", as that is what is in buffer b.
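As a minimal sketch of the round trip described above, the two mappings can be run on their own, without avsc. The helper names `toValue` and `fromValue` are illustrative stand-ins for the `_toValue` and `_fromValue` methods in the test, not library functions.

```javascript
// Illustration only: these mirror _toValue/_fromValue from the test above.
const toValue = (val) => val.replaceAll("-", "_");   // applied when encoding
const fromValue = (val) => val.replaceAll("_", "-"); // applied when decoding

// What ends up in the buffer once the logical type has been applied.
const stored = toValue("a-b-c");

// A reader that knows the logical type undoes the mapping.
const withLogicalType = fromValue(stored);

// A reader that only sees a plain "string" type returns the stored value as-is.
const withoutLogicalType = stored;

console.log({ stored, withLogicalType, withoutLogicalType });
```

So any reader that builds its type without the logical type options will hand back the underscore form.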

For the rest of the test, the BlockEncoder is given the type built with opts and the BlockDecoder is given the same opts with the logical types, but the BlockDecoder still does not apply the logical type, i.e. this test fails with:

    Uncaught AssertionError [ERR_ASSERTION]: Expected values to be loosely deep-equal:

    [
      'a_b_c'
    ]

    should loosely deep-equal

    [
      'a-b-c'
    ]
          + expected - actual

           [
          -  "a_b_c"
          +  "a-b-c"
           ]

because the encoded format applied the logical type transformation but the decoder does not apply the inverse. Am I holding it wrong?

references https://github.com/mtth/avsc/issues/471

mtth commented 4 days ago

logicalTypes is not a valid BlockDecoder constructor option. Can you try with parseHook?

    let decoder = new avro.streams.BlockDecoder({
      parseHook: (schema) => avro.Type.forSchema(schema, opts),
    });

joscha commented 4 days ago

I am actually using the file encoder/decoder methods; the block decoder was just easier to show a reproduction with. What's the reason this option is not available? Is there a reason not to add and respect it?

joscha commented 4 days ago

    parseHook: (schema) => avro.Type.forSchema(schema, opts),

This works, thank you. Still unexpected though.
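For readers wondering why the hook is needed, here is a toy model of the behavior discussed in this thread. It assumes the decoder builds its type from the schema that travels with the data and consults only `parseHook` when doing so. `makeType` and `ToyDecoder` are hypothetical names written for this sketch; this is not avsc's implementation.

```javascript
// Toy model only: makeType and ToyDecoder are hypothetical, not avsc APIs.

// Builds a "type" from a schema, applying a logical type only if opts has one.
function makeType(schema, opts = {}) {
  const logical = (opts.logicalTypes || {})[schema.logicalType];
  return {
    // Maps a stored value back to the application value.
    fromStored: logical ? logical.fromValue : (val) => val,
  };
}

// Stand-in decoder: the writer's schema arrives with the data, and the only
// option consulted when turning it into a type is `parseHook`.
class ToyDecoder {
  constructor(opts = {}) {
    // Other keys, such as `logicalTypes`, are silently ignored here.
    this.parseHook = opts.parseHook || ((schema) => makeType(schema));
  }

  decode(headerSchema, storedValues) {
    const type = this.parseHook(headerSchema);
    return storedValues.map((val) => type.fromStored(val));
  }
}

const schema = { type: "string", logicalType: "sanitized-enum" };
const opts = {
  logicalTypes: {
    "sanitized-enum": { fromValue: (val) => val.replaceAll("_", "-") },
  },
};

// Type options passed directly to the decoder: the logical type is not applied.
const ignored = new ToyDecoder(opts).decode(schema, ["a_b_c"]);

// A hook that forwards the options when building the type: it is applied.
const hooked = new ToyDecoder({
  parseHook: (s) => makeType(s, opts),
}).decode(schema, ["a_b_c"]);

console.log(ignored, hooked);
```

In this model the unrecognized `logicalTypes` key produces no error, which matches the silent fallback to the underscore form seen in the failing test.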