mtth / avsc

Avro for JavaScript :zap:
MIT License

Support schema evolution without the need of previous schemas #429

Closed: romainbrancourt closed this issue 1 year ago.

romainbrancourt commented 1 year ago

Hi! I think a feature is missing around schema evolution and deserialization. Take, for example, these two schemas:

```js
// A schema's first version.
const v1 = avro.Type.forSchema({
  name: 'Person',
  type: 'record',
  fields: [
    { name: 'name', type: 'string' },
    { name: 'age', type: 'int' },
  ],
});
// The updated version.
const v2 = avro.Type.forSchema({
  type: 'record',
  name: 'Person',
  fields: [
    { name: 'name', type: 'string' },
    { name: 'age', type: 'int' },
    { name: 'phone', type: ['null', 'string'], default: null },
  ],
});
```

The updated version is backward compatible, so shouldn't the following code work?

```js
const avro = require('avsc');

// A schema's first version.
const v1 = avro.Type.forSchema({
  name: 'Person',
  type: 'record',
  fields: [
    { name: 'name', type: 'string' },
    { name: 'age', type: 'int' },
  ],
});

// The updated version: adds an optional `phone` field with a default.
const v2 = avro.Type.forSchema({
  type: 'record',
  name: 'Person',
  fields: [
    { name: 'name', type: 'string' },
    { name: 'age', type: 'int' },
    { name: 'phone', type: ['null', 'string'], default: null },
  ],
});

// Encode one message with each version.
const message1 = v1.toBuffer({ name: 'John', age: 32 });
const message2 = v2.toBuffer({ name: 'John', age: 32, phone: 'phone' });

// Decode each message with the other version's type, without a resolver.
console.log(v1.fromBuffer(message2));
console.log(v2.fromBuffer(message1));
```

I know the library has resolvers for this purpose, but first, they require the previous schema to be provided. Doesn't the Avro documentation imply that you don't need it?

> Avro implementation details: Take a look at [ResolvingDecoder](https://github.com/apache/avro/blob/release-1.7.7/lang/java/avro/src/main/java/org/apache/avro/io/ResolvingDecoder.java) in the Apache Avro project to understand how, for data that was encoded with an older schema, Avro decodes that data with a newer, backward-compatible schema.
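To make the mechanics behind that ResolvingDecoder reference concrete, here is a dependency-free toy sketch. It is not avsc and not its API: `writeInt`, `readInt`, `encode`, `decode`, `toBuffer`, `fromBuffer` and `resolve` are hypothetical standalone functions, and it assumes shared fields keep the same type (no type promotion). It writes records the way Avro's binary encoding does, with fields back to back and no names or tags, so a strict decode with the reader schema alone either runs off the end of `message1` or leaves bytes over in `message2`. `resolve` instead reads the bytes with the writer schema and then applies the spec's record rules: match fields by name, drop writer-only fields, and fill reader-only fields from their defaults.

```javascript
// Toy, dependency-free Avro binary codec (NOT avsc): supports only 'null',
// 'int' (32-bit), 'string', unions of those, and records.

// Avro ints are zig-zag encoded, then written as base-128 varints.
function writeInt(n, out) {
  let z = ((n << 1) ^ (n >> 31)) >>> 0;
  for (; z > 0x7f; z >>>= 7) out.push((z & 0x7f) | 0x80);
  out.push(z);
}

function readInt(tap) {
  let result = 0;
  for (let shift = 0, b = 0x80; b & 0x80; shift += 7) {
    if (tap.pos >= tap.buf.length) throw new Error('truncated buffer');
    b = tap.buf[tap.pos++];
    result |= (b & 0x7f) << shift;
  }
  return (result >>> 1) ^ -(result & 1);
}

function encode(schema, value, out) {
  if (Array.isArray(schema)) {
    // Union: pick the branch by JS type (toy heuristic), write its index, then the value.
    const i = schema.indexOf(value === null ? 'null' : typeof value === 'number' ? 'int' : 'string');
    if (i < 0) throw new Error('no matching union branch');
    writeInt(i, out);
    encode(schema[i], value, out);
  } else if (schema === 'int') {
    writeInt(value, out);
  } else if (schema === 'string') {
    const bytes = Buffer.from(value, 'utf8');
    writeInt(bytes.length, out);
    out.push(...bytes);
  } else if (schema.type === 'record') {
    // Fields are written back to back in schema order: no names, no tags.
    for (const f of schema.fields) encode(f.type, value[f.name], out);
  } else if (schema !== 'null') {
    throw new Error('unsupported schema');
  }
}

function decode(schema, tap) {
  if (Array.isArray(schema)) return decode(schema[readInt(tap)], tap);
  if (schema === 'null') return null;
  if (schema === 'int') return readInt(tap);
  if (schema === 'string') {
    const len = readInt(tap);
    if (tap.pos + len > tap.buf.length) throw new Error('truncated buffer');
    tap.pos += len;
    return tap.buf.toString('utf8', tap.pos - len, tap.pos);
  }
  if (schema && schema.type === 'record') {
    const rec = {};
    for (const f of schema.fields) rec[f.name] = decode(f.type, tap);
    return rec;
  }
  throw new Error('unsupported schema or bad union index');
}

const toBuffer = (schema, value) => { const out = []; encode(schema, value, out); return Buffer.from(out); };

// Strict single-schema decode (roughly what fromBuffer without a resolver does).
function fromBuffer(schema, buf) {
  const tap = { buf, pos: 0 };
  const value = decode(schema, tap);
  if (tap.pos < buf.length) throw new Error('trailing data');
  return value;
}

// Record resolution per the Avro spec: read with the WRITER schema first, then
// match fields by name; writer-only fields are dropped, reader-only ones defaulted.
function resolve(reader, writer, buf) {
  const written = fromBuffer(writer, buf);
  const result = {};
  for (const f of reader.fields) {
    if (Object.prototype.hasOwnProperty.call(written, f.name)) result[f.name] = written[f.name];
    else if ('default' in f) result[f.name] = f.default;
    else throw new Error(`reader field ${f.name} has no default`);
  }
  return result;
}

const v1 = { type: 'record', name: 'Person', fields: [
  { name: 'name', type: 'string' },
  { name: 'age', type: 'int' },
] };
const v2 = { type: 'record', name: 'Person', fields: [
  ...v1.fields,
  { name: 'phone', type: ['null', 'string'], default: null },
] };

const message1 = toBuffer(v1, { name: 'John', age: 32 });
const message2 = toBuffer(v2, { name: 'John', age: 32, phone: 'phone' });

console.log(resolve(v2, v1, message1)); // { name: 'John', age: 32, phone: null }
console.log(resolve(v1, v2, message2)); // { name: 'John', age: 32 }
```

In avsc itself, the counterpart of `resolve(v2, v1, message1)` is `v2.fromBuffer(message1, v2.createResolver(v1))`; in both cases the writer schema `v1` is an input to decoding.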

Second, if you need backward compatibility across multiple versions, you have to create a resolver for each pair of schemas:

```js
const avro = require('avsc');

// A schema's first version.
const v1 = avro.Type.forSchema({
  name: 'Person',
  type: 'record',
  fields: [
    { name: 'name', type: 'string' },
    { name: 'age', type: 'int' },
  ],
});

// The updated version.
const v2 = avro.Type.forSchema({
  type: 'record',
  name: 'Person',
  fields: [
    { name: 'name', type: 'string' },
    { name: 'age', type: 'int' },
    { name: 'phone', type: ['null', 'string'], default: null },
  ],
});

// A further updated version.
const v3 = avro.Type.forSchema({
  type: 'record',
  name: 'Person',
  fields: [
    { name: 'name', type: 'string' },
    { name: 'age', type: 'int' },
    { name: 'phone', type: ['null', 'string'], default: null },
    { name: 'surname', type: ['null', 'string'], default: null },
  ],
});

// One resolver per (reader, writer) pair: readerType.createResolver(writerType).
const v2FromV1 = v2.createResolver(v1);
const v2FromV3 = v2.createResolver(v3);
const v1FromV2 = v1.createResolver(v2);
const v1FromV3 = v1.createResolver(v3);
// etc.
```
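On the multiple-versions point: a consumer usually decodes with a single reader schema (typically its latest), so it needs one resolver per writer version it actually encounters rather than one per pair of versions. The sketch below assumes each message's writer version id is known; `ResolverCache` is a hypothetical helper, and the stand-in types only mimic avsc's `createResolver(writerType)` method so the example runs without the library.

```javascript
// Hypothetical helper: one lazily created resolver per writer version, for a
// single fixed reader type. Works with any object exposing createResolver(writer),
// such as avsc types.
class ResolverCache {
  constructor(reader, writersByVersion) {
    this.reader = reader;
    this.writers = writersByVersion; // Map: version id -> writer type
    this.resolvers = new Map(); // version id -> cached resolver
  }

  get(version) {
    if (!this.resolvers.has(version)) {
      const writer = this.writers.get(version);
      if (writer === undefined) throw new Error(`unknown writer version: ${version}`);
      this.resolvers.set(version, this.reader.createResolver(writer));
    }
    return this.resolvers.get(version);
  }
}

// Stand-in types that count how many resolvers get built (illustration only).
let created = 0;
const fakeType = (name) => ({
  name,
  createResolver(writer) {
    created += 1;
    return { reader: name, writer: writer.name };
  },
});

const [v1, v2, v3] = ['v1', 'v2', 'v3'].map(fakeType);
const cache = new ResolverCache(v3, new Map([[1, v1], [2, v2], [3, v3]]));
cache.get(1);
cache.get(1); // served from the cache, no new resolver
cache.get(2);
console.log(created); // 2
```

With N versions this builds at most N resolvers for the one reader, and only for versions that actually show up. With real avsc types the decode step would look like `v3.fromBuffer(buf, cache.get(version))`.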
romainbrancourt commented 1 year ago

Never mind :D After re-reading the Avro documentation, I see that it specifies you need the writer schema to decode old messages.
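Since the writer schema has to be available at read time, a common pattern is to tag every message with an identifier for its writer schema. The sketch below uses the framing of the Confluent Schema Registry wire format (a zero magic byte, a 4-byte big-endian schema id, then the Avro body); `frame` and `unframe` are hypothetical helper names, and the lookup from id to writer schema is left out. Avro object container files, which embed the writer schema in the file header, and the Avro spec's single-object encoding, which prefixes a schema fingerprint, are alternatives.

```javascript
// Hypothetical helpers using Confluent-style framing:
// [magic byte 0][schema id: uint32 big-endian][Avro-encoded body]
const MAGIC_BYTE = 0;
const HEADER_SIZE = 5;

function frame(schemaId, body) {
  const header = Buffer.alloc(HEADER_SIZE);
  header.writeUInt8(MAGIC_BYTE, 0);
  header.writeUInt32BE(schemaId, 1);
  return Buffer.concat([header, body]);
}

function unframe(message) {
  if (message.length < HEADER_SIZE || message.readUInt8(0) !== MAGIC_BYTE) {
    throw new Error('not a framed message');
  }
  return { schemaId: message.readUInt32BE(1), body: message.subarray(HEADER_SIZE) };
}

// Stand-in body: the Avro binary encoding of { name: 'John', age: 32 } under v1.
const body = Buffer.from([8, 74, 111, 104, 110, 64]);
const message = frame(1, body);
const { schemaId, body: payload } = unframe(message);
console.log(schemaId, payload.equals(body)); // 1 true
```

On the consumer side, `schemaId` selects the writer schema (for example via a map from id to writer type), and `payload` is what gets passed to `fromBuffer` together with the matching resolver.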