spring-cloud / spring-cloud-schema-registry

A schema registry implementation for Spring Cloud Stream
47 stars 27 forks source link

AvroSchemaRegistryClientMessageConverter incorrectly resolves writer schema #12

Closed mberastau closed 4 years ago

mberastau commented 5 years ago

At the moment if a reader schema is defined the converter will always use it as a writer schema, ignoring the actual writer schema. That leads to violation of the Avro schema resolution (https://avro.apache.org/docs/1.7.6/spec.html#Schema+Resolution) and as a result - an exception.

From org.springframework.cloud.stream.schema.avro.AvroSchemaRegistryClientMessageConverter:

protected Schema resolveWriterSchemaForDeserialization(MimeType mimeType) {
    if (this.readerSchema == null) {
        ...
        String schemaContent = this.schemaRegistryClient.fetch(schemaReference);
        ...
        return parsedSchema.getSchema();
    }
    return this.readerSchema;
}

Example If the producer changes its schema from v1 to v2 by introducing a new field, then, according to the schema resolution, this field should be ignored (assuming the consumer's schema hasn't changed and stayed v1):

if the writer's record contains a field with a name not present in the reader's record, the writer's value for that field is ignored.

With the current implementation the consumer app won't event reach the schema resolution step - it will fail during deserialization, because it will not be able to correctly read binary data of a message v2 using the schema v1.

Potential fix would be to unwrap the 1st condition code and always try to resolve the writer schema through schema registry first and only then, if not successful, default to the reader one.

olegz commented 5 years ago

The schema registry is no longer part of spring-cloud-stream project and is managed independently

tzolov commented 4 years ago

Hi @mberastau, thanks for rising this case!

It appears that the purpose of the readerSchema property is indeed to override the normal schema lookup process and always use the hardcoded schema value.

spring.cloud.schema.avro.readerSchema Avro compares schema versions by looking at a writer schema (origin payload) and a reader schema (your application payload). See the Avro documentation for more information. If set, this overrides any lookups at the schema server and uses the local schema as the reader schema.

I'm trying to find out the reasons that brought in the readerSchema. Could you please elaborate your use case? Why would you need hardcoded readerSchema?

tzolov commented 4 years ago

Ok, i have much better understanding about the role of setting explicitly the readerSchema.
After further exploring the role of the readerSchema and putting together tests to verify the issue, I can confirm that this is indeed a miss-behavior and I agree with the @mberastau suggestion:

unwrap the 1st condition code and always try to resolve the writer schema through schema registry first and only then, if not successful, default to the reader one.

The PR is coming soon.