spring-cloud / spring-cloud-schema-registry

A schema registry implementation for Spring Cloud Stream

AvroSchemaRegistryClientMessageConverter does not allow versionedSchema regex to be overridden #33

Open palmski opened 4 years ago

palmski commented 4 years ago

Describe the issue

The versionedSchema regex in AvroSchemaRegistryClientMessageConverter expects the subject part of the schema reference to be alphanumeric, and it cannot be overridden. It also does not account for the fact that the Confluent schema registry is case sensitive, whereas MimeType converts the content type to lowercase.

We have implemented a custom org.springframework.cloud.stream.schema.avro.SubjectNamingStrategy, driven by an enterprise requirement that registered schema names carry a certain prefix. That prefix contains non-alphanumeric characters and mixed case.

For example, a schema that would be named "foobar" by default is registered as SharedKafka_1234.foobar-value. Converted to a MimeType, this becomes application/vnd.sharedkafka_1234.foobar-value.v4+avro, which fails the regex check in AvroSchemaRegistryClientMessageConverter here:

    private SchemaReference extractSchemaReference(MimeType mimeType) {
        SchemaReference schemaReference = null;
        Matcher schemaMatcher = this.versionedSchema.matcher(mimeType.toString());
        if (schemaMatcher.find()) {
            String subject = schemaMatcher.group(1);
            Integer version = Integer.parseInt(schemaMatcher.group(2));
            schemaReference = new SchemaReference(subject, version, AVRO_FORMAT);
        }
        return schemaReference;
    }

This means the schema version is never extracted. Furthermore, even if the schema reference could be extracted, the subsequent lookup in the schemaRegistryClient would fail because of the case-sensitivity issue.
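
To illustrate the regex failure in isolation, here is a minimal sketch using only java.util.regex. The STRICT pattern is an assumption about what the converter's default versionedSchema looks like (with the default "vnd" prefix); it is not copied from the library. RELAXED is a hypothetical alternative that also accepts '_' and '-' in the subject, the kind of pattern we would like to be able to supply.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class VersionedSchemaRegexDemo {

    // ASSUMPTION: approximates the converter's default pattern; subject limited to alphanumerics, '$' and '.'.
    static final Pattern STRICT = Pattern.compile(
            "application/vnd\\.([\\p{Alnum}\\$\\.]+)\\.v(\\p{Digit}+)\\+avro");

    // HYPOTHETICAL: same shape, but the subject may also contain '_' and '-'.
    static final Pattern RELAXED = Pattern.compile(
            "application/vnd\\.([\\p{Alnum}\\$\\._\\-]+)\\.v(\\p{Digit}+)\\+avro");

    /** Returns "subject:version" when the pattern matches, otherwise null. */
    static String extract(Pattern pattern, String mimeType) {
        Matcher m = pattern.matcher(mimeType);
        return m.find() ? m.group(1) + ":" + Integer.parseInt(m.group(2)) : null;
    }

    public static void main(String[] args) {
        String plain = "application/vnd.foobar.v4+avro";
        String prefixed = "application/vnd.sharedkafka_1234.foobar-value.v4+avro";

        System.out.println(extract(STRICT, plain));     // foobar:4
        System.out.println(extract(STRICT, prefixed));  // null ('_' and '-' are rejected)
        System.out.println(extract(RELAXED, prefixed)); // sharedkafka_1234.foobar-value:4
    }
}
```

Note that even with the relaxed pattern the extracted subject is still lowercase, so the registry lookup would need to be handled separately.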

This prevents us from evolving our schemas: the local schema is used instead, and it is incompatible with the incoming message.

To Reproduce

  1. Register schema version "n" using a custom SubjectNamingStrategy whose subject includes a non-alphanumeric character
  2. Evolve the schema to "n+1" by adding an optional field
  3. Produce a message with schema version "n+1" using the same custom SubjectNamingStrategy
  4. Attempt to consume the message with a consumer using schema version "n"
  5. Observe a stack trace similar to:
    Caused by: java.lang.ArrayIndexOutOfBoundsException: 24
    at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:460) ~[avro-1.9.0.jar:1.9.0]
    at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:283) ~[avro-1.9.0.jar:1.9.0]
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:178) ~[avro-1.9.0.jar:1.9.0]
    at org.apache.avro.specific.SpecificDatumReader.readField(SpecificDatumReader.java:136) ~[avro-1.9.0.jar:1.9.0]
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:237) ~[avro-1.9.0.jar:1.9.0]
    at org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123) ~[avro-1.9.0.jar:1.9.0]
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:170) ~[avro-1.9.0.jar:1.9.0]
    at org.apache.avro.specific.SpecificDatumReader.readField(SpecificDatumReader.java:136) ~[avro-1.9.0.jar:1.9.0]
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:237) ~[avro-1.9.0.jar:1.9.0]
    at org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123) ~[avro-1.9.0.jar:1.9.0]
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:170) ~[avro-1.9.0.jar:1.9.0]
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151) ~[avro-1.9.0.jar:1.9.0]
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:144) ~[avro-1.9.0.jar:1.9.0]
    at org.springframework.cloud.stream.schema.avro.AbstractAvroMessageConverter.convertFromInternal(AbstractAvroMessageConverter.java:105) ~[spring-cloud-stream-schema-2.1.3.RELEASE.jar:2.1.3.RELEASE]

Version of the framework

2.1.3-RELEASE

Expected behavior

The regex is overridable, and the schema registry client takes case sensitivity into account (given MimeType's lowercasing and Confluent's case sensitivity).
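
One possible way to handle the case mismatch, sketched here purely as an illustration, is to resolve the lowercased subject against the subjects actually registered. SubjectCaseResolver and its resolve method are hypothetical names, not part of the library, and the sketch assumes the list of registered subjects is available from somewhere (for example, from the registry's subject listing).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

final class SubjectCaseResolver {

    /**
     * HYPOTHETICAL helper: finds the registered subject whose lowercase form
     * equals the lowercased subject taken from the content type.
     * If two subjects differ only by case, the first one in the list wins.
     */
    static Optional<String> resolve(String lowercasedSubject, List<String> registeredSubjects) {
        return registeredSubjects.stream()
                .filter(s -> s.toLowerCase(Locale.ENGLISH).equals(lowercasedSubject))
                .findFirst();
    }

    public static void main(String[] args) {
        List<String> registered = Arrays.asList("SharedKafka_1234.foobar-value", "other-value");

        // Simulates the case loss that happens when the subject passes through a MimeType.
        String fromContentType = "SharedKafka_1234.foobar-value".toLowerCase(Locale.ENGLISH);

        System.out.println(resolve(fromContentType, registered).orElse("<not found>"));
    }
}
```

This only works when registered subjects are unique ignoring case, which is an assumption that would need to hold for the registry in question.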
