Referencing other schemas

yeralin commented 4 years ago

AVRO supports multi-schema reference i.e.:

{
    "type": "record",
    "namespace": "test",
    "name": "sub",
    "fields": [
        {
            "name": "state",
            "type": "string"
        }
    ]
}

{
    "type": "record",
    "namespace": "test",
    "name": "main",
    "fields": [
        {
            "name": "sub",
            "type": "test.sub"
        }
    ]
}

I deployed spring-cloud-schema-registry, and was successful at POSTing test.sub schema, but when I try to submit test.main I receive: Invalid Schema: Undefined name: "test.sub"

Which is expected, since there is probably no internal schema resolution (could be a cool feature btw). As for now, is it possible to set some header/flag/etc to avoid this exception?

UPD: I looked at the code, it doesn't seem like I can surpass the validation.

yeralin commented 4 years ago

org.apache.avro.Schema says

A parser for JSON-format schemas. Each named schema parsed with a parser is added to the names known to the parser so that subsequently parsed schemas may refer to it by name.

There is also a method: https://avro.apache.org/docs/1.8.2/api/java/org/apache/avro/Schema.Parser.html#addTypes(java.util.Map)

I see two ways of solving this:

1. Create a map of Parsers based on namespaces For ex., in my case when I submitted test.sub, it would have created

PARSERS_MAP.put("test", new Parser());
PARSERS_MAP.get("test").parse(test_sub_definition);

then for the test.main, simply PARSERS_MAP.get("test").parse(test_main_definition);.

Drawback: the parsers map should be populated at server boot, if we have a lot of schemas, the boot time would be much longer.

2. Extract any non-avro types from a definition OR catch SchemaParseException with Undefined name, and try to locate it in the repository.

If it is found, use above-mentioned addTypes, otherwise throw InvalidSchemaException.

UPD: location of a schema might not be possible since we can only search based on the subject where schema name might differ from the subject it was persisted under.

UPD2: That led me to a third solution:

3. If a user is POSTing a schema that references another schema that was already persisted, one should pass a specific header with the subject name(s) of a referenced schema.

For ex., I persisted my test.sub schema under A subject. Then when I try to persist my test.main schema, I should pass some header: Schema-Reference: A; (possible to post multiple comma separated subjects). or with version Schema-Reference: A+v2;

On the backend side, we would have something like:

Parser avroParser = new Parser();
for (String subj : schemaReferenceHeader) {
    schema = schemaRepo.find(subj);
    avroParser.add(schema);
}
avroParser.parse(definition);

This seems not to have any drawbacks.

yeralin commented 4 years ago

@sobychacko what do you think about this? And my PR

tzolov commented 4 years ago

Hi @yeralin , thanks for exploring the schema cross referencing use cases!

Suggested third approach to use using http header do express the Schema dependencies, feels like a plausible workaround. But a consistent solution would require extending Schema model with an additional dependencies field. Later will explicitly cross reference the dependent Schemas, using the subject name, version and format as a reference.

In you PR you already extend the Schema model with a List<Schema> references. But i guess this field should be of type List<SchemaReferece> or similar? The goals is to register related Schemas separately and references them not contain them?

Also given that the Schema model is extended with a schema references field, what is the role of the reference header?

sabbyanandan commented 4 years ago

@yeralin: Do you have any thoughts to share here? Let us know what you think.

yeralin commented 4 years ago

Hey guys, I am really sorry. Got stuck in Peru due to the gvmnt closing its borders lol. Trying to get back to the US.

Will try to work on this ASAP.

sabbyanandan commented 4 years ago

@yeralin: Thank you for your contributions so far. Please take your time, and be safe.

spring-cloud / spring-cloud-schema-registry

Referencing other schemas #29