Open yeralin opened 4 years ago
org.apache.avro.Schema
says
A parser for JSON-format schemas. Each named schema parsed with a parser is added to the names known to the parser so that subsequently parsed schemas may refer to it by name.
There is also a method: https://avro.apache.org/docs/1.8.2/api/java/org/apache/avro/Schema.Parser.html#addTypes(java.util.Map)
I see two ways of solving this:
1. Create a map of Parsers based on namespaces
For ex., in my case when I submitted test.sub
, it would have created
PARSERS_MAP.put("test", new Parser());
PARSERS_MAP.get("test").parse(test_sub_definition);
then for the test.main
, simply PARSERS_MAP.get("test").parse(test_main_definition);
.
Drawback: the parsers map should be populated at server boot, if we have a lot of schemas, the boot time would be much longer.
2. Extract any non-avro types from a definition OR catch SchemaParseException
with Undefined name
, and try to locate it in the repository.
If it is found, use above-mentioned addTypes
, otherwise throw InvalidSchemaException
.
UPD: location of a schema might not be possible since we can only search based on the subject
where schema name might differ from the subject it was persisted under.
UPD2: That led me to a third solution:
3. If a user is POSTing a schema that references another schema that was already persisted, one should pass a specific header with the subject name(s) of a referenced schema.
For ex., I persisted my test.sub
schema under A
subject.
Then when I try to persist my test.main
schema, I should pass some header:
Schema-Reference: A;
(possible to post multiple comma separated subjects).
or with version Schema-Reference: A+v2;
On the backend side, we would have something like:
Parser avroParser = new Parser();
for (String subj : schemaReferenceHeader) {
schema = schemaRepo.find(subj);
avroParser.add(schema);
}
avroParser.parse(definition);
This seems not to have any drawbacks.
@sobychacko what do you think about this? And my PR
Hi @yeralin , thanks for exploring the schema cross referencing use cases!
Suggested third approach to use using http header do express the Schema dependencies, feels like a plausible workaround. But a consistent solution would require extending Schema model with an additional dependencies
field. Later will explicitly cross reference the dependent Schemas, using the subject
name, version
and format
as a reference.
In you PR you already extend the Schema
model with a List<Schema> references
. But i guess this field should be of type List<SchemaReferece>
or similar? The goals is to register related Schemas separately and references them not contain them?
Also given that the Schema model is extended with a schema references field, what is the role of the reference header?
@yeralin: Do you have any thoughts to share here? Let us know what you think.
Hey guys, I am really sorry. Got stuck in Peru due to the gvmnt closing its borders lol. Trying to get back to the US.
Will try to work on this ASAP.
@yeralin: Thank you for your contributions so far. Please take your time, and be safe.
AVRO supports multi-schema reference i.e.:
I deployed
spring-cloud-schema-registry
, and was successful at POSTingtest.sub
schema, but when I try to submittest.main
I receive:Invalid Schema: Undefined name: "test.sub"
Which is expected, since there is probably no internal schema resolution (could be a cool feature btw). As for now, is it possible to set some header/flag/etc to avoid this exception?
UPD: I looked at the code, it doesn't seem like I can surpass the validation.