Closed joscha closed 2 months ago
Workaround (not pretty but works):
class DynamicTypeRegistry {
private readonly registry: Record<string, Avro.Type> = {};
private readonly proxy = new Proxy(this.registry, {
get: (target, prop, receiver) => {
if (prop.toString() in this.registry) {
// should use `maybeQualify` here, if it was exported from avsc
return this.registry[prop.toString()];
}
let unqualifiedProp = prop.toString();
const { namespace } = this.opts;
// could use `unqualify` here, if it was exported from avsc
if (namespace && prop.toString().startsWith(namespace + '.')) {
unqualifiedProp = prop.toString().slice(namespace.length + 1);
}
const fileName = `${__dirname}/avro-schema/${unqualifiedProp}.avsc`;
if (!fs.existsSync(fileName)) {
return undefined;
}
// avsc checks whether there is already a truthy entry in the registry and fais if there is one
// but it will go into an infinite loop if we don't set it to undefined
// should use `maybeQualify` here, if it was exported from avsc
this.registry[prop.toString()] = undefined as any;
let jsonData;
try {
jsonData = JSON.parse(fs.readFileSync(fileName, 'utf-8'))
} catch(e) {
console.error('Error reading file', fileName, e);
throw new Error(`Error reading file ${fileName}`);
}
const schema = Avro.Type.forSchema(
jsonData,
{
...this.opts,
registry: this.proxy,
}
)
return schema
}
})
constructor(private readonly opts: Record<string, string> = {}) {
}
get() {
return this.proxy
}
}
and then:
const registry = new DynamicTypeRegistry({
namespace: 'model',
}).get();
const schema = registry['ListEntryWithEntity'];
console.log(schema);
// pass `registry` into more calls to `Avro.Type.forSchema` at your own will - it is compatible with the one expected in `opts`, like so:
Avro.Type.forSchema(..., { registry });
which will recursively load a schema from a set of .avsc
files in a directory, such as ones produced from https://openapi-generator.tech/docs/generators/avro-schema/
Just stumbled across the typeHook
function. I think possibly it can made to be work with that, too, will give it a try in a second, although the code would probably need
--- a/lib/types.js
+++ b/lib/types.js
@@ -138,6 +138,7 @@ class Type {
if (!Type.isType(type)) {
throw new Error(`invalid typehook return value: ${j(type)}`);
}
+ opts.registry[schema] = type;
return type;
}
A workaround could be to write into the registry inside the hook, but I wonder why types resolved by the typeHook
function are not put into the registry in the first place? Is the reason more control for a typeHook provider or is it an oversight?
Hi @joscha. Yes - a typehook is the way to go here.
I wonder why types resolved by the typeHook function are not put into the registry in the first place? Is the reason more control for a typeHook provider or is it an oversight?
Adding the type to the registry is best left to caller. Not all types have a name and even when they do, the caller might want to pick a different one.
Yes - a typehook is the way to go here.
Okay, will give it a try and report back. Would you potentially be open to making qualify
and maybeQualify
public API of this package, so we can reuse them?
Okay, I've given the typeHook
a try. This is an alternative solution to the DynamicTypeRegistry
above:
/**
* TODO(@joscha): reuse this one from `avsc`
*/
function unqualify(type: string) {
return type.replace(/^.*\./, "");
}
function isQualifiedType(type: string) {
return type.includes(".");
}
/**
* TODO(@joscha): reuse this one from `avsc`
*/
function maybeQualify(type: string, namespace?: string) {
return namespace ? `${namespace}.${type}` : type;
}
function loadLocalSchema(
type: string,
opts: Partial<Avro.ForSchemaOptions>,
): Avro.Type | undefined {
if (isQualifiedType(type)) {
if (opts?.namespace && !type.startsWith(opts.namespace + ".")) {
throw new Error(
`Qualified type does not match namespace; don't know how to resolve "${type}"`,
);
}
type = unqualify(type);
} else {
// all our internal named types are namespaced, so this must be a primitive
return;
}
const fileName = `${__dirname}/avro-schema/${type}.avsc`;
let jsonData;
try {
jsonData = JSON.parse(fs.readFileSync(fileName, "utf-8"));
} catch (e) {
console.error("Error reading file", fileName, e);
throw new Error(`Error reading file ${fileName}`);
}
return Avro.Type.forSchema(jsonData, opts);
}
const typeHook: Avro.ForSchemaOptions["typeHook"] = (schema, opts) => {
// interestingly, the `typeHook` definition doesn't suggest that the `schema` parameter can be of type `string`,
// but referenced non-primitives are passed with their `.name` property.
if (typeof schema === "string") {
if (schema in opts.registry) {
return opts.registry[schema];
}
const ret = loadLocalSchema(schema, opts);
if (ret) {
opts.registry[maybeQualify(schema, opts.namespace)] = ret;
}
return ret;
}
};
const opts: Partial<Avro.ForSchemaOptions> = {
typeHook,
namespace: "model",
};
// entry point to load a whole set of connected schemas, starting with a named type `model.Company`
const companySchema = Avro.Type.forSchema("model.Company", opts);
I am fairly happy with it, except it could use:
isPrimitive
unqualify
maybeQualify
how would you feel about exposing these from avsc
@mtth?
Bonus points for:
typeHook
s, then we could change all the synchronous file methods to their async counterparts.typeHook
type definition to include | string
how would you feel about exposing these from avsc
I don't think it's worth increasing avsc
's API surface just yet, but I'll consider exporting them if more use-cases come up.
In the example above, you can avoid the isPrimitive
file existence check if all your schemas are namespaced.
allowing asynchronous typeHooks, then we could change all the synchronous file methods to their async counterparts.
This capability would be great but we would need to maintain two flavors of schema parsing, which is not a trade-off worth making. The general recommendation here is to do all async-friendly work upfront, before schema parsing. In your case, you could preload the files using parallel async calls and cache them for synchronous lookups in the type hook.
fixing the typeHook type definition to include | string
Thanks for flagging, this should get fixed. PR welcome.
how would you feel about exposing these from avsc
I don't think it's worth increasing
avsc
's API surface just yet, but I'll consider exporting them if more use-cases come up.
I think aligning the behaviour of the implementations outside avsc and inside would be great. Otherwise there's always a chance they diverge and suddenly after a bump of avsc
my typehook won't work anymore like it should.
In the example above, you can avoid the
isPrimitive
file existence check if all your schemas are namespaced.
That's true. In my particular case, this is guaranteed, thank you for pointing this out.
allowing asynchronous typeHooks, then we could change all the synchronous file methods to their async counterparts.
This capability would be great but we would need to maintain two flavors of schema parsing, which is not a trade-off worth making. The general recommendation here is to do all async-friendly work upfront, before schema parsing. In your case, you could preload the files using parallel async calls and cache them for synchronous lookups in the type hook.
This assumes I know what schemas I am loading ahead of time. It also prevents the use of ad-hoc generated schemas and/or calls to external services. Whilst promises would be my preferred way, most other parts of avsc
work with callbacks, so I actually think the current implementation of typeHook
is an outlier. Adding an optional callback function shouldn't be too onerous for maintenance?
fixing the typeHook type definition to include | string
Thanks for flagging, this should get fixed. PR welcome.
Here you go: https://github.com/mtth/avsc/pull/475
Otherwise there's always a chance they diverge and suddenly after a bump of avsc my typehook won't work anymore like it should.
Namespacing is defined by the specification, so unlikely to diverge in breaking ways.
I agree that it would be convenient for your use-case to export these symbols. I'm just not convinced the maintenance cost for avsc
of doing so is worth it yet.
Adding an optional callback function shouldn't be too onerous for maintenance?
We would need to maintain both synchronous and asynchronous schema creation functions. It would also be a large refactor, whether for callbacks or promises. For example logical type handling currently assumes that only one schema is being parsed at a time.
I agree that it would be convenient for your use-case to export these symbols. I'm just not convinced the maintenance cost for
avsc
of doing so is worth it yet.
Okay, fair enough. I think given that the package provides a typeHook
interface that is likely to encounter namespaced entities it would make sense, but I agree it's not a lot of duplicated code.
We would need to maintain both synchronous and asynchronous schema creation functions. It would also be a large refactor, whether for callbacks or promises. For example logical type handling currently assumes that only one schema is being parsed at a time.
I think it would be worth it.
I am closing this for now as the initial issue has been solved.
I am currently trying to load schemas from an automatically generated set of avro schemas. It is basically a bunch
<typeName>.avsc
files in a directory. They reference each other. So when I load one like this:where
fieldValueType
is'CompanyValue'
:it yields:
because the schema from
CompanyData.avsc
has not yet been loaded.Initially I thought the answer would be to define an import hook: https://github.com/mtth/avsc/blob/7cb76a6eb7b0d5b5a71c9cfb446acf3eff763006/lib/files.js#L14 however it seems this is only supported for: https://github.com/mtth/avsc/blob/7cb76a6eb7b0d5b5a71c9cfb446acf3eff763006/lib/specs.js#L23
I can see that there is a local
registry
map I can pass into eachopt
: https://github.com/mtth/avsc/blob/7cb76a6eb7b0d5b5a71c9cfb446acf3eff763006/lib/types.js#L103 so I could just load all files into the registry with:however I'd need to do that in a specific order which means I'd need to parse the model schemas myself beforehand.
I am wondering if you'd be up for adding an additional
to
opts
which kicks in before https://github.com/mtth/avsc/blob/7cb76a6eb7b0d5b5a71c9cfb446acf3eff763006/lib/types.js#L159 to dynamically load a missing Schema (recursively).