metafacture / metafacture-core

Core package of the Metafacture tool suite for metadata processing.
https://metafacture.org
Apache License 2.0

JSON-Schema-Validator module compatible with at least JSON-Schema v7 #443

Closed. TobiasNx closed this issue 1 year ago

TobiasNx commented 2 years ago

In the context of OERSI we have a JSON-Schema-Validator, but it only supports v4. A validator with newer schema support will be needed in lobid and in RPB too.

Option/features should be:

fsteeg commented 2 years ago

Implementation should probably start in OERSI, following up on https://gitlab.com/oersi/oersi-etl/-/issues/96#note_679435293 and then moving the working implementation here.

fsteeg commented 1 year ago

Implemented in #468, assigning @TobiasNx for functional review.

Usage, e.g.: validate-json("/schemas/schema.json", schemaRoot="/schemas/")

(The schemaRoot option is for resolving sub-schemas, as in our OERSI setup.)

TobiasNx commented 1 year ago

I'm trying to test this: https://github.com/TobiasNx/notWorkingFlux/blob/a59fa2193ca5045a8f5f50b1f47387e4d8ffcb39/jsonValidator

It doesn't seem to work yet. What did I do wrong?

fsteeg commented 1 year ago

Ah, right, passing a URL as the main schema is not supported yet. When we talked about URLs in this context, I was thinking of sub-schemas referenced in the main schema, where URLs work (this is handled by the validator library).

Supporting URLs for the main schema does make sense and, judging from your test, seems to be the obvious use case. It's no problem to support in principle, but it brings in some additional aspects to consider (like caching to avoid an HTTP request for every document, and testing without introducing remote requests in the tests). So I'm moving this to backlog for now.
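To illustrate the caching concern mentioned above, here is a minimal sketch of a loader that fetches each schema URL at most once. It is not code from Metafacture or from the validator library: the class name CachingSchemaLoader, the fetcher function, and the example.org URL are all hypothetical, and the fetcher is stubbed so that the example runs without any network access.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical helper, not part of Metafacture: caches schema text by URL.
final class CachingSchemaLoader {

    // In production this could be an HTTP GET; here it is injected so it can be stubbed.
    private final Function<String, String> fetcher;
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    CachingSchemaLoader(final Function<String, String> fetcher) {
        this.fetcher = fetcher;
    }

    // Returns the schema text, calling the fetcher at most once per URL.
    String load(final String url) {
        return cache.computeIfAbsent(url, fetcher);
    }

    public static void main(final String[] args) {
        final AtomicInteger requests = new AtomicInteger();
        // Stub fetcher standing in for a remote request (assumption: no real URL is contacted).
        final CachingSchemaLoader loader = new CachingSchemaLoader(url -> {
            requests.incrementAndGet();
            return "{\"type\": \"object\"}";
        });
        // Simulate three documents being validated against the same schema URL.
        for (int i = 0; i < 3; i++) {
            loader.load("https://example.org/schemas/schema.json");
        }
        System.out.println("requests: " + requests.get()); // prints "requests: 1"
    }
}
```

Injecting the fetcher also covers the second concern: a test can pass a stub instead of issuing remote requests.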

BTW this is a good example of the issue we recently discussed in a meeting: moving a custom Metafacture module into core is much more than just moving it, and therefore should only happen if there is actual demand for the generic module (which is the case here, but maybe not for #469).

fsteeg commented 1 year ago

> Supporting URLs for the main schema [...] brings in some additional aspects to consider (like caching to avoid an HTTP request for every document, testing without introducing remote requests in the test).

I've added support for loading the main schema and relative refs from URLs, plus mocks for serving the test schemas. Instead of adding explicit caching, I've moved the loading of the schema into the constructor (this was possible after the removal of schemaRoot). This should avoid reloading the schema on each validation.
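The idea of loading in the constructor and serving test schemas locally can be sketched roughly as follows. This is not the actual Metafacture implementation or its test setup: PreloadedSchema, schemaText() and serve() are hypothetical names, the JDK's built-in HttpServer merely stands in for whatever mocking approach the real tests use, and actual JSON Schema validation is left out entirely.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical class, not the Metafacture validator: fetches the schema once, on construction.
final class PreloadedSchema {

    private final String schemaText;

    PreloadedSchema(final String schemaUrl) {
        final HttpRequest request = HttpRequest.newBuilder(URI.create(schemaUrl)).build();
        try {
            schemaText = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString())
                    .body();
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        } catch (final InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Placeholder for per-document work: reuses the already loaded schema, no new request.
    String schemaText() {
        return schemaText;
    }

    // Starts a throwaway server on a free loopback port that serves the body and counts requests.
    static HttpServer serve(final String path, final String body, final AtomicInteger hits) {
        final HttpServer server;
        try {
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
        server.createContext(path, exchange -> {
            hits.incrementAndGet();
            final byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, bytes.length);
            exchange.getResponseBody().write(bytes);
            exchange.close();
        });
        server.start();
        return server;
    }

    public static void main(final String[] args) {
        final AtomicInteger hits = new AtomicInteger();
        final HttpServer server = serve("/schemas/schema.json", "{\"type\": \"object\"}", hits);
        try {
            final String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/schemas/schema.json";
            final PreloadedSchema schema = new PreloadedSchema(url);
            for (int i = 0; i < 3; i++) {
                schema.schemaText(); // stands in for validating three documents
            }
            System.out.println("requests: " + hits.get()); // prints "requests: 1"
        } finally {
            server.stop(0);
        }
    }
}
```

Because the request happens once per instance, no separate cache is needed as long as the same instance handles all documents in a workflow.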

@TobiasNx Your validation test should now work.

TobiasNx commented 1 year ago

I tested it. Seems to run smoothly now! Great :)