kit-data-manager / metastore2

General purpose metadata repository and schema registry service.
Apache License 2.0

Schema Grouping #18

Open tegx opened 3 years ago

tegx commented 3 years ago

It would be a great enhancement if it were possible to create groups/collections of schemas. This would avoid name conflicts between schemas from different standards and allow a user to directly associate a schema with a standard by looking at the schema URL.

Example: http://example.org/api/v1/schemas/{collection}/{id}

{
  "$schema": "http://json-schema.org/draft-07/schema",
  "$id":  "http://example.org/api/v1/schemas/iec61850/DPVM",
  "title": "DPVM",
  "...": "..."
}

ThomasJejkal commented 3 years ago

Technically, this would not be a problem, but I'm not sure about the benefits of such an additional model layer. Duplicate schema identifiers are avoided at the protocol level. This would also be required for collections. Also, the direct association based on collection and schema ID is not guaranteed, as both identifiers are user-provided and there are no semantic requirements.

If it's about grouping, I could imagine an additional attribute in the schema record representing some kind of tag. This could also easily be included in queries.

VolkerHartmann commented 3 years ago

I see two problems with it:

  1. All schemas without a collection would need a dummy collection, which is not useful.
  2. What to do if two hierarchy levels are not sufficient? Then you would have to introduce an additional dummy.

I would prefer the solution with tags. You are much more flexible, and it may be possible to support more than one tag per schema, e.g. http://localhost:8040/api/v1/schemas?tag=.... for filtering the schemas. :warning: The schemaID must still be unique!
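To make the tag idea concrete, here is a minimal in-memory sketch. It is not metastore2 code: `SchemaRecord`, `TagRegistry`, `register` and `find_by_tag` are hypothetical names chosen for illustration. It only shows the two properties mentioned above: a schema may carry several tags, and the schemaID stays globally unique.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class SchemaRecord:
    """Hypothetical schema record with an additional tag attribute."""
    schema_id: str
    tags: Set[str] = field(default_factory=set)


class TagRegistry:
    """Illustrative registry: schema IDs are unique, tags are only used for filtering."""

    def __init__(self) -> None:
        self._records: Dict[str, SchemaRecord] = {}

    def register(self, schema_id: str, tags: Optional[Set[str]] = None) -> SchemaRecord:
        # Tags do not create a namespace, so the ID must be unique overall.
        if schema_id in self._records:
            raise ValueError(f"schema id already exists: {schema_id}")
        record = SchemaRecord(schema_id, set(tags or ()))
        self._records[schema_id] = record
        return record

    def find_by_tag(self, tag: str) -> List[str]:
        # Rough equivalent of a query such as /schemas?tag=<tag>.
        return sorted(r.schema_id for r in self._records.values() if tag in r.tags)
```

With this model, registering a second schema named `DPVM` under a different tag still fails, which is exactly the limitation raised later in this thread.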

For access, it would be possible to encode the schemaID on the client side (I have to document this), e.g.: http://example.org/api/v1/schemas/iec61850/DPVM --> http%3A%2F%2Fexample.org%2Fapi%2Fv1%2Fschemas%2Fiec61850%2FDPVM

I have to admit that this does not look nice, but it would be easy for a client to handle.
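A client could do the encoding with standard percent-encoding. The sketch below uses Python's `urllib.parse`; the helper names `encode_schema_id` and `decode_schema_id` are made up for this example and are not part of any metastore2 client.

```python
from urllib.parse import quote, unquote


def encode_schema_id(schema_id: str) -> str:
    # safe="" makes quote() also escape "/" (and ":"), so the whole URL
    # fits into a single path segment.
    return quote(schema_id, safe="")


def decode_schema_id(encoded: str) -> str:
    # Reverse the percent-encoding to get the original schemaID back.
    return unquote(encoded)


schema_id = "http://example.org/api/v1/schemas/iec61850/DPVM"
encoded = encode_schema_id(schema_id)
print(encoded)
# prints: http%3A%2F%2Fexample.org%2Fapi%2Fv1%2Fschemas%2Fiec61850%2FDPVM
```

The encoded value could then be appended as the `{id}` part of the request path; whether the server accepts encoded slashes in the path is something that would need to be checked.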

ThomasJejkal commented 3 years ago

... Or, to jump on the hashcode train from the other issue, the identifier could also be the hash of the schema URL, which is enforced on the client side and which is unique. This is of course not human-readable, but that should not be the focus, as the user should not have to work with the interface directly.
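As a rough illustration of the hash variant: the thread does not name a hash function, so SHA-256 is an assumption here, and `schema_id_from_url` is a hypothetical helper name.

```python
import hashlib


def schema_id_from_url(schema_url: str) -> str:
    """Derive an identifier from the schema URL (SHA-256 is an assumed choice)."""
    # The same URL always yields the same 64-character hex digest,
    # which contains only URL-safe characters.
    return hashlib.sha256(schema_url.encode("utf-8")).hexdigest()


dpvm_id = schema_id_from_url("http://example.org/api/v1/schemas/iec61850/DPVM")
```

Uniqueness then rests on the collision resistance of the hash rather than on a server-side check of the readable name, and the identifier no longer reveals which standard a schema belongs to.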

tegx commented 3 years ago

My thought with groups/collections was that if someone is working with several standards describing the same domain, it is very likely that there will be two schemas (not necessarily identical) describing the same concept and therefore having the same name. (Grouping by a tag is a good idea, but would not solve this problem.)

If a collection exists, schemas uploaded with the collection ID are added to it; if the collection does not exist, it is created. So the duplicate ID detection only needs to verify that the schema ID is unique within a collection. Yeah, maybe reading the membership of a schema from its URL is not a good idea, since a user shouldn't see such things at all =).

I think the dummy collection for all schemas not belonging to a collection is an implementation problem and should not affect the user (this problem must not be reflected in the URL!). By defining that collections cannot contain collections, no real hierarchy concept is introduced. Also, I cannot imagine why someone would need more than 2 layers, but yes... someone may ask for this in the future :D
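The behaviour I have in mind could look roughly like the sketch below. It is only an in-memory illustration, not a proposal for the actual implementation; `CollectionRegistry`, `add_schema` and `get_schema` are hypothetical names, and `other-standard` is a placeholder collection ID.

```python
from typing import Dict


class CollectionRegistry:
    """Illustrative store in which schema IDs are unique per collection only."""

    def __init__(self) -> None:
        # collection ID -> (schema ID -> schema document)
        self._collections: Dict[str, Dict[str, dict]] = {}

    def add_schema(self, collection_id: str, schema_id: str, schema: dict) -> None:
        # setdefault creates the collection on first upload.
        collection = self._collections.setdefault(collection_id, {})
        # Duplicate detection is scoped to the collection.
        if schema_id in collection:
            raise ValueError(f"{schema_id} already exists in {collection_id}")
        collection[schema_id] = schema

    def get_schema(self, collection_id: str, schema_id: str) -> dict:
        return self._collections[collection_id][schema_id]


registry = CollectionRegistry()
registry.add_schema("iec61850", "DPVM", {"title": "DPVM"})
# Same schema ID in a different collection is accepted.
registry.add_schema("other-standard", "DPVM", {"title": "DPVM"})
```

A second upload of `DPVM` into `iec61850` would raise an error, while the two collections do not interfere with each other.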

ThomasJejkal commented 3 years ago

This might be to some extent related to #29