phenopackets / phenopacket-schema

Repository for the GA4GH phenopacket schema
https://phenopacket-schema.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

JSON schema and entity resolution #267

Open julesjacobsen opened 3 years ago

julesjacobsen commented 3 years ago

Protobuf isn't compatible with JSON schema-based projects such as Search/Beacon so they rely on a second-hand Schemablocks representation of some elements.

Ideally, Phenopackets should be directly discoverable and usable in JSON Schema projects, with its own namespace, e.g. schema.org findability.

@jfuerth, can you expand on this?

mbaudis commented 3 years ago

@julesjacobsen Thanks for the summary, which captures the core of it!

There is an additional component: even if Phenopackets provided the JSON Schema instances for individual schemas like OntologyClass, in the larger context of GA4GH it would be an advantage to have them in a general {S}[B] collection, together with other schemas.

So for me, ideally, it doesn't matter if you define Phenopackets in Protobuf; but it would be very nice to push individual schemas to {S}[B] for "recycling" (i.e. use in other standards).

jfuerth commented 3 years ago

Yes, for Search in particular, we need a way to say things like the following:

The mechanism in Search for doing this is to point to a JSON Schema that is the canonical definition of that concept.

To accomplish this, we have been treating https://schemablocks.org/schemas/sb-phenopackets/current/Phenopacket.json as the canonical identifier for "this is a Phenopacket."

Examples:

I think the shortcomings in expressiveness that I called out above are something we will need to figure out in the context of Search. I'm just including them for clarity and completeness.

What we would be looking for from Phenopackets is a way to refer to a Phenopacket and parts thereof unambiguously. Any two sites that expose Phenopackets data via Search should point to this same place. Designating SchemaBlocks as the official home for such concept pointers would certainly be one way to achieve this.
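To illustrate the "same place" requirement, here is a minimal sketch and not Search's actual API. It assumes each site publishes a JSON-Schema-style data model in which a column points at the canonical identifier through `$ref`. The site descriptions, the `make_data_model` and `collect_refs` helpers, and the column names are hypothetical; only the SchemaBlocks URL comes from this thread.

```python
import json

# Canonical identifier quoted earlier in this thread.
PHENOPACKET_REF = (
    "https://schemablocks.org/schemas/sb-phenopackets/current/Phenopacket.json"
)


def make_data_model(description: str) -> dict:
    """Build a hypothetical table data model with one Phenopacket-typed column."""
    return {
        "description": description,
        "properties": {
            "id": {"type": "string"},
            # The column's meaning is declared by pointing at the shared schema.
            "phenopacket": {"$ref": PHENOPACKET_REF},
        },
    }


def collect_refs(node) -> set:
    """Recursively gather every "$ref" target in a JSON-like structure."""
    refs = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                refs.add(value)
            else:
                refs |= collect_refs(value)
    elif isinstance(node, list):
        for item in node:
            refs |= collect_refs(item)
    return refs


# Two independent (hypothetical) sites describing their tables.
site_a = make_data_model("Site A patients (hypothetical)")
site_b = make_data_model("Site B cohort (hypothetical)")

# A client can tell both sites expose "a Phenopacket" because the
# identifiers are literally the same string.
shared = collect_refs(site_a) & collect_refs(site_b)
print(json.dumps(sorted(shared), indent=2))
```

The comparison only works if both sites use the exact same URL, which is why a single designated home for these identifiers matters more than where that home is.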

mbaudis commented 3 years ago

And the {S}[B] schemas point clearly to the donor schema, authors, documentation ... as the authoritative version.

But ideally we'd have a setup in which GA4GH devs support the translation, so that it is not solely left to interaction between {S}[B] "volunteers" and donor schema maintainers.

For the donor schemas, there are benefits in exposure but also in shifting from "we have this but still work on the docs" to "current stable version in {S}[B], thank you very much indeed".

julesjacobsen commented 3 years ago

Might be of interest, leaving here as a 'bookmark' - this is the blog post

https://github.com/confluentinc/schema-registry

and this one

https://github.com/chrusty/protoc-gen-jsonschema

Relequestual commented 3 years ago

I've had eyes on the confluent schema registry product for a while. I'd love to hear thoughts if anyone does a deep dive!

julesjacobsen commented 3 years ago

https://linkml.github.io/