phenopackets / phenopacket-tools

An app and library for building, conversion, and validation of GA4GH Phenopackets.
http://phenopackets.org/phenopacket-tools/stable/
GNU General Public License v3.0

VRS schema definitions mismatch #196

Open v-rocheleau opened 9 months ago

v-rocheleau commented 9 months ago

Hi!

While playing with the phenopacket-tools examples, I noticed that the VRS schemas in this repo differ from the official VRS spec, as stated in the schema description itself in vrs-variation-adapter.json.

Is there a historical reason for this divergence from the spec? If so, is it still needed?

If not, I believe a better approach would be to obtain vrs.json from the VRS repo and use it to replace vrs-variation-adapter.json with the official spec. This could be done with a Git submodule, for instance.

ielis commented 9 months ago

Hi @v-rocheleau, the reason for the divergence is that for Phenopacket Schema, the protobuf files are the single source of truth. For the VRS part, when you follow the links starting from the Phenopacket Schema repo, you land at this VRS proto file.

These VRS proto files are part of Phenopacket Schema v2.0.0 and, consequently, that's what phenopacket-tools are designed to validate.

As far as I know, the VRS specs are encoded as JSON schema. However, not all JSON schema concepts are translatable to the Protocol Buffers language. So, it is unlikely that using Git submodules would solve this issue.

v-rocheleau commented 9 months ago

Thanks for the quick response @ielis

I understand better now, given the protobuf/json-schema concept differences.

The vrs.json json-schema file from the VRS repo is imported in the official vrs-python repo as a submodule, so I was under the impression that this schema file could be used as the VRS JSON-schema source of truth.

Given that phenopacket-tools supports YML, protobuf and JSON-schema formats, do you think it would make sense for it to use the official schema file depending on the format?

Some background on why I am asking this:

ielis commented 9 months ago

Hi @v-rocheleau

> Given that phenopacket-tools supports YML, protobuf and JSON-schema formats, do you think it would make sense for it to use the official schema file depending on the format?

I do not think this is how Phenopacket Schema is defined. The latest version (v2.0.2) includes a specific protobuf file that should cover the VRS elements. However, a 1:1 mapping between the protobuf files and the VRS JSON schema does not exist. Therefore, a JSON document that contains a sub-tree from the "official" VRS schema, e.g. in the VariationDescriptor > Variation field, will not validate as a v2.0.2 phenopacket even if everything else is OK. So, coming back to your question above, I don't think this is the right thing to do.
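To make the mismatch concrete, here is a minimal Python sketch of the kind of shape difference that can arise. It assumes that the official-style document tags each object with a `type` field, while the protobuf-derived JSON wraps the chosen member of a `oneof` under a key naming it. The key names, the two helper functions, and the placeholder identifier are illustrative assumptions, not copied from either schema or from phenopacket-tools.

```python
# Hypothetical key names for the members of a protobuf `oneof`; check the
# actual VRS proto file before relying on any of them.
ONEOF_KEYS = {"allele", "haplotype", "copyNumber", "text", "variationSet"}


def is_protobuf_style_variation(doc):
    """True if doc has exactly one wrapper key naming the chosen oneof member."""
    return isinstance(doc, dict) and len(ONEOF_KEYS.intersection(doc)) == 1


def is_discriminator_style_variation(doc):
    """True if doc is a flat object tagged with a string 'type' field."""
    return isinstance(doc, dict) and isinstance(doc.get("type"), str)


# Flat object with a type discriminator (official-schema style, simplified).
official_style = {
    "type": "Allele",
    "location": "ga4gh:VSL.example",  # placeholder identifier
    "state": {"type": "LiteralSequenceExpression", "sequence": "T"},
}

# Same information, wrapped the way a protobuf oneof would serialize (assumed).
protobuf_style = {
    "allele": {
        "curie": "ga4gh:VSL.example",  # placeholder identifier
        "literalSequenceExpression": {"sequence": "T"},
    }
}

for name, doc in [("official", official_style), ("protobuf", protobuf_style)]:
    print(name, is_protobuf_style_variation(doc), is_discriminator_style_variation(doc))
```

Under these assumptions, a validator that expects the wrapped form rejects the flat form even though both carry the same information, which is why swapping in the official schema file would not be enough on its own.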

phenopacket-tools can convert V1->V2, and the conversion is available through both the CLI and the Java API (Javadocs here). However, it only works for the "VRS" items as defined in the protobuf version (not vrs.json, vrs-python, etc.), which may not be what you need.