phenopackets / phenopacket-format

investigate possibility for a global IRI scheme for uniquely identifying a variant #23

Open cmungall opened 8 years ago

cmungall commented 8 years ago

There are schemes such as HGVS that unambiguously map a genomic variation to a string, and #10 suggests that we use one. However, in relation to #22, if we want to reference a variant we need to do so via an identifier rather than an arbitrary string, and identifiers have syntactic constraints that ensure uniqueness, persistence, and resolvability.

In many cases we can use a pre-existing database identifier (e.g. if the variant is in ClinVar), but in some cases there will be no public ID and we will need to reference the variant via an identifier scheme.

As all identifiers in this format are URIs (although they are typically shortened to CURIEs), there are a few possibilities:

There has always been a debate about coupling or decoupling of identity to resolvability, going back before LSIDs. The format can remain neutral here, but we could potentially push forward in this direction.

Of course, whatever the technical scheme, we need to ensure that the coupled standard (e.g. HGVS) is sufficient and can do things like uniquely reference any build of any chromosome/scaffold in any species. This may require extensions; more research is required.
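To make the idea concrete, here is a minimal sketch of how an HGVS-style string could be embedded in an IRI and shortened to a CURIE. The `http://example.org/hgvs/` base, the `HGVS` prefix, and the function names are all hypothetical, and the variant string is purely illustrative; nothing here is an agreed scheme. Percent-encoding is just one possible way to keep characters such as `:` and `>` from breaking IRI or CURIE syntax.

```python
from urllib.parse import quote, unquote

# Hypothetical prefix map; the base IRI is a placeholder, not a real resolver.
PREFIXES = {"HGVS": "http://example.org/hgvs/"}


def hgvs_to_iri(hgvs: str, prefix: str = "HGVS") -> str:
    """Embed an HGVS-style string in an IRI by percent-encoding it."""
    # safe="" also encodes "/" so the variant stays a single path segment.
    return PREFIXES[prefix] + quote(hgvs, safe="")


def iri_to_curie(iri: str) -> str:
    """Contract an IRI to prefix:local form using the prefix map."""
    for prefix, base in PREFIXES.items():
        if iri.startswith(base):
            return f"{prefix}:{iri[len(base):]}"
    raise ValueError(f"no known prefix for {iri}")


def iri_to_hgvs(iri: str, prefix: str = "HGVS") -> str:
    """Recover the original string from an IRI minted by hgvs_to_iri."""
    base = PREFIXES[prefix]
    if not iri.startswith(base):
        raise ValueError(f"{iri} is not in the {prefix} namespace")
    return unquote(iri[len(base):])


if __name__ == "__main__":
    example = "NC_000017.11:g.100A>G"  # illustrative string only
    iri = hgvs_to_iri(example)
    print(iri)
    print(iri_to_curie(iri))
    print(iri_to_hgvs(iri))
```

Because the mapping is reversible, the identifier carries the full variant description, which is the "coupled" end of the identity versus resolvability debate. Whether such an IRI actually resolves is a separate question that this sketch does not address.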

pnrobinson commented 8 years ago

Note that the HGVS scheme often involves an arbitrary choice of which transcript is the most relevant (a single variant can affect multiple transcripts, often up to 20 or more, of the same gene and in some cases even of different genes). Using a VCF-based notation with respect to a defined genome build would be another option, but not one that is familiar to many clinicians. Nonetheless, the latter is the strategy taken by ClinVar (they show one or more HGVS-based representations as well). What is really needed is software at some database such as ClinVar or Monarch that would immediately provide such an ID for variants and allow linkage to a phenopacket. This may be something we need to postpone until there is a version 2.
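As a rough illustration of the VCF-based option, the sketch below derives a deterministic key from a genome build plus VCF-style fields, so the same variant always yields the same local identifier regardless of transcript. The `VcfVariant` class, the dash-separated key layout, and the truncated SHA-256 digest are assumptions made for this example; this is not how ClinVar or Monarch assign IDs. It also assumes the variant has already been normalized (left-aligned and trimmed), which is not done here.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class VcfVariant:
    """A variant described VCF-style against a named genome build."""
    assembly: str  # e.g. "GRCh38"
    chrom: str
    pos: int       # 1-based position, as in VCF
    ref: str
    alt: str

    def key(self) -> str:
        """Human-readable key; allele case is normalized to upper case."""
        return "-".join(
            [self.assembly, self.chrom, str(self.pos),
             self.ref.upper(), self.alt.upper()]
        )

    def digest(self, length: int = 16) -> str:
        """Fixed-length identifier: truncated SHA-256 of the key."""
        full = hashlib.sha256(self.key().encode("utf-8")).hexdigest()
        return full[:length]


if __name__ == "__main__":
    v = VcfVariant("GRCh38", "17", 100, "a", "G")  # illustrative values
    print(v.key())
    print(v.digest())
```

The readable key keeps the identifier transparent, while the digest gives a fixed-length form for long indels. A service of the kind described above would still be needed to register and resolve such identifiers.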