phenopackets / phenopacket-format


Revise documentation on local identifiers in wiki #58

Open cmungall opened 8 years ago

cmungall commented 8 years ago

https://github.com/phenopackets/phenopacket-format/wiki/Identifiers

cc @balhoff

Hashes are problematic because they need to be quoted in YAML (an unquoted # after whitespace starts a comment).

jmcmurry commented 8 years ago

If it is useful: according to the MIRIAM database, the only regexes that contain a hash, or allow one, are hgnc.family and omim. What if we recommended that the identifier string always be quoted? Too onerous? In the case of OMIM, is the hash just an alternate representation, or is it required for identification?

balhoff commented 8 years ago

I was thinking that it would be good to align with RDF convention (e.g. Turtle); the hash identifier would be appended to whatever is defined for the base (e.g. the URI of the file). But the current docs suggest a special case where the hash is dropped out of the generated URI.

But I misled @cmungall in our discussion about this earlier; I thought relative URIs could be used in Turtle as "any ID without a colon". But actually any ID without a colon (anything that is not an abbreviated URI) must be wrapped in angle brackets. If it's a relative URI then it is appended to whatever base is defined.
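To make the "appended to whatever base is defined" behaviour concrete, here is a minimal sketch using Python's standard urllib.parse.urljoin, which implements generic relative-reference resolution. The base URI below is a made-up stand-in for the URI of a phenopacket file, and resolve is a hypothetical helper name; neither comes from the phenopacket docs.

```python
from urllib.parse import urljoin

# Hypothetical base URI standing in for the URI of a phenopacket file.
BASE = "http://example.org/packets/patient1.yaml"


def resolve(reference: str, base: str = BASE) -> str:
    """Resolve a relative URI reference (e.g. '#1') against a base URI."""
    return urljoin(base, reference)


# A fragment-only reference keeps the base document and adds the fragment.
print(resolve("#1"))  # http://example.org/packets/patient1.yaml#1
```

This is what a Turtle parser would do with <#1> once a base is known; the bare, un-bracketed #1 is not covered by this rule.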

So... I guess the un-bracketed hash ID doesn't actually have an existing RDF meaning.

But since the other global IDs are all CURIEs, I was thinking that the empty prefix could be a good convention for local identifiers since it wouldn't require quoting. Instead of "#1", you could do :1. And the JSON-LD context could even have the empty prefix defined to mean <#> or </> so that all the colon-prefixed IDs would become relative URIs. Just some thoughts to avoid problems with people forgetting to put quotes around those IDs.
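A rough sketch of how the empty-prefix idea could expand, written in plain Python rather than with a JSON-LD processor. The prefix map, the base URI, and the expand_curie helper are all assumptions made for illustration; the HP entry is just an example of an ordinary global prefix, and whether a given JSON-LD processor accepts an empty prefix in a context is not checked here.

```python
from urllib.parse import urljoin

# Hypothetical base URI standing in for the URI of a phenopacket file.
BASE = "http://example.org/packets/patient1.yaml"

# Illustrative prefix map: the empty prefix maps to "#", so ":1" becomes
# the relative reference "#1" before it is resolved against the base.
PREFIXES = {
    "": "#",
    "HP": "http://purl.obolibrary.org/obo/HP_",
}


def expand_curie(curie: str, prefixes=PREFIXES, base: str = BASE) -> str:
    """Expand a CURIE to a full URI; local IDs use the empty prefix."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"not a CURIE with a known prefix: {curie!r}")
    # Global prefixes yield absolute URIs, which urljoin returns unchanged;
    # the empty prefix yields "#<local>", which is resolved against the base.
    return urljoin(base, prefixes[prefix] + local)


print(expand_curie(":1"))          # http://example.org/packets/patient1.yaml#1
print(expand_curie("HP:0000118"))  # http://purl.obolibrary.org/obo/HP_0000118
```

With this convention a local ID such as :1 needs no quoting in YAML when it appears as a mapping value, while "#1" written without quotes would be read as a comment.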