monarch-initiative / dipper

Data Ingestion Pipeline for Monarch
https://dipper.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

delimiters within CURIE prefix #353

Open TomConlin opened 8 years ago

TomConlin commented 8 years ago

I recommend we do not keep them.
We currently have:

# these use hyphens 
'IMPRESS-procedure' 
'IMPRESS-protocol' 
'IMPRESS-parameter'
'KEGG-hsa'
'KEGG-path'
'KEGG-ko'
'OMIA-breed'
'MPD-strain'
'MPD-assay'
'ISBN-10'
'ISBN-13'
'ISBN-15'
'KEGG-ds'
# Or use an underscore which tends to be a safer (less overloaded) delimiter
'GO_REF'

# in other cases we change case to signify specialization instead of a delimiter
'FBcv' 
'FBbt'
'FBdv'

The last set is the style I suggest we adopt.
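
To make the inventory above easier to reproduce, here is a minimal sketch (not dipper code) that flags prefixes containing a delimiter. The function name `prefixes_with_delimiters` and the delimiter set (hyphen, underscore, dot) are assumptions made for illustration, and the sample list is only a subset of the prefixes quoted above.

```python
import re

# Assumed delimiter set for illustration: hyphen, underscore, dot.
DELIMITERS = re.compile(r"[-_.]")


def prefixes_with_delimiters(prefixes):
    """Return the prefixes that contain at least one delimiter character."""
    return [p for p in prefixes if DELIMITERS.search(p)]


# A subset of the prefixes listed in this issue.
sample = ['IMPRESS-procedure', 'KEGG-hsa', 'ISBN-10', 'GO_REF', 'FBcv', 'FBbt']
flagged = prefixes_with_delimiters(sample)
print(flagged)
```

Prefixes such as `FBcv` pass through unflagged because they signal specialization by case alone.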

jmcmurry commented 8 years ago

Underscores cause problems for some CURIE parsers that expect the underscore to delimit the prefix and local part for interconversion (e.g. EFO:0123456, EFO_0123456). What is the particular aversion to the dash?
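
A rough sketch of the failure mode described here, assuming a naive converter that treats the first underscore as the prefix/local-part boundary. Both function names are hypothetical and are not taken from any particular CURIE library.

```python
def curie_to_underscore_form(curie):
    """Convert 'PREFIX:LOCAL' to 'PREFIX_LOCAL'."""
    prefix, local = curie.split(":", 1)
    return f"{prefix}_{local}"


def underscore_form_to_curie_naive(identifier):
    """Naive reverse conversion: assume the FIRST underscore is the delimiter."""
    prefix, local = identifier.split("_", 1)
    return f"{prefix}:{local}"


# A prefix without an underscore survives the round trip.
round_trip_ok = underscore_form_to_curie_naive(curie_to_underscore_form("EFO:0123456"))
# A prefix that itself contains an underscore is split in the wrong place.
round_trip_bad = underscore_form_to_curie_naive(curie_to_underscore_form("GO_REF:0000001"))

print(round_trip_ok)   # EFO:0123456
print(round_trip_bad)  # GO:REF_0000001
```

A parser could avoid this by matching against a known prefix list instead of splitting blindly, but that requires every consumer to carry the same list.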

TomConlin commented 8 years ago

In this case, the hyphen needs to be converted to an underscore to render as a node label here:
http://data.monarchinitiative.org/dot/

Overall, not having delimiters within the prefix seems preferable.
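
For context, unquoted identifiers in the Graphviz DOT language may contain only letters, digits, and underscores, so a hyphenated prefix has to be rewritten or quoted before it can serve as a node name. The sketch below assumes the rendering step emits unquoted identifiers; `dot_safe_id` is a hypothetical helper, not the function dipper actually uses.

```python
import re


def dot_safe_id(prefix):
    """Replace every character that is not a letter, digit, or underscore with '_'."""
    return re.sub(r"[^A-Za-z0-9_]", "_", prefix)


print(dot_safe_id("KEGG-hsa"))  # KEGG_hsa
print(dot_safe_id("FBcv"))      # FBcv (already safe, unchanged)
```

Prefixes with no internal delimiter need no such rewriting step, which is part of the appeal of the delimiter-free style.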

jmcmurry commented 8 years ago

No delimiter (that is, the absence of a delimiter) works universally well for all sources and use cases. What about dot-delimited here? My only objection to the dot is the possibility of machines (or even people) thinking that certain ones are top-level domain extensions. However, dots are useful for readability. The situations in which it is important to roll up collections to a single provider (say NCBI) are less of a driving use case for us. I'm not sure how our choice would impact our Google PageRank. Words in the URL are among the most important considerations.

KEGG-hsa:, etc. can now be collapsed to just KEGG: because they implemented a generic resolution endpoint. We have plenty of other context for the identifier classification.

For the rest of these, it looks OK to delete the delimiter if you really feel strongly about it. Just A) be on the lookout for junctional nonsense, and B) please make sure that changes are made throughout the stack. We have had some bonkers inconsistencies with CURIE expansion (see tickets) and I'm not sure where they are originating.