monarch-initiative / dipper

Data Ingestion Pipeline for Monarch
https://dipper.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
56 stars 26 forks

InterPro curie for GO #959

Closed TomConlin closed 3 years ago

TomConlin commented 3 years ago

fix for: https://ci.monarchinitiative.org/blue/organizations/jenkins/dipper-pipeline/detail/dipper-pipeline/152/pipeline/79

cmungall commented 3 years ago

I think we need some general principles for adding new CURIE prefixes; otherwise we are setting ourselves up for clashes.

In general I recommend against using a URL that resolves to HTML and is intended for humans. These tend to be unstable, and they also tend to differ from what others use in RDF, which makes federated queries, joins, mashups, etc. harder.

UniProt uses these PURLs:

http://purl.uniprot.org/interpro/IPR000581

If you look in prefix commons, both GO and idot use:

http://identifiers.org/interpro/IPR000581

I would use one of these over a web URL.

But let's also sync with biolink. If we are going to represent InterPro URIs in Monarch, we want them to be part of our data model. Can you have a go at adding protein domains to the schema? We already have 'gene family'; this may be a sibling to that.
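To make the trade-off concrete, here is a minimal sketch of how a CURIE expands under each of the two base URIs quoted above. The `InterPro` prefix spelling and the helper names `expand` and `contract` are assumptions for illustration, not dipper's actual API.

```python
# Base URIs quoted in this thread; the "InterPro" prefix spelling is an assumption.
IDENTIFIERS_ORG = {"InterPro": "http://identifiers.org/interpro/"}
PURL_UNIPROT = {"InterPro": "http://purl.uniprot.org/interpro/"}


def expand(curie, prefix_map):
    """Turn 'prefix:local_id' into a full URI using prefix_map."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in prefix_map:
        raise KeyError(f"no base URI registered for {curie!r}")
    return prefix_map[prefix] + local_id


def contract(uri, prefix_map):
    """Turn a full URI back into a CURIE, or return None if no base matches."""
    candidates = [(base, prefix) for prefix, base in prefix_map.items()
                  if uri.startswith(base)]
    if not candidates:
        return None
    # Prefer the longest matching base so nested namespaces resolve correctly.
    base, prefix = max(candidates, key=lambda pair: len(pair[0]))
    return f"{prefix}:{uri[len(base):]}"


print(expand("InterPro:IPR000581", IDENTIFIERS_ORG))
print(expand("InterPro:IPR000581", PURL_UNIPROT))
```

The same CURIE yields two different URIs, so two graphs built with different maps will not join on InterPro nodes unless one side is rewritten.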

On Fri, Jul 24, 2020 at 1:20 AM Tom Conlin notifications@github.com wrote:


You can view, comment on, or merge this pull request online at:

https://github.com/monarch-initiative/dipper/pull/959

Commit Summary

  • InterPro for GO
  • generated


kshefchek commented 3 years ago

A simple rule would be: anything that resolves to the EBI domain should use identifiers.org. Since EBI/Elixir runs id.org, there is no risk of attribution issues.
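A rough sketch of that rule, assuming "the EBI domain" means `ebi.ac.uk` and its subdomains. The function name `suggested_base`, the example web URL, and the `namespace` argument are hypothetical; the identifiers.org pattern is the one quoted earlier in this thread.

```python
from urllib.parse import urlsplit


def suggested_base(web_url, namespace):
    """Return an identifiers.org base URI if web_url is hosted by EBI, else None."""
    host = (urlsplit(web_url).hostname or "").lower()
    # Assumption: "the EBI domain" covers ebi.ac.uk and any subdomain of it.
    if host == "ebi.ac.uk" or host.endswith(".ebi.ac.uk"):
        return f"http://identifiers.org/{namespace}/"
    return None


# Illustrative web URL only; not necessarily the one used in this pull request.
print(suggested_base("https://www.ebi.ac.uk/interpro/entry/InterPro/IPR000581", "interpro"))
```

The rule only picks a base URI; the namespace token still has to be checked against what identifiers.org actually registers.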

cmungall commented 3 years ago

Maybe, but there is also an argument that the purl.uniprot.org maintainers are the semantic web experts in the protein space. Either way, id.org or purl.uniprot.org is better than web URLs.


TomConlin commented 3 years ago

There are upstream (external) and downstream (internal) considerations, which serve different purposes. Dipper RDF output at rest cannot serve both simultaneously, and since we will not be stopping the general public from fetching the data, it is most appropriate for the data at rest to be in its most general, public-friendly form.

Downstream/internally, Monarch/Translator (and other competent entities) can and should use whatever namespace formats make the most sense for the identifier fragments in their more refined contexts.

If rewriting the well-defined (in curie_map.yaml) URIs is too great a burden on Monarch/Translator processes, I can (with @mellybelly's leave) work on making alternative renderings privately available, given an appropriate curie-prefix to base-URI mapping (YAML) file.

I do not recommend this approach: if these downstream products are going to interface with the wider world (highly recommended), they had best be prepared to do this mapping on their own in any case.
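The downstream mapping described here could look roughly like the sketch below. It assumes both prefix maps are already loaded as plain dicts (for example from YAML files such as curie_map.yaml). The `rebase` helper, the map names, and the public web base URI are hypothetical.

```python
def rebase(uri, source_map, target_map):
    """Rewrite uri from its base in source_map to the same prefix's base in target_map.

    URIs that match no source base, or whose prefix is missing from target_map,
    are returned unchanged.
    """
    best = None
    for prefix, base in source_map.items():
        # Keep the longest matching base so nested namespaces are handled.
        if uri.startswith(base) and (best is None or len(base) > len(best[1])):
            best = (prefix, base)
    if best is None or best[0] not in target_map:
        return uri
    prefix, base = best
    return target_map[prefix] + uri[len(base):]


# Hypothetical maps: the public base is illustrative, the internal one is from this thread.
PUBLIC_MAP = {"InterPro": "https://www.ebi.ac.uk/interpro/entry/"}
INTERNAL_MAP = {"InterPro": "http://identifiers.org/interpro/"}

print(rebase("https://www.ebi.ac.uk/interpro/entry/IPR000581", PUBLIC_MAP, INTERNAL_MAP))
```

Because the rewrite is driven entirely by the two maps, a consumer can apply it at load time without touching the published data.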

cmungall commented 3 years ago

I'm not sure I fully understand, but we want to be using the same prefixes and URIs across projects. Anything that serves this aim helps and reduces overall churn.
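One way to check whether two projects agree is to compare their prefix maps directly. This is only a sketch; the `find_clashes` name and both example maps are hypothetical, with base URIs taken from this thread.

```python
def find_clashes(map_a, map_b):
    """Return {prefix: (base_in_a, base_in_b)} for shared prefixes whose base URIs differ."""
    shared = sorted(map_a.keys() & map_b.keys())
    return {p: (map_a[p], map_b[p]) for p in shared if map_a[p] != map_b[p]}


# Hypothetical maps for two projects.
project_a = {"InterPro": "http://identifiers.org/interpro/"}
project_b = {"InterPro": "http://purl.uniprot.org/interpro/"}

for prefix, (base_a, base_b) in find_clashes(project_a, project_b).items():
    print(f"{prefix}: {base_a} vs {base_b}")
```

Running a check like this in CI for each project would flag a diverging prefix before it reaches published RDF.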
