microbiomedata / issues

public repo for issues related to NMDC work

bioregistry nmdc prefix resolvers need to be updated #423

Closed aclum closed 3 months ago

aclum commented 10 months ago

Donny registered the nmdc prefix with bioregistry.io (https://bioregistry.io/registry/nmdc). However, based on the registered pattern, it only resolves for objects and does not follow the current identifier scheme, so it is misleading. The entry needs to be updated to say which kinds of prefixes are valid. We also need to decide whether identifiers should resolve to the API or the UI.

@cmungall @shreddd @turbomam @dwinston

turbomam commented 10 months ago

Good idea.

The canonical expansion for nmdc, as defined in the schema itself, is 'https://w3id.org/nmdc/', which gets redirected to either 'https://raw.githubusercontent.com/microbiomedata/nmdc-schema/main/src/schema/' or 'https://microbiomedata.github.io/nmdc-schema/' depending on the request's filename extension, according to https://github.com/perma-id/w3id.org/blob/master/nmdc/.htaccess

Maybe we need to split nmdc into nmdc-model and nmdc-data

aclum commented 10 months ago

We can split this up by the structured_pattern.syntax so that study and biosample identifiers redirect to data portal pages and other classes redirect to an API call?

turbomam commented 10 months ago

Can you please give an example?

aclum commented 10 months ago

The example I had today: we want links on ESS-DIVE that point to NMDC study identifiers. One such study is nmdc:sty-11-5tgfr349 (https://data.microbiomedata.org/details/study/nmdc:sty-11-5tgfr349), which should be associated with ESS-DIVE identifiers doi:10.15485/1603775 and doi:10.15485/1729719.

nmdc is a registered prefix with bioregistry.io, but bioregistry.io/nmdc:sty-11-5tgfr349 resolves to https://drs.microbiomedata.org/objects/sty-11-5tgfr349, which returns { "detail": "Not found" }

So I would like the nmdc prefix plus a structured_pattern.syntax that starts with sty- to expand to https://data.microbiomedata.org/details/study/ (i.e. https://data.microbiomedata.org/details/study/nmdc:sty-11-5tgfr349), and the nmdc prefix plus a structured_pattern.syntax that starts with bsm- to expand to https://data.microbiomedata.org/details/sample/ (i.e. https://data.microbiomedata.org/details/sample/nmdc:bsm-11-h8kqjw06).

This is what GOLD does: a single GOLD prefix resolves regardless of whether the identifier is a Gs/Gp/Gb*. So GOLD:Gs0162708 resolves to https://gold.jgi.doe.gov/study?id=Gs0162708 and GOLD:Gb0381873 resolves to https://gold.jgi.doe.gov/biosample?id=Gb0381873
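
To make the request concrete, here is a rough sketch of the typecode-based routing described above. It is only an illustration under assumptions: the function name `resolve_nmdc` and the routing table are hypothetical, the typecode is assumed to be the text between `nmdc:` and the first hyphen, and the API fallback URL is the one quoted later in this thread rather than an agreed target.

```python
# Hypothetical sketch: pick a target URL for an nmdc CURIE from its typecode.
# The portal URL prefixes are the ones quoted in this thread.
PORTAL_ROUTES = {
    "sty": "https://data.microbiomedata.org/details/study/",
    "bsm": "https://data.microbiomedata.org/details/sample/",
}

# Assumption: every other typecode falls through to a runtime API lookup.
API_BASE = "https://api.microbiomedata.org/nmdcschema/ids/"


def resolve_nmdc(curie: str) -> str:
    """Return the URL an nmdc CURIE would resolve to under this sketch."""
    prefix, sep, local_id = curie.partition(":")
    if sep != ":" or prefix != "nmdc" or not local_id:
        raise ValueError(f"not an nmdc CURIE: {curie!r}")
    # Typecode is whatever precedes the first hyphen, e.g. "sty" in "sty-11-5tgfr349".
    typecode = local_id.split("-", 1)[0]
    base = PORTAL_ROUTES.get(typecode, API_BASE)
    # Both the portal and the API URLs in this thread carry the full CURIE.
    return base + curie
```

With this table, `resolve_nmdc("nmdc:sty-11-5tgfr349")` yields the study landing page URL given above, and an identifier with any other typecode yields an API URL.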

turbomam commented 10 months ago

I'd like to prioritize this discussion around the fact

aclum commented 10 months ago

I'll put this on the metadata meeting agenda.

dwinston commented 10 months ago

my initial thoughts:

aclum commented 10 months ago

@dwinston can you make the metadata meeting at 1pm Wed or should we move the discussion to the infrastructure sync on Thursday?

dwinston commented 10 months ago

@aclum I'll plan to be at this week's metadata meeting.

mslarae13 commented 10 months ago

https://github.com/microbiomedata/nmdc-runtime/issues/307

dwinston commented 10 months ago

related: perma-id/w3id.org#3584

dwinston commented 10 months ago

I realized that a change to the bioregistry.io registry would also prompt (and perhaps depend on) a change to the identifiers.org ("miriam") registry, both of which currently point nmdc:$1 to https://drs.microbiomedata.org/objects/$1.

I decided, for the interim and to get quicker feedback on desired resolution behavior, to update runtime code to redirect to data portal landing pages and schema documentation as appropriate (see GET /objects/{object_id} documentation).

Thus, for example:

  1. https://bioregistry.io/nmdc:sty-11-5tgfr349 (and https://identifiers.org/nmdc:sty-11-5tgfr349) resolves to a data portal study landing page
  2. https://bioregistry.io/nmdc:bsm-11-002vgm56 (and https://identifiers.org/nmdc:bsm-11-002vgm56) resolves to a data portal biosample landing page
  3. https://bioregistry.io/nmdc:dobj-11-000a9q67 resolves to a runtime API json payload for the object (https://api.microbiomedata.org/nmdcschema/ids/nmdc:dobj-11-000a9q67)
  4. https://bioregistry.io/nmdc:Study resolves to the NMDC schema documentation landing page (explicitly via `https://w3id.org/nmdc/Study`)

A weakness of this approach currently is that it relies on the data portal to return a 404 Not Found response if it doesn't have an entry for the study or biosample. This is not currently the case: for example, https://bioregistry.io/nmdc:sty-NotARealID loads https://data.microbiomedata.org/details/study/nmdc:sty-NotARealID because the latter is a 200 OK response.
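
One possible way around this weakness, sketched here under assumptions and not as what the runtime actually does, would be for the resolver itself to confirm that an ID is known before redirecting. The function `redirect_or_404`, the injected `exists` lookup, and the choice of a 302 status are all hypothetical; only the study landing page URL comes from this thread.

```python
from typing import Callable, Optional, Tuple

STUDY_PAGE = "https://data.microbiomedata.org/details/study/"


def redirect_or_404(
    curie: str, exists: Callable[[str], bool]
) -> Tuple[int, Optional[str]]:
    """Return (status, location) for a study CURIE.

    `exists` is a stand-in for whatever lookup the resolver has available
    (for example a database query); it is injected so the sketch stays
    independent of any real backend.
    """
    if not curie.startswith("nmdc:sty-"):
        return 404, None
    if not exists(curie):
        # Answer 404 here rather than relying on the portal, which returns 200.
        return 404, None
    return 302, STUDY_PAGE + curie


# Usage with an in-memory set standing in for the real lookup.
known_ids = {"nmdc:sty-11-5tgfr349"}
status, location = redirect_or_404("nmdc:sty-NotARealID", known_ids.__contains__)
```

Here the made-up ID gets a 404 from the resolver, while a known study ID would get a redirect to its landing page.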

aclum commented 3 months ago

@dwinston do you consider this ticket addressed?