Closed aclum closed 3 months ago
Good idea.
The canonical expansion for nmdc, as defined in the schema itself, is 'https://w3id.org/nmdc/', which gets redirected to either 'https://raw.githubusercontent.com/microbiomedata/nmdc-schema/main/src/schema/' or 'https://microbiomedata.github.io/nmdc-schema/' depending on the request's filename extension, according to https://github.com/perma-id/w3id.org/blob/master/nmdc/.htaccess
Maybe we need to split nmdc into nmdc-model and nmdc-data.
Could we split this up by structured_pattern.syntax, so that study and biosample IDs redirect to data portal pages and other classes redirect to an API call?
Can you please give an example?
The example I had today: we want links on ESS-DIVE that point to NMDC study identifiers. One such study is NMDC study ID nmdc:sty-11-5tgfr349 (https://data.microbiomedata.org/details/study/nmdc:sty-11-5tgfr349), which should be associated with ESS-DIVE identifiers doi:10.15485/1603775 and doi:10.15485/1729719.
nmdc is a registered prefix with bioregistry.io, but bioregistry.io/nmdc:sty-11-5tgfr349 resolves to https://drs.microbiomedata.org/objects/sty-11-5tgfr349, which returns { "detail": "Not found" }
So I would like the nmdc prefix plus a structured_pattern.syntax that starts with sty- to expand to https://data.microbiomedata.org/details/study/ (i.e. https://data.microbiomedata.org/details/study/nmdc:sty-11-5tgfr349), and the nmdc prefix plus a structured_pattern.syntax that starts with bsm- to expand to https://data.microbiomedata.org/details/sample/ (i.e. https://data.microbiomedata.org/details/sample/nmdc:bsm-11-h8kqjw06).
This is what GOLD does: a single GOLD prefix resolves regardless of whether the ID is a Gs/Gp/Gb*. So GOLD:Gs0162708 resolves to https://gold.jgi.doe.gov/study?id=Gs0162708 and GOLD:Gb0381873 resolves to https://gold.jgi.doe.gov/biosample?id=Gb0381873
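The single-prefix behavior described here can be sketched as a small lookup keyed on the prefix and the start of the local identifier. This is only an illustration: the URL shapes are the ones quoted in this thread, while the `ROUTES` table and the `resolve` function are hypothetical names, not part of bioregistry, GOLD, or the NMDC runtime. GOLD Gp identifiers are left out because no target URL for them is given here.

```python
from typing import Optional

# Hypothetical routing table: (CURIE prefix, local-ID start) -> URL template.
# The URL shapes come from the examples in this thread; the table itself is
# an illustrative assumption, not an existing registry format.
ROUTES = {
    ("nmdc", "sty-"): "https://data.microbiomedata.org/details/study/nmdc:{local_id}",
    ("nmdc", "bsm-"): "https://data.microbiomedata.org/details/sample/nmdc:{local_id}",
    ("GOLD", "Gs"): "https://gold.jgi.doe.gov/study?id={local_id}",
    ("GOLD", "Gb"): "https://gold.jgi.doe.gov/biosample?id={local_id}",
}


def resolve(curie: str) -> Optional[str]:
    """Return a landing-page URL for a CURIE, or None if no route matches."""
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        # Not a CURIE at all (no "prefix:local_id" separator).
        return None
    for (route_prefix, id_start), template in ROUTES.items():
        if prefix == route_prefix and local_id.startswith(id_start):
            return template.format(local_id=local_id)
    return None
```

With this table, `resolve("nmdc:sty-11-5tgfr349")` yields the study page URL quoted above, and an nmdc CURIE with any other typecode returns `None`, leaving the choice of fallback open.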
I'd like to prioritize this discussion around the following facts.
The nmdc prefix is used for schema element CURIEs, for which w3id resolution is used. From that perspective, the bioregistry expansion for nmdc is incorrect.
The nmdc prefix is also being used in data, and should be resolvable to something that makes sense to you and other NMDC colleagues, like the API endpoints you mentioned. This isn't currently possible through the official w3id resolver, and despite good intentions, I would say that @dwinston's bioregistry resolver is rogue. We can't have two different resolvers, and I don't see how the id patterns can help with that. In my opinion, we need two different namespaces for nmdc-schema elements and data objects. Maybe @cmungall has a better solution.
I'll put this on the metadata meeting agenda.
My initial thoughts:
Point bioregistry.io/nmdc:$1 to w3id.org/nmdc:$1.
Have w3id.org then redirect to the data portal (typecodes bsm or sty) or the runtime API (other typecodes). If the URL suffix is not recognized, the fall-through case is that it's likely the name of a schema element, so pass to https://microbiomedata.github.io/nmdc-schema/$1.
@dwinston can you make the metadata meeting at 1pm Wed or should we move the discussion to the infrastructure sync on Thursday?
@aclum I'll plan to be at this week's metadata meeting.
related: perma-id/w3id.org#3584
I realized that a change to the bioregistry.io registry would also prompt (and perhaps depend on) a change to the identifiers.org ("miriam") registry, both of which currently point nmdc:$1 to https://drs.microbiomedata.org/objects/$1.
I decided, for the interim and to get quicker feedback on desired resolution behavior, to update runtime code to redirect to data portal landing pages and schema documentation as appropriate (see GET /objects/{object_id} documentation).
Thus, for example:
A weakness of this approach currently is that it relies on the data portal to return a 404 Not Found response if it doesn't have an entry for the study or biosample, which is not currently the case. For example, https://bioregistry.io/nmdc:sty-NotARealID loads https://data.microbiomedata.org/details/study/nmdc:sty-NotARealID because the latter is a 200 OK response.
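One partial mitigation, sketched here under stated assumptions, would be a syntactic check on the local identifier before redirecting. The regular expression below assumes a simplified "typecode-shoulder-blade" shape inferred only from the two IDs quoted in this thread (sty-11-5tgfr349 and bsm-11-h8kqjw06); it is not the schema's actual structured_pattern, and `looks_like_nmdc_id` is a hypothetical helper name.

```python
import re

# ASSUMPTION: a simplified "typecode-shoulder-blade" shape inferred from the
# IDs quoted in this thread; this is not the schema's actual pattern.
NMDC_LOCAL_ID = re.compile(r"^[a-z]+-[0-9a-z]+-[0-9a-z]+$")


def looks_like_nmdc_id(local_id: str) -> bool:
    """Syntactic check only; says nothing about whether the record exists."""
    return NMDC_LOCAL_ID.fullmatch(local_id) is not None
```

Under this assumed pattern, `sty-NotARealID` would be rejected before any redirect. A well-formed but nonexistent ID would still pass, so a check like this would not remove the need for the portal to return 404 for unknown entries.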
@dwinston do you consider this ticket addressed?
Donny had registered the nmdc prefix with bioregistry.io (https://bioregistry.io/registry/nmdc). However, based on the pattern, this only resolves for objects and does not follow the current identifier scheme, so it is misleading. The registration needs to be updated to say what kind of prefixes are valid. We also need to decide whether identifiers should resolve to the API or the UI.
@cmungall @shreddd @turbomam @dwinston