identifiers-org / identifiers-org.github.io

Mismatch between LUI pattern and URL pattern for MGI #243

Closed jgtate closed 3 months ago

jgtate commented 3 months ago

There seems to be a mismatch between the LUI pattern for MGI (^MGI:\d+$) and the URL patterns (e.g. http://www.informatics.jax.org/accession/MGI:{$id}) for the Mouse Genome Database resources.

The LUI definition includes the prefix in the regex, and "Prefix embedded in LUI" is set to "yes", suggesting that IDs should contain the prefix, e.g. MGI:1858201. We're therefore storing the ID with the prefix attached but when we insert that into the URL pattern it causes a duplication of the prefix: http://www.informatics.jax.org/accession/MGI:MGI:1858201.

I think the URL patterns for the three MGD resources need to be updated to remove the prefix.
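A minimal sketch of the duplication described above, assuming the client fills the `{$id}` placeholder by plain string substitution. The function name `naive_url` is hypothetical; the pattern and ID are the ones quoted in this issue.

```python
import re

# Values quoted in the issue text.
LUI_PATTERN = re.compile(r"^MGI:\d+$")
URL_PATTERN = "http://www.informatics.jax.org/accession/MGI:{$id}"


def naive_url(stored_id: str) -> str:
    """Hypothetical client helper: insert the stored ID into the URL pattern as-is."""
    return URL_PATTERN.replace("{$id}", stored_id)


stored_id = "MGI:1858201"
# The stored ID passes the LUI pattern, so it looks like a valid value to insert...
assert LUI_PATTERN.match(stored_id) is not None
# ...but the URL pattern already carries "MGI:", so the prefix appears twice.
print(naive_url(stored_id))
```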

renatocjn commented 3 months ago

Unfortunately, the "Prefix embedded in LUI" flag is quite confusing, which is why we try not to use it when possible. It is mostly used only for validation. Internally, the rewriter still splits the ID and the prefix.

Is the prefix duplication happening to you? It doesn't seem to be the case for these two examples:

jgtate commented 3 months ago

Ah. We're not trying to use the API to resolve the identifier like that. Instead, the ID that we store in our database has the prefix attached, e.g. MGI:2442292, like in your example, because that seems to be what's suggested by the entry in identifiers.org. In our code we retrieve the URL pattern from identifiers.org, i.e. http://www.informatics.jax.org/accession/MGI:{$id}, and insert the ID, which means the final URL includes the duplicate prefix.

renatocjn commented 3 months ago

I see the issue. Unfortunately, that is how it is implemented, and we are not able to put in the time to work on this now. You will have to handle this on your side. I can leave this thread open to see how many people would be interested in changing this behaviour. I remember this being brought up before.
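One way to handle this on the client side, sketched under the assumption that IDs are stored with the prefix attached (e.g. MGI:2442292) and that the URL pattern is filled by string substitution. The helper `build_url` and its parameters are hypothetical, not part of any identifiers.org API.

```python
URL_PATTERN = "http://www.informatics.jax.org/accession/MGI:{$id}"


def build_url(stored_id: str, prefix: str = "MGI", url_pattern: str = URL_PATTERN) -> str:
    """Strip a leading '<prefix>:' from the stored ID, then fill the URL pattern."""
    head, sep, tail = stored_id.partition(":")
    if sep and head.upper() == prefix.upper():
        local_id = tail  # drop the embedded prefix
    else:
        local_id = stored_id  # already a bare ID
    return url_pattern.replace("{$id}", local_id)


print(build_url("MGI:2442292"))
print(build_url("2442292"))
```

Both calls produce the same URL, so the helper is safe to use whether or not the stored value carries the prefix.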

jgtate commented 3 months ago

I understand. I'm not sure we need a change in behaviour though, just an update to the URL patterns for the MGD resource. For example, changing http://www.informatics.jax.org/accession/MGI:{$id} to http://www.informatics.jax.org/accession/{$id} would resolve our issue. Is that a possibility?

renatocjn commented 3 months ago

Then the resolved URL would be invalid. For example, http://identifiers.org/MGI:1858201 would resolve to http://www.informatics.jax.org/accession/1858201.

jgtate commented 3 months ago

OK, so the ID is taken as 2442292, rather than MGI:2442292. Thanks for clarifying. We'll go back to our code and database and resolve this at our end.

I'll mark this issue as closed, but I'd just point out that it still feels like there's an inconsistency for MGI between the values of Local Unique Identifier (LUI) pattern (given in the registry as ^MGI:\d+$) and Sample ID (LUI) (2442292). Shouldn't the MGI entry at least behave like that for HGNC, where the prefix is shown as not embedded in the LUI and is optional in the regex?

Thanks for your help.

renatocjn commented 2 months ago

Just to be a bit clearer, only the numerical ID is used when rewriting the URL to redirect the user. The "Prefix embedded in LUI" setting only affects how the request is verified against the LUI pattern. This is likely needed to correctly rewrite URLs for some namespaces, although I had a quick look and couldn't find an example for you.
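The two steps can be illustrated with a rough sketch. This is not the resolver's actual code; `resolve` is a hypothetical function that assumes the full compact identifier is checked against the LUI pattern and only the part after the prefix is substituted into the URL pattern.

```python
import re

# Values quoted earlier in this thread.
LUI_PATTERN = re.compile(r"^MGI:\d+$")
URL_PATTERN = "http://www.informatics.jax.org/accession/MGI:{$id}"


def resolve(compact_id: str) -> str:
    """Hypothetical two-step resolution: validate with the prefix, rewrite without it."""
    # Step 1: validation uses the identifier with the prefix embedded.
    if LUI_PATTERN.match(compact_id) is None:
        raise ValueError(f"{compact_id!r} does not match the LUI pattern")
    # Step 2: rewriting uses only the numerical part after the prefix.
    _, _, local_id = compact_id.partition(":")
    return URL_PATTERN.replace("{$id}", local_id)


print(resolve("MGI:1858201"))
```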

You are right, it's a weird inconsistency with HGNC. I would say I can't have both working in the same way, since the prefix is optional for HGNC.

What we should probably do for HGNC is remove the prefix from the regex, since it accepts URIs like http://identifiers.org/hgnc:hgnc:2674, but I don't know if the intention was to allow things like http://identifiers.org/hgnc:HGNC:2674, which is weird in and of itself.