identifiers-org / identifiers-org.github.io


sample IDs vs. patterns are inconsistent #200

Closed Schmoho closed 1 year ago

Schmoho commented 2 years ago

For example, nextprot and vario:

pattern: ^VariO:\d+$, prefix: VariO, sample ID: 0294

pattern: ^NX_\w+, prefix: nextprot, sample ID: NX_O00165

That is, sometimes the prefix is included in the pattern and sometimes it's not. I think I even remember cases where there was a "second prefix" (like NX_ in this case) that differed from the registered prefix and was included in the pattern but not in the sample ID.
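To make the mismatch concrete, here is a minimal sketch that checks each sample ID against its pattern, first as-is and then with "prefix:" prepended. The dictionary layout and the classify helper are made up for illustration; only the prefix, pattern and sample ID values come from the two entries quoted above.

```python
import re

# Values copied from the two entries quoted above; the dict keys are
# illustrative, not the registry's actual field names.
ENTRIES = [
    {"prefix": "VariO", "pattern": r"^VariO:\d+$", "sample": "0294"},
    {"prefix": "nextprot", "pattern": r"^NX_\w+", "sample": "NX_O00165"},
]


def classify(entry):
    """Report how the sample ID relates to the pattern (hypothetical helper)."""
    pattern = re.compile(entry["pattern"])
    # Case 1: the sample ID satisfies the pattern without any changes.
    if pattern.match(entry["sample"]):
        return "sample matches pattern as-is"
    # Case 2: the pattern expects the prefix and a colon in front of the ID.
    if pattern.match(f'{entry["prefix"]}:{entry["sample"]}'):
        return "sample matches only with 'prefix:' prepended"
    return "sample does not match pattern"


for entry in ENTRIES:
    print(entry["prefix"], "->", classify(entry))
```

The two entries land in different branches, which is exactly the kind of per-namespace special-casing a namespace-agnostic consumer has to work around.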

Anyways, even the case I illustrated is very unfortunate. Inconsistencies like these require a lot of hand-holding when using the registry for automated, namespace-agnostic handling. If there is a way to contribute to curation here I'd be happy to.

cthoyt commented 2 years ago

hi @Schmoho, see also https://github.com/identifiers-org/identifiers-org.github.io/issues/99 and https://github.com/identifiers-org/identifiers-org.github.io/issues/65. Identifiers.org has been struggling to respond to requests to solve this and many other related problems.

In the meantime, we're actively working on the Bioregistry (https://bioregistry.io, https://github.com/biopragmatics/bioregistry), which solves this issue in a much more principled way (though the cases with a double prefix using an underscore delimiter, like in nextprot and cellosaurus, haven't been solved... would love some suggestions on how to handle these).

renatocjn commented 1 year ago

Hello @Schmoho,

We are sorry that this makes it difficult for you to use identifiers.org. The system was designed to allow the registration of any kind of namespace, which made ID syntax and management quite tricky. As Charles mentioned, the namespaceEmbeddedInLui field of the namespace entries (currently only exposed via the REST API) will help you in that regard.
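As a rough sketch of how that field could be used on the client side: the snippet below assumes that a true namespaceEmbeddedInLui means the pattern expects the prefix and a colon in front of the sample ID, and that a false value means the sample ID is already the full local identifier. The flag values shown for the two entries are assumptions for illustration and were not read from the registry; the dict layout and the full_lui helper are hypothetical too.

```python
import re

# Illustrative entries. The namespaceEmbeddedInLui values are ASSUMED here,
# not fetched from the REST API; the other values come from this thread.
ENTRIES = [
    {"prefix": "VariO", "pattern": r"^VariO:\d+$", "sample": "0294",
     "namespaceEmbeddedInLui": True},
    {"prefix": "nextprot", "pattern": r"^NX_\w+", "sample": "NX_O00165",
     "namespaceEmbeddedInLui": False},
]


def full_lui(entry):
    """Build the string the pattern is expected to match (hypothetical helper)."""
    if entry["namespaceEmbeddedInLui"]:
        # Assumed semantics: the prefix is part of the local identifier.
        return f'{entry["prefix"]}:{entry["sample"]}'
    return entry["sample"]


def sample_is_valid(entry):
    """Check the reconstructed local identifier against the entry's pattern."""
    return re.match(entry["pattern"], full_lui(entry)) is not None


for entry in ENTRIES:
    print(entry["prefix"], full_lui(entry), sample_is_valid(entry))
```

Under these assumptions both samples validate without any per-namespace special cases; the resolver's actual behaviour may differ in the details.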

If you want more detail on how the code works, I invite you to check it for yourself, as it is freely available in the resolver repository. More specifically, see the classes CompactIdentifierResolutionService and CompactIdParsingHelper.

Feel free to contact us in case of any doubt.