identifiers-org / identifiers-org.github.io

MIT License
8 stars 1 forks source link

Prefix capitalization for namespaces and inconsistent case sensitivity #100

Closed dhimmel closed 4 years ago

dhimmel commented 4 years ago

I am confused about when prefixes are case sensitive. For example, both capitalization of the geo prefix resolve:

But for the following:

Looks like the lowercase version violates the regex (see also https://github.com/identifiers-org/identifiers-org.github.io/issues/99)?

However, when looking up the prefixes, I must user lowercase (otherwise 404):

If we look at the doid prefix metadata, how does one deduce that only an ALL_CAPS prefix will be accepted during resolution?

{
  "prefix" : "doid",
  "mirId" : "MIR:00000233",
  "name" : "Human Disease Ontology",
  "pattern" : "^DOID\\:\\d+$",
  "description" : "The Disease Ontology has been developed as a standardized ontology for human disease with the purpose of providing the biomedical community with consistent, reusable and sustainable descriptions of human disease terms, phenotype characteristics and related medical vocabulary disease concepts.",
  "created" : "2019-06-11T14:16:32.719+0000",
  "modified" : "2019-06-11T14:16:32.719+0000",
  "deprecated" : false,
  "deprecationDate" : null,
  "sampleId" : "11337",
  "namespaceEmbeddedInLui" : true,
  "_links" : {
    "self" : {
      "href" : "https://registry.api.identifiers.org/restApi/namespaces/689"
    },
    "namespace" : {
      "href" : "https://registry.api.identifiers.org/restApi/namespaces/689"
    },
    "contactPerson" : {
      "href" : "https://registry.api.identifiers.org/restApi/namespaces/689/contactPerson"
    }
  }
}

Is there a missing field like "prefixCapitalization" that would allow applications to know whether to submit resolution requests with a lowercase or uppercase prefix for a given namespace?

mbdebian commented 4 years ago

Hi, let me recover the link to another issue around this same topic, that will shed light at 'case insensitive' resolution of compact identifiers except for those cases where 'namespaceEmbeddedInLui' is 'true', which means the LUI is being used to hit the resolver, and, therefore, the regex must match. For the given example, the regular expression is

^DOID\\:\\d+$

so the LUI looks like DOID:, but you need to keep into account that its compact identifier version, if it wasn't for that flag, would be 'doid:DOID:87325465'

dhimmel commented 4 years ago

Okay so prefixes are case insensitive when namespaceEmbeddedInLui is false, but case sensitive when namespaceEmbeddedInLui is true?

It would probably simplify things for users if all compact IDs could be resolved with any prefix capitalization.

When namespaceEmbeddedInLui is true, how do you figure out the required casing? It sounds like casing is reverse engineered from the regex. It seems like a tool like exrex in Python could create a matching string from a pattern, but it's not a straightforward process.

It seems like the frontend provides the required casing information:

image

Compare to the vario namespace data from the API:

  "prefix" : "vario",
  "pattern" : "^VariO:\\d+$",
  "sampleId" : "0294",
  "namespaceEmbeddedInLui" : true,

So the website provides a "sample compact identifier" with a resolvable compact ID and the required casing. Also the prefix here shows the required casing.

What's the best way to generate this information from the API? Is the code that identifiers.org uses to determine required casing available? Should the API include the convenience fields from the website?