identifiers-org / identifiers-org.github.io

Entries that should probably be marked as namespaceEmbeddedInLui=False #154

Open cthoyt opened 3 years ago

cthoyt commented 3 years ago

The following entries are marked with "namespaceEmbeddedInLui": true, but I do not think they should be. None of their sample identifiers actually contains the prefix, and the CURIE used in https://identifiers.org/{curie} is also completely normal, following the same pattern as all of the other entries with "namespaceEmbeddedInLui": false.
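To make that observation concrete, here is a minimal sketch that checks three of the entries from the table below. The helper name check is my own, and the use of re.fullmatch is an assumption about how the registry patterns are meant to be applied; the prefixes, patterns, and sample identifiers are copied from the table.

```python
import re

# (prefix, pattern, sample) triples copied from the table in this issue
ENTRIES = [
    ('mge', r'^mge:\d+$', '2'),
    ('did', r'^did:[a-z0-9]+:[A-Za-z0-9.\-:]+$', 'sov:WRfXPg8dantKVubE3HX8pw'),
    ('ocid', r'ocid:[0-9]{12}', '190000021540'),
]


def check(prefix, pattern, sample):
    """Return (sample alone matches, prefix:sample matches).

    Hypothetical helper: fullmatch is an assumption about how the
    registry applies its patterns.
    """
    regex = re.compile(pattern)
    bare = regex.fullmatch(sample) is not None
    curie = regex.fullmatch(f'{prefix}:{sample}') is not None
    return bare, curie


for prefix, pattern, sample in ENTRIES:
    print(prefix, check(prefix, pattern, sample))
```

For these entries the pattern only matches once the prefix is prepended, so the sample identifier on its own does not carry the namespace.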

Additionally, all of the providers' format strings seem to take in only the identifier (though some do seem to also contain hard-coded copies of the prefix), so am I misunderstanding the usage of "namespaceEmbeddedInLui"?
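A small sketch of what I mean by format strings that only take in the identifier. The {$id} placeholder is my understanding of the registry's convention and should be treated as an assumption; the example.org patterns and the EX prefix are hypothetical and do not correspond to any real provider.

```python
def expand(url_pattern: str, local_id: str) -> str:
    """Fill a provider format string with the local identifier only.

    Assumption: the placeholder is written as {$id}.
    """
    return url_pattern.replace('{$id}', local_id)


# Hypothetical provider that only needs the identifier
print(expand('https://example.org/record/{$id}', '2'))

# Hypothetical provider with a hard-coded copy of the prefix in its format string
print(expand('https://example.org/obo/EX:{$id}', '0000001'))
```

In both cases the caller passes only the local identifier; any prefix in the final URL comes from the format string itself.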

This is of general importance because, given a prefix/identifier pair, it is still often unclear how to construct a valid URL for identifiers.org. Typically, entries marked with "namespaceEmbeddedInLui": true come from the OBO world, where the solution is to uppercase the prefix to make a valid URL for identifiers.org, but beyond this, there don't seem to be any general rules for how to handle true entries.
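The heuristic I currently end up with looks roughly like the sketch below. The function name and the uppercase_embedded switch are hypothetical, and upper-casing the prefix for true entries is only the OBO convention described above, not a documented rule. The GO example is an illustration from the OBO world and is not one of the entries in this issue.

```python
def identifiers_org_url(prefix, identifier, namespace_embedded=False,
                        uppercase_embedded=True):
    """Build an identifiers.org URL from a prefix/identifier pair.

    Assumption: entries with namespaceEmbeddedInLui=true follow the
    OBO-style convention of an upper-cased prefix. This is a heuristic,
    and uppercase_embedded turns it off for entries where it does not hold.
    """
    if namespace_embedded and uppercase_embedded:
        prefix = prefix.upper()
    return f'https://identifiers.org/{prefix}:{identifier}'


# OBO-style entry: the prefix is upper-cased
print(identifiers_org_url('go', '0006915', namespace_embedded=True))

# Entry from this issue: flagged true, but the prefix stays lowercase
print(identifiers_org_url('mge', '2', namespace_embedded=True,
                          uppercase_embedded=False))
```

The need for a per-entry switch like uppercase_embedded is exactly the problem: the flag alone does not say how to build the URL.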

cc: @bgyori @dhimmel @cgreene this might be of interest to you too

prefix  pattern  sample  example URL
mge  ^mge:\d+$  2  https://identifiers.org/mge:2
mzspec  ^mzspec:.+$  PXD002255::ES_XP_Ubi_97H_HCD_349:scan:9617:LAEIYVNSSFYK/2  https://identifiers.org/mzspec:PXD002255::ES_XP_Ubi_97H_HCD_349:scan:9617:LAEIYVNSSFYK/2
swh  ^swh:[1-9]:(cnt|dir|rel|rev|snp):[0-9a-f]+(;(origin|visit|anchor|path|lines)=\S+)*$  1:rev:309cf2674ee7a0749978cf8265ab91a60aea0f7d  https://identifiers.org/swh:1:rev:309cf2674ee7a0749978cf8265ab91a60aea0f7d
did  ^did:[a-z0-9]+:[A-Za-z0-9.\-:]+$  sov:WRfXPg8dantKVubE3HX8pw  https://identifiers.org/did:sov:WRfXPg8dantKVubE3HX8pw
ocid  ocid:[0-9]{12}  190000021540  https://identifiers.org/ocid:190000021540

This table was generated with the following code:

import requests
from tabulate import tabulate

#: see https://docs.identifiers.org/articles/api.html#getdataset
URL = 'https://registry.api.identifiers.org/resolutionApi/getResolverDataset'

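def main():
    # Download the full resolver dataset and fail loudly on HTTP errors
    res = requests.get(URL, timeout=30)
    res.raise_for_status()
    rows = []
    for entry in res.json()['payload']['namespaces']:
        prefix = entry['prefix']
        pattern = entry['pattern']
        namespace_in_lui = entry['namespaceEmbeddedInLui']
        sample_id = entry['sampleId']
        url = f'https://identifiers.org/{prefix}:{sample_id}'
        # Keep entries flagged as embedded whose pattern starts with the
        # lowercase prefix (with or without a leading ^ anchor)
        if namespace_in_lui and pattern.lstrip('^').startswith(f'{prefix}:'):
            rows.append((prefix, pattern, sample_id, url))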

    print(tabulate(rows, headers=['prefix', 'pattern', 'sample', 'example url'], tablefmt='html'))

if __name__ == '__main__':
    main()