identifiers-org / identifiers-org.github.io

RegEx for PDB-CCD is wrong #198

Closed bmeldal-eg closed 2 years ago

bmeldal-eg commented 2 years ago

Currently: ^\w{3}$

Expected: ^\w{1,3}$

because IDs can be anything UP TO 3 letters.

Thanks.

cthoyt commented 2 years ago

Do you have an example identifier that is shorter than 3 characters?

bmeldal-eg commented 2 years ago

Sorry, forgot to add: https://www.ebi.ac.uk/pdbe-srv/pdbechem/chemicalCompound/show/A or https://www.ebi.ac.uk/pdbe-srv/pdbechem/chemicalCompound/show/PI
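
For anyone who wants to check the two patterns locally, here is a minimal sketch using Python's `re` module. The identifiers `A` and `PI` come from the links above; `ATP` is an added three-character example, and the names `OLD_PATTERN`, `NEW_PATTERN`, and `check` are made up for illustration only.

```python
import re

# Pattern currently registered: exactly three word characters.
OLD_PATTERN = re.compile(r"^\w{3}$")
# Pattern requested in this issue: one to three word characters.
NEW_PATTERN = re.compile(r"^\w{1,3}$")

# "A" and "PI" are taken from the links above; "ATP" is an added
# three-character example for comparison.
EXAMPLES = ["A", "PI", "ATP"]


def check(identifier: str) -> tuple[bool, bool]:
    """Return (matches old pattern, matches new pattern)."""
    return (
        OLD_PATTERN.match(identifier) is not None,
        NEW_PATTERN.match(identifier) is not None,
    )


for identifier in EXAMPLES:
    old_ok, new_ok = check(identifier)
    print(f"{identifier!r}: old={old_ok} new={new_ok}")
```

The short identifiers fail the old pattern and pass the new one, while three-character identifiers pass both. Note that `\w` also accepts digits and underscores, so both patterns are looser than "letters only".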

cthoyt commented 2 years ago

@bmeldal-eg thanks! Note that I have updated this pattern in the Bioregistry in https://github.com/biopragmatics/bioregistry/pull/367 and re-deployed the service manually (normally it updates nightly).

The corresponding page for the prefix is https://bioregistry.io/pdb-ccd and you can see these three examples are on the page and can be resolved correctly.

bmeldal-eg commented 2 years ago

Hi Charlie, Thank you. I have a list of missing resources that we would like to have added. Do you prefer one ticket per resource or lump them all together? Birgit

cthoyt commented 2 years ago

You can do them one at a time with the new prefix request form on GitHub, which gives a bit of insight into what's required and what's optional. Or, if you have everything organized in a spreadsheet or something like that, you can make an issue and I'll help you through it.

bmeldal-eg commented 2 years ago

Excellent. If you have a template Excel sheet that goes with the prefix request form, please send it to me: birgit.meldal AT eaglegenomics.com and I will populate it.

cthoyt commented 2 years ago

I don't have a template right now but will develop one (issue at https://github.com/biopragmatics/bioregistry/issues/371). In the meantime, you'll need the following columns:

  1. Requested prefix
  2. Name of resource
  3. Homepage of resource
  4. Long-form description of resource
  5. Example local unique identifier from resource. More than one welcome!
  6. Regular expression pattern for local unique identifier
  7. URI format string (for resolving local unique identifiers to web pages)
  8. Contributor name
  9. Contributor ORCID id
  10. Contributor GitHub

Optional fields:

  1. Links to publications
  2. GitHub repository for resource
  3. License for resource
  4. Wikidata entry for corresponding database
  5. Download link for OWL/OBO/OBO Graph JSON
  6. Prefix synonyms
  7. Additional free text comments
  8. Twitter handle for database/resource
  9. FAIRSharing ID
  10. BioPortal Ontology ID
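
As a stopgap, the columns above could be turned into a blank CSV sheet along these lines. This is only a sketch: the header strings are copied from the two lists, not from any official template, and `build_template` and `missing_required` are hypothetical helper names.

```python
import csv
import io

# Header strings copied from the required list above (not an official schema).
REQUIRED_COLUMNS = [
    "Requested prefix",
    "Name of resource",
    "Homepage of resource",
    "Long-form description of resource",
    "Example local unique identifier from resource",
    "Regular expression pattern for local unique identifier",
    "URI format string",
    "Contributor name",
    "Contributor ORCID id",
    "Contributor GitHub",
]

# Header strings copied from the optional list above.
OPTIONAL_COLUMNS = [
    "Links to publications",
    "GitHub repository for resource",
    "License for resource",
    "Wikidata entry for corresponding database",
    "Download link for OWL/OBO/OBO Graph JSON",
    "Prefix synonyms",
    "Additional free text comments",
    "Twitter handle for database/resource",
    "FAIRSharing ID",
    "BioPortal Ontology ID",
]


def build_template() -> str:
    """Return CSV text containing only the header row."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(REQUIRED_COLUMNS + OPTIONAL_COLUMNS)
    return buffer.getvalue()


def missing_required(row: dict) -> list:
    """List the required columns that are absent or blank in one row."""
    return [c for c in REQUIRED_COLUMNS if not (row.get(c) or "").strip()]


if __name__ == "__main__":
    # Write the blank sheet; the file name is arbitrary.
    with open("prefix_requests.csv", "w", newline="") as handle:
        handle.write(build_template())
```

Reading a filled-in sheet back with `csv.DictReader` and passing each row to `missing_required` gives a quick check that nothing required was left empty before opening an issue.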

UPDATE: Bulk contribution guidelines are here now, with a full template: https://github.com/biopragmatics/bioregistry/blob/main/docs/CONTRIBUTING.md#bulk-contribution

bmeldal-eg commented 2 years ago

The new prefix is showing in bioregistry but not in identifiers.org.

Reading the instructions I notice it's still "ec-code" instead of "eccode" in identifiers.org as well. Why do the prefixes not get propagated?

cthoyt commented 2 years ago

Bioregistry imports and aligns content from Identifiers.org on a nightly basis, but there's no reverse sync. All original content in Bioregistry is CC0 licensed and we encourage reuse, but it might be the case that the Identifiers.org project has run out of steam and probably doesn't have the resources to do this. Would love to hear from anyone involved in it.

bmeldal-eg commented 2 years ago

But this is the identifiers.org GitHub repo...

Although, this is strange, there is this request page as well as a link to this GH repo I'm in right now. Hmmm.

I'll try the request page, too.

cthoyt commented 2 years ago

Indeed. Since the identifiers.org team has been unresponsive the last year or so, I’ve been notifying people on this issue tracker that there’s a better alternative where their issues can be more promptly fixed/addressed

hHermjakob commented 2 years ago

Done in identifiers.org.

bmeldal-eg commented 2 years ago

Hi Henning :)

Thank you for changing the RegEx.

I'm curious though, when I entered the prefix in the form it rejected the existing "pdb-ccd" and only accepted "pdb.ccd" as per bioregistry instructions.