pombase / canto

The PomBase community curation tool
https://curation.pombase.org

Fix format of wild type genotype IDs #2222

Open jseager7 opened 4 years ago

jseager7 commented 4 years ago

The format of the database identifiers for wild-type genotypes is as follows:

<genus>-<species>-wild-type-genotype<strain>

For example:

Hordeum-vulgare-wild-type-genotypeGolden-Promise

There are a few problems with this format:

  1. The strain name has no hyphen separating it from the text 'wild-type-genotype'.

  2. It might be better to use the NCBI taxonomy ID instead of the scientific name, since the taxon ID is shorter, easier to validate, and (hopefully) less likely to change than the scientific name. It also avoids ambiguity from anamorph / teleomorph names when exporting fungal species names.
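To make the two points above concrete, here is a minimal Python sketch that builds an identifier in the current format and in one possible revised format. The revised layout (`<taxon-id>-wild-type-genotype-<strain>`), the function names, and the validation pattern are assumptions for illustration only; the issue does not settle on a final format, and this is not Canto's actual code. The value 4513 is used as the NCBI taxon ID for Hordeum vulgare.

```python
import re
from typing import Optional, Tuple


def current_wild_type_id(genus: str, species: str, strain: str) -> str:
    """Build an ID in the current format (no separator before the strain)."""
    return f"{genus}-{species}-wild-type-genotype{strain}"


def proposed_wild_type_id(taxon_id: int, strain: str) -> str:
    """Build an ID in a hypothetical revised format.

    Assumption: the taxon ID replaces genus and species, and a hyphen
    separates the fixed text from the strain name.
    """
    return f"{taxon_id}-wild-type-genotype-{strain}"


# Hypothetical validation pattern for the revised format: a positive
# integer taxon ID, the fixed text, a hyphen, then a non-empty strain.
PROPOSED_ID_RE = re.compile(
    r"^(?P<taxon_id>[1-9][0-9]*)-wild-type-genotype-(?P<strain>.+)$"
)


def parse_proposed_id(identifier: str) -> Optional[Tuple[int, str]]:
    """Return (taxon_id, strain) if the ID matches the revised format."""
    match = PROPOSED_ID_RE.match(identifier)
    if match is None:
        return None
    return int(match["taxon_id"]), match["strain"]


if __name__ == "__main__":
    print(current_wild_type_id("Hordeum", "vulgare", "Golden-Promise"))
    print(proposed_wild_type_id(4513, "Golden-Promise"))
    print(parse_proposed_id("4513-wild-type-genotype-Golden-Promise"))
```

With a numeric prefix and an explicit separator, the strain name can be recovered by a single pattern match, which is harder in the current format because the strain runs directly into the fixed text.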

And one other thing that might be a non-issue:

  1. It's not clear how the exporter handles strain names that contain non-ASCII Unicode characters. The JSON specification defines JSON text in terms of Unicode, so if the JSON exporter is standards-compliant, we shouldn't have anything to worry about, and it shouldn't complicate validation either.
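As a rough check of the reasoning above, the following Python sketch round-trips a strain name containing non-ASCII characters through JSON, both with `\uXXXX` escapes and as literal UTF-8. It uses Python's standard `json` module, not Canto's exporter, so it only illustrates what a standards-compliant serializer does; the strain name and the `round_trips` helper are made up for the example.

```python
import json


def round_trips(strain: str) -> bool:
    """Check that a strain name survives JSON encoding and decoding."""
    record = {"strain": strain}

    # Default behaviour: non-ASCII characters are written as \uXXXX escapes,
    # so the output is pure ASCII.
    escaped = json.dumps(record)

    # Alternative: keep the characters literal and encode the text as UTF-8.
    literal_utf8 = json.dumps(record, ensure_ascii=False).encode("utf-8")

    return (
        json.loads(escaped) == record
        and json.loads(literal_utf8.decode("utf-8")) == record
    )


if __name__ == "__main__":
    # Hypothetical strain name with non-ASCII characters.
    print(round_trips("Måløy-β1"))
```

Either representation decodes to the same string, so a validator that parses the JSON first and then checks the identifier should not need special handling for such names.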

jseager7 commented 4 years ago

I added the 'schema changes' label to this issue because I assumed that these identifiers are used internally in the database; if that's not the case, the label can be removed.