greenelab / django-genes

A Django package to represent genes
BSD 3-Clause "New" or "Revised" License

bitbucket: CrossRefDB names need to match in a case-insensitive manner #9

Open rzelayafavila opened 6 years ago

rzelayafavila commented 6 years ago

Copied from Bitbucket issue #9 (priority: minor): https://bitbucket.org/greenelab/django-genes/issues/9/crossrefdb-names-need-to-match-in-a-case

@mhuyck commented: """ Using genes_load_geneinfo on a file for Pseudomonas aeruginosa triggers warnings for each occurrence of a synonym when the genes_add_xrdb command was used with the label --name=PseudoCAP.

Although the capitalization "PseudoCAP" is the official label, the Pseudomonas aeruginosa file downloaded from "ftp://ftp.ncbi.nih.gov/gene/DATA/GENE_INFO/Archaea_Bacteria/Pseudomonas_aeruginosa_PAO1.gene_info.gz" lists the synonyms as "PseudoCap", so matching the CrossRefDB fails. See the attached file for a complete list of the errors I received.

Although changing the label to PseudoCap should allow the file to load, it may be problematic for matching with other data sources, such as GO. The proposed enhancement here is to make django-genes match this field in a case-insensitive manner wherever relevant. """ Files with the error output are attached to the original Bitbucket issue.
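To illustrate the proposed matching rule, here is a minimal pure-Python sketch. It is not django-genes code: `find_xrdb` and the `registered` list are hypothetical names used only for this example. In an actual Django query the same effect is usually obtained with the `iexact` field lookup (for example `name__iexact=...`), though where that would go in django-genes is not shown in this issue.

```python
def find_xrdb(name, known_labels):
    """Return the registered label that equals `name` ignoring case, or None.

    `known_labels` is a hypothetical stand-in for the CrossRefDB names
    already stored in the database.
    """
    wanted = name.casefold()
    for label in known_labels:
        if label.casefold() == wanted:
            # Return the label as registered, so the official
            # capitalization is what the rest of the loader sees.
            return label
    return None


# The official label was registered, but the gene_info file uses "PseudoCap".
registered = ["PseudoCAP"]
match = find_xrdb("PseudoCap", registered)
```

With this rule, the synonym written as "PseudoCap" in the NCBI file resolves to the registered "PseudoCAP" entry instead of producing a warning.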

@mhuyck also commented: """ Capturing this good idea from @cgreene via Slack:

Let's make sure that adding multiple databases whose names differ only in case causes an error. That way, if it does somehow happen that someone starts PSEudocap, then the user has to choose one ;)

This will probably be a natural consequence of making this field case-insensitive everywhere, but I did not want to lose track of this goal.

"""
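The "different cases should cause an error" goal could be sketched as follows. This is an assumption-laden illustration in plain Python, not the package's implementation: `add_xrdb`, `DuplicateXrdbError`, and the dict-based `registry` are all hypothetical names standing in for whatever the `genes_add_xrdb` command does internally.

```python
class DuplicateXrdbError(ValueError):
    """Raised when a new name differs from a registered one only by case."""


def add_xrdb(name, registry):
    """Register `name` in `registry`, a dict keyed by the case-folded name.

    Re-adding the exact same name is a no-op; adding a name that differs
    only in capitalization raises, so the user has to choose one spelling.
    """
    key = name.casefold()
    existing = registry.get(key)
    if existing is None:
        registry[key] = name
        return name
    if existing != name:
        raise DuplicateXrdbError(
            "CrossRefDB %r conflicts with existing %r" % (name, existing)
        )
    return existing
```

Under these assumptions, calling `add_xrdb("PseudoCAP", registry)` and then `add_xrdb("PSEudocap", registry)` on the same registry raises `DuplicateXrdbError` on the second call.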

rzelayafavila commented 6 years ago

This issue is marked as enhancement and minor in bitbucket, but I think its importance should be bumped up, especially as it is causing me problems with the annotation refinery and the cross-reference gene identifiers.