openvax / pyensembl

Python interface to access reference genome features (such as genes, transcripts, and exons) from Ensembl
Apache License 2.0

added exception is_valid_id for yeast #210

Closed: rraadd88 closed this 6 years ago

rraadd88 commented 6 years ago

Related to issue #209: Add an exception for Saccharomyces cerevisiae in is_valid_ensembl_id(ensembl_id) function

As you will notice, I have added an or clause that makes an exception for yeast IDs (or ensembl_id.startswith("Y")). I should acknowledge that this quick fix may cause problems: if users pass gene names that start with "Y", they will no longer get an error.
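For illustration, here is a minimal sketch of what such a check looks like with the extra clause. It assumes the existing validation is a simple "ENS" prefix test; the real PyEnsembl function body is not shown in this thread, so treat this as hypothetical. It also shows the caveat above: a human gene symbol such as YWHAZ slips through.

```python
def is_valid_ensembl_id(ensembl_id):
    """Hypothetical prefix-based check with the yeast exception added."""
    # Assumption: standard Ensembl stable IDs begin with "ENS", e.g. ENSG00000139618.
    # The extra clause lets yeast systematic names such as YAL001C through,
    # but it also accepts any other string starting with "Y" (e.g. the symbol YWHAZ).
    return ensembl_id.startswith("ENS") or ensembl_id.startswith("Y")


print(is_valid_ensembl_id("YAL001C"))  # yeast systematic name: accepted
print(is_valid_ensembl_id("YWHAZ"))    # gene symbol, not an ID: also accepted
```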

Ideally, the (taxonomic) species name should be passed to the function. I went for a quick fix here because otherwise I would have had to track down every place where an is_valid_{}_id function is used and fix them all systematically.
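A species-aware check along those lines might look roughly like the sketch below. The function name is_valid_gene_id, the species keys, and both regular expressions are hypothetical illustrations, not PyEnsembl's API. The yeast pattern covers only nuclear ORF systematic names (for example YAL001C); mitochondrial and other feature names would need their own rules.

```python
import re

# Hypothetical per-species patterns for gene IDs; illustrative, not exhaustive.
_GENE_ID_PATTERNS = {
    # Yeast nuclear ORF names: "Y" + chromosome letter A-P + arm L/R
    # + three digits + strand W/C, with an optional "-A"-style suffix.
    "saccharomyces_cerevisiae": re.compile(r"Y[A-P][LR]\d{3}[WC](-[A-Z])?"),
}

# Assumed default: "ENS" + optional species code (e.g. "MUS") + "G" + 11 digits,
# as in ENSG00000139618 (human BRCA2).
_DEFAULT_GENE_ID_PATTERN = re.compile(r"ENS[A-Z]*G\d{11}")


def is_valid_gene_id(gene_id, species="homo_sapiens"):
    """Return True if gene_id matches the (hypothetical) pattern for species."""
    pattern = _GENE_ID_PATTERNS.get(species, _DEFAULT_GENE_ID_PATTERN)
    # fullmatch requires the whole string to match, so no ^/$ anchors are needed.
    return pattern.fullmatch(gene_id) is not None


print(is_valid_gene_id("YAL001C", species="saccharomyces_cerevisiae"))  # accepted
print(is_valid_gene_id("YWHAZ", species="saccharomyces_cerevisiae"))    # rejected
```

Keying the rules on species keeps the yeast exception from leaking into other genomes, which is the weakness of the startswith("Y") shortcut.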

I hope the quick fix is OK.
If not, let me know and I will make an effort to fix this issue systematically.

iskandr commented 6 years ago

Hi @rraadd88, this is one of several examples that are making me realize Ensembl gene naming is much less consistent than I thought. I think I'll just remove the ID validation, since PyEnsembl already has to codify a ton of fluidly evolving standards about Ensembl's data.

iskandr commented 6 years ago

Closing since I got rid of Ensembl ID validation: https://github.com/openvax/pyensembl/pull/211