rpatkennyiii / pygr

Automatically exported from code.google.com/p/pygr

UCSCEnsemblInterface fails with Genome doesn't exist error #134

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Create a UCSCEnsemblInterface object, following the steps in the Ensembl 
interface tutorial.

What is the expected output? What do you see instead?

Expected: a UCSCEnsemblInterface object. Instead, creating it raises an 
exception: "Genome hg18 doesn't exist or has got no Ensembl data at UCSC"

This problem arose recently because of changes to UCSC's database. The pygr 
0.8.2 code assumed that a given UCSC dataset identifier (e.g. "hg18") would be 
associated with only a single row in the trackVersion table, but UCSC has since 
added multiple such rows. As a result, pygr 0.8.2 fails when it looks up the 
identifier that links a UCSC dataset to a specific Ensembl dataset, and it 
reports an error like "Genome hg18 doesn't exist or has got no Ensembl data at 
UCSC". My fix simply tries each row associated with the specified UCSC dataset 
until it successfully connects to an Ensembl dataset. The fix does not affect 
anything outside the Ensembl interface, and it passes all our Ensembl and other 
tests. I have pushed it to the master branch on GitHub for anyone who needs it 
before the next Pygr release.
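Here is a minimal Python sketch of the try-each-row approach described above. It is not the actual pygr patch. The row dictionaries, the `connect` callable, `NoEnsemblDataError` and `fake_connect` are hypothetical placeholders. Only the strategy comes from the report: try every trackVersion row for the UCSC dataset, stop at the first one that reaches an Ensembl dataset, and fail only if none does.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


class NoEnsemblDataError(Exception):
    """Hypothetical stand-in for the error pygr 0.8.2 reports."""


def connect_first_working(ucsc_db: str,
                          rows: Iterable[dict],
                          connect: Callable[[dict], T]) -> T:
    """Try each trackVersion row for ucsc_db until one yields a connection.

    `rows` and `connect` are hypothetical placeholders: `rows` stands for
    the trackVersion rows returned for `ucsc_db`, and `connect` for
    whatever code opens the matching Ensembl dataset (raising on failure).
    """
    failures: List[Exception] = []
    for row in rows:
        try:
            return connect(row)  # the first row that works wins
        except Exception as exc:  # this row did not lead to usable Ensembl data
            failures.append(exc)
    # Only fail once every candidate row has been tried.
    raise NoEnsemblDataError(
        f"Genome {ucsc_db} doesn't exist or has got no Ensembl data at UCSC "
        f"({len(failures)} candidate row(s) tried)")


# Usage with made-up rows: the first row fails, the second succeeds.
def fake_connect(row: dict) -> str:
    if row["version"] == "56":
        raise LookupError("no matching Ensembl dataset")
    return "ensembl-" + row["version"]


rows = [{"db": "hg18", "version": "56"}, {"db": "hg18", "version": "54"}]
print(connect_first_working("hg18", rows, fake_connect))  # prints ensembl-54
```

The sketch collects the per-row failures and raises only after the loop. This keeps the original error message for the case where no row works. A second row for the same genome is no longer treated as fatal.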

Original issue reported on code.google.com by cjlee...@gmail.com on 23 Jun 2011 at 6:33

GoogleCodeExporter commented 8 years ago
This is definitely a useful change and one worth keeping, in case future 
changes to the UCSC database make hitherto-unique rows non-unique. However, IMHO 
we need a bit more before this issue can be closed: it may make sense to 
actually choose among the returned rows. Two options (which could be combined) 
spring to mind:
 - choosing the latest Ensembl-dataset version available for the specified UCSC genome,
 - letting the user specify the Ensembl version as well.
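The two options could be combined roughly as follows. This is a hypothetical sketch, not existing pygr code. It assumes each candidate row carries an integer-like Ensembl release number under a `"version"` key, and `order_candidate_rows` and `requested_version` are illustrative names. With no request, it orders rows newest release first, so a try-each loop like the one in the original report would try the latest Ensembl dataset first. With a request, it keeps only the matching rows.

```python
from typing import Iterable, List, Optional


def order_candidate_rows(rows: Iterable[dict],
                         requested_version: Optional[int] = None) -> List[dict]:
    """Order trackVersion-style rows for one UCSC genome.

    Hypothetical row layout: each row has a "version" key holding an
    integer-like Ensembl release number (e.g. "55").
    """
    rows = list(rows)
    if requested_version is not None:
        # Option 2: the user pins a specific Ensembl release.
        chosen = [r for r in rows if int(r["version"]) == requested_version]
        if not chosen:
            available = sorted({int(r["version"]) for r in rows})
            raise ValueError(f"Ensembl release {requested_version} is not listed; "
                             f"available: {available}")
        return chosen
    # Option 1: newest Ensembl release first.
    return sorted(rows, key=lambda r: int(r["version"]), reverse=True)


rows = [{"version": "54"}, {"version": "56"}, {"version": "55"}]
print([r["version"] for r in order_candidate_rows(rows)])  # ['56', '55', '54']
print(order_candidate_rows(rows, requested_version=55))    # [{'version': '55'}]
```

The function returns an ordered list instead of a single row. That way, a newest-first preference still falls back to older releases when the latest one cannot be reached.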

Original comment by mare...@gmail.com on 20 Sep 2011 at 8:21

GoogleCodeExporter commented 8 years ago
Is there a workaround for this? I'd like to be able to extract exon sequence 
information from the hg19 build.

Thanks,
Alan

Original comment by derr.a...@gmail.com on 18 Jan 2012 at 8:49

GoogleCodeExporter commented 8 years ago
I got it to work using the new version of ucsc_ensembl_annot.py on the main 
branch. 
Thanks,
Alan

Original comment by derr.a...@gmail.com on 20 Jan 2012 at 4:33