openvax / pyensembl

Python interface to access reference genome features (such as genes, transcripts, and exons) from Ensembl
Apache License 2.0

pyensembl CLI installs wrong genome #223

Open iskandr opened 5 years ago

iskandr commented 5 years ago
pyensembl install --reference-name grch37
2019-07-30 09:57:30,683 - pyensembl.shell - INFO - Running 'install' for EnsemblRelease(release=95, species='homo_sapiens')
2019-07-30 09:57:30,683 - pyensembl.download_cache - INFO - Fetching /home/alex/.cache/pyensembl/GRCh38/ensembl95/Homo_sapiens.GRCh38.95.gtf.gz from URL ftp://ftp.ensembl.org/pub/release-95/gtf/homo_sapiens/Homo_sapiens.GRCh38.95.gtf.gz
2019-07-30 09:57:30,683 - datacache.download - INFO - Downloading ftp://ftp.ensembl.org/pub/release-95/gtf/homo_sapiens/Homo_sapiens.GRCh38.95.gtf.gz to /home/alex/.cache/pyensembl/GRCh38/ensembl95/Homo_sapiens.GRCh38.95.gtf.gz

yangyxt commented 2 years ago

I ran into the same issue. Are there any solutions?

yangyxt commented 2 years ago

BTW, here is the screenshot: image

yangyxt commented 2 years ago

I checked the code and found that the key function for determining the reference name is the class method which_reference of the class Species. The mapping between Ensembl releases and assembly names is stored in a dict, Species.reference_assemblies, which is set from the input when a Species instance is initialised. According to the comment there:

reference_assemblies : dict
    Mapping of names of reference genomes onto inclusive ranges of Ensembl releases
    Example: {"GRCh37": (54, 75)}
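
To make the lookup concrete, here is a minimal sketch of how a mapping of this shape could resolve a reference name to a release. It is not pyensembl's code: the names REFERENCE_ASSEMBLIES, release_range, latest_release and reference_for_release are hypothetical, the GRCh37 range is the docstring example quoted above, and the GRCh38 range is an assumption (95 is simply the release that appears in the log).

```python
# Illustrative stand-in for a reference_assemblies mapping; NOT pyensembl's code.
# GRCh37 uses the docstring example; the GRCh38 range is an assumption.
REFERENCE_ASSEMBLIES = {
    "GRCh37": (54, 75),
    "GRCh38": (76, 95),
}


def release_range(reference_name):
    """Return the inclusive (first, last) release range, ignoring case."""
    wanted = reference_name.lower()
    for name, bounds in REFERENCE_ASSEMBLIES.items():
        if name.lower() == wanted:
            return bounds
    raise ValueError("Unknown reference name: %r" % reference_name)


def latest_release(reference_name):
    """Return the newest release that still ships this assembly."""
    return release_range(reference_name)[1]


def reference_for_release(release):
    """Return the assembly name whose inclusive range contains the release."""
    for name, (first, last) in REFERENCE_ASSEMBLIES.items():
        if first <= release <= last:
            return name
    raise ValueError("No reference covers release %d" % release)


print(latest_release("grch37"))      # 75
print(reference_for_release(95))     # GRCh38
```

Under this reading, a request for grch37 would be expected to resolve to release 75, whereas the log above shows release 95 (GRCh38) being installed.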

Where does the information come from that GRCh37 is only available in releases 54 through 75?

yangyxt commented 2 years ago

Sorry, I checked and found that GRCh37-related sequences are only available up to Ensembl release 75: image

Thanks!