openvax / pyensembl

Python interface to access reference genome features (such as genes, transcripts, and exons) from Ensembl
Apache License 2.0

Errors with the latest `pyensembl` #78

Closed: tavinathanson closed this issue 9 years ago

tavinathanson commented 9 years ago
```
~/drive/work/repos/cancer/nejm $ pyensembl install --release 79
INFO:root:Fetching Homo_sapiens.GRCh38.79.gtf from URL ftp://ftp.ensembl.org/pub/release-79/gtf/homo_sapiens/Homo_sapiens.GRCh38.79.gtf.gz
Downloading ftp://ftp.ensembl.org/pub/release-79/gtf/homo_sapiens/Homo_sapiens.GRCh38.79.gtf.gz to /Users/tavi/Library/Caches/ensembl/Homo_sapiens.GRCh38.79.gtf
INFO:root:Decompressing gzip into Homo_sapiens.GRCh38.79.gtf...
INFO:root:Fetching Homo_sapiens.GRCh38.cdna.all.79.fa from URL ftp://ftp.ensembl.org/pub/release-79/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz
Downloading ftp://ftp.ensembl.org/pub/release-79/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz to /Users/tavi/Library/Caches/ensembl/Homo_sapiens.GRCh38.cdna.all.79.fa
INFO:root:Decompressing gzip into Homo_sapiens.GRCh38.cdna.all.79.fa...
INFO:root:Fetching Homo_sapiens.GRCh38.pep.all.79.fa from URL ftp://ftp.ensembl.org/pub/release-79/fasta/homo_sapiens/pep/Homo_sapiens.GRCh38.pep.all.fa.gz
Downloading ftp://ftp.ensembl.org/pub/release-79/fasta/homo_sapiens/pep/Homo_sapiens.GRCh38.pep.all.fa.gz to /Users/tavi/Library/Caches/ensembl/Homo_sapiens.GRCh38.pep.all.79.fa
INFO:root:Decompressing gzip into Homo_sapiens.GRCh38.pep.all.79.fa...
Creating database: /Users/tavi/Library/Caches/ensembl/Homo_sapiens.GRCh38.79.db
INFO:root:Reading GTF /Users/tavi/Library/Caches/ensembl/Homo_sapiens.GRCh38.79.gtf into DataFrame
/Users/tavi/.virtualenvs/nejm/lib/python2.7/site-packages/pandas/io/parsers.py:1159: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
  data = self._reader.read(nrows)
INFO:root:Extracting attributes for 2720530 entries in GTF DataFrame
Traceback (most recent call last):
  File "/Users/tavi/.virtualenvs/nejm/bin/pyensembl", line 9, in <module>
    load_entry_point('pyensembl==0.6.1', 'console_scripts', 'pyensembl')()
  File "/Users/tavi/.virtualenvs/nejm/lib/python2.7/site-packages/pyensembl/shell.py", line 53, in run
    ensembl.install()
  File "/Users/tavi/.virtualenvs/nejm/lib/python2.7/site-packages/pyensembl/ensembl_release.py", line 194, in install
    self.index(force=False)
  File "/Users/tavi/.virtualenvs/nejm/lib/python2.7/site-packages/pyensembl/ensembl_release.py", line 207, in index
    self.db.create(force=force)
  File "/Users/tavi/.virtualenvs/nejm/lib/python2.7/site-packages/pyensembl/database.py", line 540, in create
    self._create_database(force=force)
  File "/Users/tavi/.virtualenvs/nejm/lib/python2.7/site-packages/pyensembl/database.py", line 153, in _create_database
    df = self.gtf.dataframe()
  File "/Users/tavi/.virtualenvs/nejm/lib/python2.7/site-packages/pyensembl/gtf.py", line 237, in dataframe
    self._dataframes[key] = cached_dataframe(csv_path, local_loader_fn)
  File "/Users/tavi/.virtualenvs/nejm/lib/python2.7/site-packages/pyensembl/compute_cache.py", line 92, in cached_dataframe
    df = compute_fn()
  File "/Users/tavi/.virtualenvs/nejm/lib/python2.7/site-packages/pyensembl/gtf.py", line 211, in local_loader_fn
    full_df = self._load_full_dataframe()
  File "/Users/tavi/.virtualenvs/nejm/lib/python2.7/site-packages/pyensembl/gtf.py", line 174, in _load_full_dataframe
    return cached_dataframe(csv_path, self._load_full_dataframe_from_gtf)
  File "/Users/tavi/.virtualenvs/nejm/lib/python2.7/site-packages/pyensembl/compute_cache.py", line 92, in cached_dataframe
    df = compute_fn()
  File "/Users/tavi/.virtualenvs/nejm/lib/python2.7/site-packages/pyensembl/gtf.py", line 181, in _load_full_dataframe_from_gtf
    return load_gtf_as_dataframe(path)
  File "/Users/tavi/.virtualenvs/nejm/lib/python2.7/site-packages/pyensembl/gtf_parsing.py", line 282, in load_gtf_as_dataframe
    df = _extend_with_attributes(df)
  File "/Users/tavi/.virtualenvs/nejm/lib/python2.7/site-packages/pyensembl/gtf_parsing.py", line 154, in _extend_with_attributes
    for k,v in pairs:
ValueError: too many values to unpack
```

```
~/drive/work/repos/cancer/nejm $ pyensembl install --release 78
INFO:root:Fetching Homo_sapiens.GRCh38.cdna.all.78.fa from URL ftp://ftp.ensembl.org/pub/release-78/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz
Downloading ftp://ftp.ensembl.org/pub/release-78/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz to /Users/tavi/Library/Caches/ensembl/Homo_sapiens.GRCh38.cdna.all.78.fa
INFO:root:Decompressing gzip into Homo_sapiens.GRCh38.cdna.all.78.fa...
INFO:root:Fetching Homo_sapiens.GRCh38.pep.all.78.fa from URL ftp://ftp.ensembl.org/pub/release-78/fasta/homo_sapiens/pep/Homo_sapiens.GRCh38.pep.all.fa.gz
Downloading ftp://ftp.ensembl.org/pub/release-78/fasta/homo_sapiens/pep/Homo_sapiens.GRCh38.pep.all.fa.gz to /Users/tavi/Library/Caches/ensembl/Homo_sapiens.GRCh38.pep.all.78.fa
INFO:root:Decompressing gzip into Homo_sapiens.GRCh38.pep.all.78.fa...
INFO:root:Cached file Homo_sapiens.GRCh38.78.gtf from URL ftp://ftp.ensembl.org/pub/release-78/gtf/homo_sapiens/Homo_sapiens.GRCh38.78.gtf.gz
Creating database: /Users/tavi/Library/Caches/ensembl/Homo_sapiens.GRCh38.78.db
Reading Dataframe from /Users/tavi/Library/Caches/ensembl/Homo_sapiens.GRCh38.78.gtf.expanded.csv
/Users/tavi/.virtualenvs/nejm/lib/python2.7/site-packages/pandas/io/parsers.py:1159: DtypeWarning: Columns (0,22) have mixed types. Specify dtype option on import or set low_memory=False.
  data = self._reader.read(nrows)
INFO:root:Dropping tables from database /Users/tavi/Library/Caches/ensembl/Homo_sapiens.GRCh38.78.db: ensembl, _datacache_metadata
INFO:root:Running sqlite query: "DROP TABLE ensembl"
INFO:root:Running sqlite query: "DROP TABLE _datacache_metadata"
WARNING:root:Failed to create tables [nan, 'start_codon', 'Selenocysteine', 'UTR', 'exon', 'stop_codon', 'CDS', 'gene', 'transcript'] in database /Users/tavi/Library/Caches/ensembl/Homo_sapiens.GRCh38.78.db
Traceback (most recent call last):
  File "/Users/tavi/.virtualenvs/nejm/bin/pyensembl", line 9, in <module>
    load_entry_point('pyensembl==0.6.1', 'console_scripts', 'pyensembl')()
  File "/Users/tavi/.virtualenvs/nejm/lib/python2.7/site-packages/pyensembl/shell.py", line 53, in run
    ensembl.install()
  File "/Users/tavi/.virtualenvs/nejm/lib/python2.7/site-packages/pyensembl/ensembl_release.py", line 194, in install
    self.index(force=False)
  File "/Users/tavi/.virtualenvs/nejm/lib/python2.7/site-packages/pyensembl/ensembl_release.py", line 207, in index
    self.db.create(force=force)
  File "/Users/tavi/.virtualenvs/nejm/lib/python2.7/site-packages/pyensembl/database.py", line 540, in create
    self._create_database(force=force)
  File "/Users/tavi/.virtualenvs/nejm/lib/python2.7/site-packages/pyensembl/database.py", line 186, in _create_database
    version=DATABASE_SCHEMA_VERSION)
  File "/Users/tavi/.virtualenvs/nejm/lib/python2.7/site-packages/datacache/database_helpers.py", line 200, in db_from_dataframes
    version=version)
  File "/Users/tavi/.virtualenvs/nejm/lib/python2.7/site-packages/datacache/database_helpers.py", line 104, in _create_cached_db
    ", ".join(table_names))
TypeError: sequence item 0: expected string, float found
```

tavinathanson commented 9 years ago

Re-creating the DBs appears to have fixed the `TypeError: sequence item 0: expected string, float found` error, but not the `ValueError: too many values to unpack` error. So I can install release 78 but not release 79?
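
For reference, the `TypeError` in the release 78 log is consistent with the `nan` listed as the first table name in the warning: `", ".join(...)` only accepts strings, and `nan` is a float. Where that `nan` comes from is a guess (possibly a row with a missing feature value in the cached `.expanded.csv`). Below is a minimal sketch that reproduces the failure and shows one way to guard against it. `clean_table_names` is a hypothetical helper written for this illustration, not part of pyensembl or datacache.

```python
import math


def clean_table_names(names):
    """Drop missing values (None or float NaN) and coerce the rest to str.

    Hypothetical helper for illustration; not a pyensembl/datacache function.
    """
    cleaned = []
    for name in names:
        if name is None:
            continue
        if isinstance(name, float) and math.isnan(name):
            continue
        cleaned.append(str(name))
    return cleaned


# Feature names as they appear in the warning above, including the stray NaN.
feature_names = [float("nan"), "start_codon", "Selenocysteine", "UTR",
                 "exon", "stop_codon", "CDS", "gene", "transcript"]

try:
    # Joining a list that contains a float raises TypeError.
    ", ".join(feature_names)
except TypeError as error:
    print("join failed:", error)

# After filtering, the join succeeds.
print(", ".join(clean_table_names(feature_names)))
```

Filtering only hides the symptom. If the `nan` does come from a stale cache file, re-creating the DBs, as described above, is the more direct fix.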

iskandr commented 9 years ago

Sorry, it was very naive of me to think that adding a release was an innocuous change. Fixed by https://github.com/hammerlab/pyensembl/pull/81
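
For readers who hit the same `ValueError`: the traceback points at `for k,v in pairs` in `_extend_with_attributes`, which fails whenever one of the parsed attribute entries has more than two items. One way that can happen, assuming the attribute column is split on spaces, is a quoted value that itself contains a space. This is only an illustration of the failure mode. It is not taken from the pyensembl source or from the linked PR, and the attribute string below is made up.

```python
def parse_gtf_attributes(attribute_string):
    """Parse a GTF attribute column into a dict of key -> value strings.

    Illustrative sketch only. It splits each field on the FIRST space, so
    values containing spaces stay intact. It does not handle semicolons
    inside quoted values.
    """
    attributes = {}
    for field in attribute_string.split(";"):
        field = field.strip()
        if not field:
            continue
        key, _, value = field.partition(" ")
        attributes[key] = value.strip().strip('"')
    return attributes


# Made-up attribute string; the gene_name value contains a space.
example = ('gene_id "ENSG00000000001"; gene_name "EXAMPLE 1"; '
           'gene_biotype "protein_coding";')

# Naive approach: splitting every field on all spaces yields three tokens
# for gene_name, so unpacking into (k, v) raises ValueError.
naive_pairs = [f.strip().split(" ") for f in example.split(";") if f.strip()]
try:
    for k, v in naive_pairs:
        pass
except ValueError as error:
    print("naive parse failed:", error)

# Splitting on the first space only avoids the problem.
print(parse_gtf_attributes(example))
```

Using `str.partition` always yields exactly one key and one value per field, so the unpacking cannot fail regardless of what the value contains.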