saketkc / pysradb

Package for fetching metadata and downloading data from SRA/ENA/GEO
https://saketkc.github.io/pysradb
BSD 3-Clause "New" or "Revised" License

[BUG] Example download doesn't work #166

Closed MrOlm closed 1 year ago

MrOlm commented 1 year ago

Running the colab notebook (https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/02.Commandline_download.ipynb#scrollTo=YQLxy1yzH6dQ)

fails on the step

!pysradb srx-to-srr SRX4720625 --detailed | pysradb download

with the error



```
Using recommended_url instead.

Traceback (most recent call last):
  File "/usr/local/bin/pysradb", line 8, in <module>
    sys.exit(parse_args())
  File "/usr/local/lib/python3.7/dist-packages/pysradb/cli.py", line 1219, in parse_args
    args.threads,
  File "/usr/local/lib/python3.7/dist-packages/pysradb/cli.py", line 121, in download
    threads=threads,
  File "/usr/local/lib/python3.7/dist-packages/pysradb/sradb.py", line 1523, in download
    + ".sra"
  File "/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py", line 5487, in __getattr__
    return object.__getattribute__(self, name)
  File "/usr/local/lib/python3.7/dist-packages/pandas/core/accessor.py", line 181, in __get__
    accessor_obj = self._accessor(obj)
  File "/usr/local/lib/python3.7/dist-packages/pandas/core/strings/accessor.py", line 168, in __init__
    self._inferred_dtype = self._validate(data)
  File "/usr/local/lib/python3.7/dist-packages/pandas/core/strings/accessor.py", line 225, in _validate
    raise AttributeError("Can only use .str accessor with string values!")
AttributeError: Can only use .str accessor with string values!
```

saketkc commented 1 year ago

Thanks, I can confirm that it is currently broken. For now, I recommend saving the metadata to a TSV file and then using the `sra_url` field to download:

$ pysradb srx-to-srr SRX4720625 --detailed --saveto x.tsv

| experiment_accession | run_accession | study_accession | study_title | experiment_title | experiment_desc | organism_taxid | organism_name | library_name | library_strategy | library_source | library_selection | library_layout | sample_accession | sample_title | instrument | instrument_model | instrument_model_desc | total_spots | total_size | run_total_spots | run_total_bases | run_alias | sra_url_alt | sra_url | AWS_url | AWS_free_egress | AWS_access_type | experiment_alias | source_name | tissue | developmental stage | gfp status | genetic background | ena_fastq_http | ena_fastq_http_1 | ena_fastq_http_2 | ena_fastq_ftp | ena_fastq_ftp_1 | ena_fastq_ftp_2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SRX4720625 | SRR7882015 | SRP162234 | Transcriptomic profile of zebrafish cardiomyocytes throughout heart development | GSM3396533: wt_GFPpos_24hpf_rep1; Danio rerio; RNA-Seq | GSM3396533: wt_GFPpos_24hpf_rep1; Danio rerio; RNA-Seq | 7955 | Danio rerio | `<NA>` | RNA-Seq | TRANSCRIPTOMIC | cDNA | PAIRED | SRS3805811 | `<NA>` | NextSeq 500 | NextSeq 500 | ILLUMINA | 47867961 | 3470385670 | 47867961 | 7230485009 | GSM3396533_r1 | s3://sra-pub-src-3/SRR7882015/RNA_cardio_pos_24hpf_rep_1_R2.fq.gz | https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR7882015/SRR7882015 | https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR7882015/SRR7882015 | worldwide | anonymous | GSM3396533 | FACS-sorted embryo cells | FACS-sorted embryo cells | 24 hpf | GFP positive | wild type | `<NA>` | http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR788/005/SRR7882015/SRR7882015_1.fastq.gz | http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR788/005/SRR7882015/SRR7882015_2.fastq.gz | `<NA>` | era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR788/005/SRR7882015/SRR7882015_1.fastq.gz | era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR788/005/SRR7882015/SRR7882015_2.fastq.gz |

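The second half of the workaround (using the `sra_url` field to download) could be scripted roughly as follows. This is only a sketch using the Python standard library, not pysradb's own download code: the helper names `read_sra_urls` and `download_runs` and the output directory name are made up for illustration. It assumes the file written by `--saveto` is tab-separated and contains the `run_accession` and `sra_url` columns shown above.

```python
import csv
import shutil
import urllib.request
from pathlib import Path


def read_sra_urls(tsv_path):
    """Return (run_accession, sra_url) pairs from a metadata TSV.

    Hypothetical helper: assumes a tab-separated file with
    `run_accession` and `sra_url` columns.
    """
    pairs = []
    with open(tsv_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            url = (row.get("sra_url") or "").strip()
            # Skip rows without a usable URL (empty or a missing-value marker).
            if url and url not in ("<NA>", "NA", "nan"):
                pairs.append((row["run_accession"], url))
    return pairs


def download_runs(tsv_path, out_dir="sra_downloads"):
    """Download every sra_url to <out_dir>/<run_accession>.sra.

    Hypothetical helper; `out_dir` is an arbitrary name chosen here.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for run, url in read_sra_urls(tsv_path):
        dest = out / f"{run}.sra"
        # Stream the response to disk instead of holding it in memory.
        with urllib.request.urlopen(url) as resp, open(dest, "wb") as fh:
            shutil.copyfileobj(resp, fh)
        print(f"saved {dest}")
```

Calling `download_runs("x.tsv")` would then fetch one `.sra` file per row. Since these files can be several gigabytes, it may be worth checking what `read_sra_urls("x.tsv")` returns before starting the download.
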
saketkc commented 1 year ago

This is now fixed in pysradb v2.1.0.