jbloomlab / dms_tools2

software for the analysis and visualization of deep mutational scanning data
GNU General Public License v3.0

fastqFromSRA error #43

Closed adingens closed 4 years ago

adingens commented 4 years ago

I am using the code below to run fastqFromSRA. The notebook is at /computational_notebooks/adingens/2019/SuperRestrictionFactor_Hypermutation/analysis_notebook.ipynb (this directory is not yet auto-pushed to GitHub). I updated the aspera-connect version to match one of Jesse's recent notebooks.

The code is:
"""
samples = pd.DataFrame.from_records(
    [('PLASMIDCTRL','SRR11059589'),
     ('NoA3_1','SRR11059588'),
     ('A3G_1','SRR11059577'),
     ('A3C_1','SRR11059573'),
     ('A3CE68A_1','SRR11059572'),
     ('COTD_1','SRR11059571'),
     ('COTDE254A_1','SRR11059570'),
     ('COTDE68AE254A_1','SRR11059569'),
     ('I188_1','SRR11059568'),
     ('I188E68A_1','SRR11059567'),
     ('COII_1','SRR11059587'),
     ('COIIE68AE254A_1','SRR11059586'),
     ('NoA3_2','SRR11059585'),
     ('A3G_2','SRR11059584'),
     ('A3C_2','SRR11059583'),
     ('A3CE68A_2','SRR11059582'),
     ('COTD_2','SRR11059581'),
     ('COTDE254A_2','SRR11059580'),
     ('COTDE68AE254A_2','SRR11059579'),
     ('I188_2','SRR11059578'),
     ('I188E68A_2','SRR11059576'),
     ('COII_2','SRR11059575'),
     ('COIIE68AE254A_2','SRR11059574')],
    columns=['name', 'run']
)

fastqdir = './results/FASTQ_files/'
if not os.path.isdir(fastqdir):
    os.mkdir(fastqdir)
print("Downloading FASTQ files from the SRA...")
dms_tools2.sra.fastqFromSRA(
    samples=samples,
    fastq_dump='fastq-dump', # valid path to this program on the Hutch server
    fastqdir=fastqdir,
    aspera=(
        '/app/aspera-connect/3.7.5/bin/ascp', # valid path to ascp on Hutch server
        '/app/aspera-connect/3.7.5/etc/asperaweb_id_dsa.openssh' # Aspera key on Hutch server
        ),
    )
print("Here are the names of the downloaded files now found in {0}".format(fastqdir))
display(HTML(samples.to_html(index=False)))
"""

And I get the following error:
"""
IndexError                                Traceback (most recent call last)
in
     37     aspera=(
     38         '/app/aspera-connect/3.7.5/bin/ascp', # valid path to ascp on Hutch server
---> 39         '/app/aspera-connect/3.7.5/etc/asperaweb_id_dsa.openssh' # Aspera key on Hutch server
     40         ),
     41     )

/fh/fast/bloom_j/software/conda_v2/envs/BloomLab/lib/python3.6/site-packages/dms_tools2/sra.py in fastqFromSRA(samples, fastq_dump, fastqdir, aspera, overwrite, passonly, no_downloads, ncpus)
     81                 "location accessible with command {0}".format(fastq_dump))
     82     fastq_dump_version = (subprocess.check_output([fastq_dump, '--version'])
---> 83             .decode('utf-8').split(':')[1].strip())
     84     fastq_dump_minversion = '2.8'
     85     if not (distutils.version.LooseVersion(fastq_dump_version) >=

IndexError: list index out of range
"""

jbloom commented 4 years ago

@adingens: Should be fixed in #44. The problem was that fastq-dump had changed the way it printed versions.
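
To illustrate why the `split(':')[1]` on line 83 of the traceback breaks when the version output changes, here is a minimal sketch of a more tolerant parser. It is not the change made in #44; the helper name `parse_version` is hypothetical, and the two sample output strings are assumed formats written for illustration rather than captured from a real `fastq-dump --version` run.

```python
import re


def parse_version(output: str) -> tuple:
    """Return the first dotted version number in `output` as a tuple of ints.

    Hypothetical helper: it looks for a pattern such as 2.8.2 anywhere in the
    text instead of relying on a ':' separator being present.
    """
    match = re.search(r'\d+(?:\.\d+)+', output)
    if match is None:
        raise ValueError(f"no version number found in {output!r}")
    return tuple(int(part) for part in match.group(0).split('.'))


# Assumed output styles, for illustration only.
colon_style = 'fastq-dump : 2.8.2\n'
quoted_style = '"fastq-dump" version 2.10.0\n'

# The second string has no ':' so split(':')[1] would raise IndexError on it,
# whereas the regular expression handles both.
assert parse_version(colon_style) == (2, 8, 2)
assert parse_version(quoted_style) >= (2, 8)
```

Comparing tuples of integers also avoids ordering `2.10` before `2.8`, which a plain string comparison would do.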

Note that there may still be a problem with Aspera whose origin is unclear to me. If so, you can probably just set `aspera` to None in the call to the function and use the (slower) fastq-dump method to download.
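
A rough sketch of that fallback, assuming the `fastqFromSRA` keyword arguments shown in the traceback above. The wrapper name `download_fastqs_without_aspera` is hypothetical, and the import sits inside the function only so the snippet loads on a machine without dms_tools2 installed.

```python
import os


def download_fastqs_without_aspera(samples, fastqdir, fastq_dump='fastq-dump'):
    """Call fastqFromSRA with aspera=None so only fastq-dump is used.

    Hypothetical wrapper: `samples` is the DataFrame with 'name' and 'run'
    columns from the original post.
    """
    import dms_tools2.sra  # deferred import; requires dms_tools2 at call time

    os.makedirs(fastqdir, exist_ok=True)
    dms_tools2.sra.fastqFromSRA(
        samples=samples,
        fastq_dump=fastq_dump,
        fastqdir=fastqdir,
        aspera=None,  # skip Aspera; download with the slower fastq-dump
    )
```

In the notebook this would be called as `download_fastqs_without_aspera(samples, './results/FASTQ_files/')`.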