kfuku52 / amalgkit

RNA-seq data amalgamation for large-scale evolutionary transcriptomics
BSD 3-Clause "New" or "Revised" License

getfastq --id is currently broken #10

Closed Hego-CCTB closed 4 years ago

Hego-CCTB commented 4 years ago

When I pass just an SRA ID with --id instead of a metadata.tsv, I get this error:

amalgkit getfastq --threads 8 --id SRR7699519 -e abc@abc.com
amalgkit getfastq: start
pigz found. It will be used for compression/decompression in read name formatting.
--id is specified. Downloading SRA metadata from Entrez.
Traceback (most recent call last):
  File "/Users/s229181/anaconda/anaconda3/envs/dev/bin/amalgkit", line 254, in <module>
    args.handler(args)
  File "/Users/s229181/anaconda/anaconda3/envs/dev/bin/amalgkit", line 31, in command_getfastq
    getfastq_main(args)
  File "/Users/s229181/anaconda/anaconda3/envs/dev/lib/python3.7/site-packages/amalgkit/getfastq.py", line 517, in getfastq_main
    metadata = getfastq_metadata(args)
  File "/Users/s229181/anaconda/anaconda3/envs/dev/lib/python3.7/site-packages/amalgkit/getfastq.py", line 477, in getfastq_metadata
    search_term = getfastq_search_term(sra_id, args.entrez_additional_search_term)
NameError: name 'sra_id' is not defined

Looking through the code, "sra_id" is only assigned when the metadata.tsv is read, and the metadata.tsv is mutually exclusive with --id. I think the easiest solution would be to go back to how it worked before the "metadata update": create a new metadata.tsv with just a single entry and let the rest of the code run as it does right now.
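A minimal sketch of that idea, not the actual amalgkit code: when only a single ID is given, write a one-row metadata.tsv so the downstream code can keep reading run IDs from a table. The helper names and the "run" column name are assumptions made for illustration.

```python
import csv


def write_single_entry_metadata(sra_id, path, run_column="run"):
    """Write a tab-separated metadata file holding one run ID.

    Hypothetical helper; the column name "run" is an assumption.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow([run_column])  # header row
        writer.writerow([sra_id])      # the single entry
    return path


def read_run_ids(path, run_column="run"):
    """Read run IDs back the way table-based code would."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [row[run_column] for row in reader]
```

With this shape, the --id branch only has to produce the file, and the code that assigns sra_id from the table does not need a second code path.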

Ah, we also need to add pigz to the dependencies.

kfuku52 commented 4 years ago

That sounds like the right solution. Could you fix it?

Hego-CCTB commented 4 years ago

sure!

takaW496 commented 4 years ago

I got the same error message when I tried to run the getfastq process using a BioProject ID in the gfe pipeline:

Traceback (most recent call last):
  File "/opt/conda/envs/biotools/bin/amalgkit", line 254, in <module>
    args.handler(args)
  File "/opt/conda/envs/biotools/bin/amalgkit", line 31, in command_getfastq
    getfastq_main(args)
  File "/opt/conda/envs/biotools/lib/python3.7/site-packages/amalgkit/getfastq.py", line 517, in getfastq_main
    metadata = getfastq_metadata(args)
  File "/opt/conda/envs/biotools/lib/python3.7/site-packages/amalgkit/getfastq.py", line 477, in getfastq_metadata
    search_term = getfastq_search_term(sra_id, args.entrez_additional_search_term)
NameError: name 'sra_id' is not defined

@Hego-CCTB did you fix the problem? Could you share the fixed script?

kfuku52 commented 4 years ago

@Hego-CCTB Are you aware of Taka's question?

Hego-CCTB commented 4 years ago

Yes! I'm looking into it, but I haven't made progress so far. My "fix" created a host of other problems, but I hope I can get a working update out soon.

Hego-CCTB commented 4 years ago

@takaW496 The problem should be fixed now. I've also added an --id_list option, which can process multiple SRA runs, while --id is reserved for a single run.

--id_list needs a path to a plain text file with one ID per line. Currently, --id_list only queues the runs and downloads them one after another, not in parallel (this is what I'm looking into next).
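For illustration only, here is roughly how such a file could be read and processed in order. This is a sketch, not the amalgkit implementation: the function names are hypothetical, the download step is passed in as a placeholder callable, and skipping blank lines is an assumption.

```python
def read_id_list(path):
    """Return the IDs from a text file with one ID per line.

    Surrounding whitespace is stripped and blank lines are skipped
    (an assumption, not confirmed amalgkit behavior).
    """
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def process_sequentially(ids, download):
    """Call download(sra_id) for each ID in file order, one at a time."""
    return [download(sra_id) for sra_id in ids]
```

Keeping the download step as a parameter would make it straightforward to swap the sequential loop for a parallel one later.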

I'll close this for now, but feel free to reopen this issue if you encounter any other problems regarding this.