rvalieris / parallel-fastq-dump

parallel fastq-dump wrapper
MIT License

include-technical reads for single cell studies #45

Closed MEFarhadieh closed 2 years ago

MEFarhadieh commented 2 years ago

Thanks for this great tool! Can I use the --include-technical flag with parallel-fastq-dump, as with fasterq-dump, to get a separate file for the UMI reads of a single-cell SRA run?

rvalieris commented 2 years ago

hello,

parallel-fastq-dump is using fastq-dump under the hood, and it already dumps technical reads by default, so you should be able to get what you want just by using the --split-files argument I think.

to not get technical reads, you need to use --skip-technical

if you have an example SRR you are interested in, I can take a closer look.
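A minimal sketch of the two invocations described above, written as a dry run that only prints the command line. The helper name `dump_cmd` is hypothetical, and the sketch assumes, as the reply suggests, that `--skip-technical` given to parallel-fastq-dump reaches the underlying fastq-dump.

```shell
# Print (rather than run) a parallel-fastq-dump command line for one run.
# $1: accession, $2: thread count, $3: "keep" (default) or "skip" technical reads.
# dump_cmd is a hypothetical helper, not part of parallel-fastq-dump.
dump_cmd() {
    acc=$1
    threads=$2
    technical=${3:-keep}
    cmd="parallel-fastq-dump --sra-id $acc --threads $threads --split-files --gzip"
    # Assumption: --skip-technical is handed on to fastq-dump, which then
    # leaves the technical reads out of the output.
    if [ "$technical" = "skip" ]; then
        cmd="$cmd --skip-technical"
    fi
    printf '%s\n' "$cmd"
}

dump_cmd SRR11422712 10 keep
dump_cmd SRR11422712 10 skip
```

Printing the command first makes it easy to check the flags before starting a long download.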

MEFarhadieh commented 2 years ago

Thank you so much for your quick reply.

I ran parallel-fastq-dump --sra-id SRR11422712 --threads 10 --split-files --gzip for SRR11422712, which contains 3 reads per spot. However, I got one fastq.gz file that only includes biological reads.

I also used prefetch and fastq-dump with the --split-files argument, and again I got one fastq file.

Finally, I ran fasterq-dump ~/SRR11422712/SRR11422712.sra --include-technical --split-files and got three fastq files containing the sample index, UMIs, and biological reads.

rvalieris commented 2 years ago

that's weird, I tested with this same SRR and got 3 files: _1 and _2 with technical reads and _3 with biological reads.

make sure you are using the latest versions of parallel-fastq-dump and sra-tools:

$ parallel-fastq-dump --version
parallel-fastq-dump : 0.6.7

"fastq-dump" version 2.11.0

also, check the logs printed on the screen while it runs, there might be some warning/error message.
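When comparing installations, it can help to reduce the `--version` output to the bare version number. The sketch below does that for the two output lines quoted above; `extract_version` is a hypothetical helper name, and it assumes `grep` supports `-E` and `-o` (true for GNU and BSD grep).

```shell
# Pull the first X.Y.Z version number out of a line of --version output.
# extract_version is a hypothetical helper used only for illustration.
extract_version() {
    printf '%s\n' "$1" | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# The two lines quoted in the thread:
extract_version 'parallel-fastq-dump : 0.6.7'
extract_version '"fastq-dump" version 2.11.0'
```

On a machine with sra-tools installed, `extract_version "$(fastq-dump --version)"` would give the version of whichever fastq-dump comes first on the PATH.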

MEFarhadieh commented 2 years ago

There is neither a warning nor an error, and my parallel-fastq-dump version is 0.6.7. However, I found 2 fastq-dump versions in my $PATH: version 2.11.0 in the Miniconda bin directory and version 3.0.0 in /usr/local/bin. I removed version 3.0.0 and tried again, but I got the same result. I will reinstall and reconfigure the packages and report back with the result.

I'm sorry for the trouble, and I appreciate your support.
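Two copies of fastq-dump on the PATH, as described above, are easy to miss because only the first one is ever run. A small sketch, assuming a POSIX shell, that lists every executable of a given name in lookup order (`list_on_path` is a hypothetical helper name):

```shell
# List every executable named "$1" found on PATH, in lookup order.
# The function body runs in a subshell so the IFS and globbing changes
# do not leak into the calling shell.
list_on_path() (
    name=$1
    IFS=:
    set -f                      # no filename expansion while splitting PATH
    for dir in $PATH; do
        [ -n "$dir" ] || dir=.  # an empty PATH entry means the current directory
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
)

list_on_path fastq-dump
```

The first path printed is the one the shell, and therefore parallel-fastq-dump, would pick up; running each listed binary with `--version` shows which release it belongs to.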

rvalieris commented 2 years ago

try this command: fastq-dump --split-files --gzip -N 1 -X 1000 SRR11422712 or fastq-dump --split-files --gzip -N 1 -X 1000 ~/SRR11422712/SRR11422712.sra

parallel-fastq-dump runs fastq-dump commands like this in the background; if you don't get 3 files from this command, there's something wrong with sra-tools.
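After running the test command above, the number of split files can be counted rather than eyeballed. A minimal sketch, assuming the `<accession>_<n>.fastq` or `<accession>_<n>.fastq.gz` naming seen in this thread (`_1`, `_2`, `_3`); `count_split_files` is a hypothetical helper name:

```shell
# Count the per-read files that --split-files produced for one accession.
# $1: output directory, $2: accession (e.g. SRR11422712).
count_split_files() {
    dir=$1
    acc=$2
    n=0
    for f in "$dir/$acc"_*.fastq "$dir/$acc"_*.fastq.gz; do
        # An unmatched pattern stays literal, so test that the file exists.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    printf '%s\n' "$n"
}

# Run from the directory where the test command wrote its output.
count_split_files . SRR11422712
```

For SRR11422712, the thread reports three files when the technical reads are included, so any other count points at the sra-tools installation rather than at parallel-fastq-dump.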

MEFarhadieh commented 2 years ago

After I uninstalled SRA Toolkit 3.0.0 and reinstalled 2.11.0 with conda, fastq-dump --split-files --gzip -N 1 -X 1000 ~/SRR11422712/SRR11422712.sra returned 3 files correctly. parallel-fastq-dump now does too.

I guess the problem was caused by sra-tools 3.0.0.

Thank you so much for all your help.

rvalieris commented 2 years ago

nice, I'm glad it's working now, but it's weird that version 3.0.0 gives a different result; you might want to follow up on this with the sra-tools authors.