ncbi / sra-tools

SRA Tools

fasterq-dump unmated reads #904

Closed: cgob closed this issue 3 months ago

cgob commented 5 months ago

Hi,

I am downloading FASTQ files with fasterq-dump for accession SRR7535543 (scRNA-seq, Drop-seq). I successfully obtained the expected SRR7535543_1.fastq (CB+UMI, 21 nt) and SRR7535543_2.fastq (cDNA, 61 nt) files, each containing 18,549,017 reads. Surprisingly, I am also getting a large file of unmated reads, SRR7535543.fastq, which contains 111,500,535 reads (cDNA, 61 nt). I have never encountered this before. Could it be related to the way the author uploaded the data, or is it some unusual behavior of fasterq-dump?
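To double-check counts like these, a small Python sketch can tally the records in each output file. It assumes plain or gzipped FASTQ with the usual four lines per record (no wrapped sequence lines) and the file names fasterq-dump produced here (<acc>_1.fastq, <acc>_2.fastq, <acc>.fastq); the helper names are hypothetical.

```python
import gzip
from pathlib import Path


def count_records(lines):
    """Count FASTQ records in any iterable of lines (assumes 4 lines per record)."""
    n = sum(1 for _ in lines)
    if n % 4 != 0:
        raise ValueError(f"{n} lines is not a multiple of 4; wrapped or truncated FASTQ?")
    return n // 4


def count_fastq_file(path):
    """Open a plain or .gz FASTQ file and count its records."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_records(handle)


def summarize_dump(accession, directory="."):
    """Record counts for <acc>_1.fastq, <acc>_2.fastq and <acc>.fastq, if present."""
    counts = {}
    for suffix in ("_1", "_2", ""):
        path = Path(directory) / f"{accession}{suffix}.fastq"
        if path.exists():
            counts[path.name] = count_fastq_file(path)
    return counts


if __name__ == "__main__":
    # Prints nothing if the files are not in the current directory.
    for name, n in summarize_dump("SRR7535543").items():
        print(f"{name}\t{n:,}")
```

Run it in the directory where fasterq-dump wrote its output. For this run you would expect 18,549,017 records in each mate file and 111,500,535 in the unmated file.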

Thank you, C

wraetz commented 5 months ago

There is nothing wrong with the tool; it is related to the data (I just looked into it). The run contains a mix of correctly paired spots, which is what you get in the SRR7535543_1.fastq and SRR7535543_2.fastq files, and spots that contain only one biological read, which is what ends up in the SRR7535543.fastq file. If you think that should not be the case, please contact the NCBI help desk ( https://support.nlm.nih.gov/support/create-case/ ).
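For anyone landing here with the same question, the routing described above can be modeled in a few lines. This is a simplified Python sketch of how the default split (fasterq-dump's --split-3 mode, as far as I know) sorts spots, not fasterq-dump's actual code; the Spot class and split_spots function are hypothetical, and technical reads are assumed to have been filtered out already.

```python
from dataclasses import dataclass
from typing import List, Tuple

Record = Tuple[str, str]  # (spot name, sequence)


@dataclass
class Spot:
    # Hypothetical, simplified spot: only its biological reads are kept.
    name: str
    bio_reads: List[str]


def split_spots(spots: List[Spot]) -> Tuple[List[Record], List[Record], List[Record]]:
    """Two biological reads -> _1/_2 files; one biological read -> unmated file."""
    mate1: List[Record] = []
    mate2: List[Record] = []
    unmated: List[Record] = []
    for spot in spots:
        if len(spot.bio_reads) == 2:
            mate1.append((spot.name, spot.bio_reads[0]))
            mate2.append((spot.name, spot.bio_reads[1]))
        elif len(spot.bio_reads) == 1:
            unmated.append((spot.name, spot.bio_reads[0]))
        # Spots with zero (or more than two) biological reads are ignored in this sketch.
    return mate1, mate2, unmated
```

Applied to this run, 18,549,017 two-read spots would fill the two mate files and 111,500,535 one-read spots the unmated file, roughly 130 million spots in total.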

wraetz commented 5 months ago

A simple way to check is to run "sra-info -S SRR7535543". It prints a short summary of the different read layouts in the accession.
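If you want to script that check across several accessions, a minimal Python wrapper around the same command might look like this. It only runs "sra-info -S <accession>" as quoted above and returns the text; it assumes sra-info is on your PATH and does not try to parse the output, whose format is not shown in this thread. The function names are hypothetical.

```python
import subprocess
from typing import List


def sra_info_cmd(accession: str) -> List[str]:
    """Build the argument list for the read-layout summary command from this thread."""
    return ["sra-info", "-S", accession]


def read_layout_summary(accession: str) -> str:
    """Run sra-info -S and return its stdout.

    Raises FileNotFoundError if sra-info is not installed and
    CalledProcessError if it exits with a non-zero status.
    """
    result = subprocess.run(
        sra_info_cmd(accession),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, print(read_layout_summary("SRR7535543")) should show the same summary as running the command by hand.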

cgob commented 5 months ago

Great. Thanks a lot for the prompt reply.