theiagen / public_health_bioinformatics

Bioinformatics workflows for genomic characterization, submission preparation, and genomic epidemiology of pathogens of public health concern.
GNU General Public License v3.0

[SRA-Fetch] Reimplement using fasterq-dump directly to avoid SRA-Lite file download #479

Open cimendes opened 1 month ago

cimendes commented 1 month ago

The current implementation of fastq-dl doesn't seem very robust at avoiding SRA-Lite file downloads. One alternative is to reimplement the workflow using NCBI's fasterq-dump directly.

kapsakcj commented 1 month ago

FYI, it's not always fastq-dl's fault. It actually uses the sra-tools commands prefetch and fasterq-dump under the hood, I think when specifying fastq-dl --provider sra --only-provider. See https://github.com/rpetit3/fastq-dl/issues/23#issuecomment-1666989459
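For reference, calling sra-tools directly might look roughly like the sketch below. It only builds and prints the two command lines and does not run anything. The helper name `build_sra_commands`, the placeholder accession, the output directory, and the thread count are illustrative assumptions, not what fastq-dl or the workflow actually does.

```python
import os
import shlex


def build_sra_commands(accession, outdir=".", threads=4):
    """Build (but do not run) prefetch and fasterq-dump command lines.

    Hypothetical helper for illustration; the options chosen here are
    assumptions, not the workflow's actual settings.
    """
    # prefetch writes the run into <outdir>/<accession>/
    prefetch_cmd = ["prefetch", accession, "--output-directory", outdir]
    # fasterq-dump can then read from that local directory
    fasterq_cmd = [
        "fasterq-dump",
        os.path.join(outdir, accession),
        "--outdir", outdir,
        "--split-files",
        "--threads", str(threads),
    ]
    return [prefetch_cmd, fasterq_cmd]


if __name__ == "__main__":
    # SRR000001 is only a placeholder accession
    for cmd in build_sra_commands("SRR000001", outdir="reads"):
        print(shlex.join(cmd))
```

The printed commands could be passed to `subprocess.run` once sra-tools is installed. As noted in this thread, this alone would not guarantee SRA Normalized files.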

It tries to download the SRA Normalized format when possible, but even with that configuration it is still possible to download SRA Lite formatted files. This happens when SRA only serves the SRA Lite formatted files and doesn't serve the original/SRA Normalized formatted files.

Just a warning that even if we switch to fasterq-dump, this problem can still occur. Those problem SRRs need to be reported to SRA so they can fix them.
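Since the problem can persist whichever downloader is used, a post-download check could at least flag suspect runs. The sketch below is a heuristic only: it assumes SRA Lite files carry simplified quality scores (Q30 for passing reads, Q3 for rejected reads), Phred+33 encoding, and four-line FASTQ records. The function name `looks_like_sra_lite` and the `max_reads` cutoff are made up for illustration.

```python
# Phred+33 characters for Q30 ("?") and Q3 ("$"); assumed SRA Lite values
SRA_LITE_QUALS = {chr(30 + 33), chr(3 + 33)}


def looks_like_sra_lite(fastq_lines, max_reads=1000):
    """Return True if the sampled quality strings only contain Q30/Q3.

    Heuristic sketch: `fastq_lines` is any iterable of FASTQ lines with
    four lines per record. A genuine run with uniform Q30 qualities
    would also be flagged, so treat a True result as "inspect further".
    """
    seen = set()
    reads = 0
    for index, line in enumerate(fastq_lines):
        if index % 4 == 3:  # the quality line of each record
            seen.update(line.rstrip("\r\n"))
            reads += 1
            if reads >= max_reads:
                break
    # Empty input is not evidence of anything
    return reads > 0 and seen <= SRA_LITE_QUALS
```

For a file on disk, something like `with gzip.open(path, "rt") as handle: looks_like_sra_lite(handle)` would apply the check to the first 1000 reads. Flagged SRRs would still need manual confirmation before being reported to SRA.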