saketkc / pysradb

Package for fetching metadata and downloading data from SRA/ENA/GEO
https://saketkc.github.io/pysradb
BSD 3-Clause "New" or "Revised" License

Inconsistencies with retrieving SRX data from different archives #217

Closed: nasjr08 closed this issue 3 months ago

nasjr08 commented 3 months ago

This is more of a general question about the data sources pysradb uses when downloading.

Previously I have used the following to retrieve SRA data: aws s3 cp s3://sra-pub-run-odp/sra/SRR1119486/SRR1119486 .

From what I can tell, running the following command: pysradb download -y -t 8 --out-dir ./pysradb_downloads -p SRX434118

uses the following link: ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR111/SRR1119486/SRR1119486.sra

Running fastq-dump on both .sra files outputs fastq_1.gz and fastq_2.gz. The FASTQ files produced from the .sra file downloaded by pysradb are significantly smaller than those produced from the file downloaded with the first command. Has this discrepancy been reported previously?

Thank you for the help.

saketkc commented 3 months ago

Could you open an issue on NCBI's GitHub? I haven't checked, but this likely comes down to a mismatch between the .sra file on the FTP server and the .sra file that was uploaded to AWS.
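
Before filing that issue, it may help to quantify the mismatch. The following is a minimal sketch, not part of pysradb: it compares two local files by size and SHA-256 checksum and counts the reads in a gzipped FASTQ file. The function names `file_fingerprint` and `count_fastq_reads` are hypothetical helpers, and the script assumes standard four-line FASTQ records and that the paths are passed on the command line.

```python
import gzip
import hashlib
import sys


def file_fingerprint(path, chunk_size=1 << 20):
    """Return (size_in_bytes, sha256_hex) for a file, reading it in chunks."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()


def count_fastq_reads(path):
    """Count records in a gzipped FASTQ file.

    Assumes the standard layout of exactly four lines per record.
    """
    with gzip.open(path, "rt") as handle:
        line_count = sum(1 for _ in handle)
    return line_count // 4


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python compare_files.py FILE_A FILE_B
    # FILE_A and FILE_B are the two downloads to compare, for example the
    # .sra file from AWS and the .sra file fetched by pysradb.
    for path in sys.argv[1:3]:
        size, sha256 = file_fingerprint(path)
        print(f"{path}\t{size} bytes\tsha256={sha256}")
        if path.endswith((".fastq.gz", ".fq.gz")):
            print(f"{path}\t{count_fastq_reads(path)} reads")
```

Running it once on the two .sra files and once on the matching FASTQ outputs gives concrete numbers to include in the report. If the read counts agree while the file sizes differ, that would suggest the difference lies in what is stored per read rather than in missing reads, though this is only a heuristic.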