saketkc / pysradb

Package for fetching metadata and downloading data from SRA/ENA/GEO
https://saketkc.github.io/pysradb
BSD 3-Clause "New" or "Revised" License

fastq's URLs are empty #209

Open NomiCentarix opened 7 months ago

NomiCentarix commented 7 months ago

Describe the bug

The columns "ena_fastq_http", "ena_fastq_http_1" and "ena_fastq_http_2" are all NA. I tested the code in several environments, with the same result. The data still exists at the same path as before: http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR236/077/SRR23630177/SRR23630177_1.fastq.gz
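To confirm independently of pysradb that a file is where it is expected, the URL can be rebuilt from the run accession. The sketch below assumes the directory layout ENA documents for its fastq FTP area (first six characters of the accession, then a zero-padded subdirectory taken from the digits beyond the ninth character). The helper name `ena_fastq_url` and the constant `ENA_FASTQ_BASE` are made up for illustration and are not part of pysradb.

```python
from typing import Optional

# Base URL copied from the path quoted in the report.
ENA_FASTQ_BASE = "http://ftp.sra.ebi.ac.uk/vol1/fastq"


def ena_fastq_url(run_accession: str, mate: Optional[int] = None) -> str:
    """Build the expected ENA fastq URL for a run accession (hypothetical helper).

    Assumed layout: <base>/<first 6 chars>/[<padded extra digits>/]<accession>/<file>
    """
    extra = len(run_accession) - 9  # digits beyond the 9-character form
    if extra < 0 or extra > 3:
        raise ValueError(f"unexpected accession length: {run_accession!r}")

    parts = [ENA_FASTQ_BASE, run_accession[:6]]
    if extra > 0:
        # e.g. 11 characters -> last 2 digits, left-padded to 3: "77" -> "077"
        parts.append(run_accession[-extra:].zfill(3))
    parts.append(run_accession)

    # Paired-end files carry a _1 / _2 suffix; single-end files have none.
    suffix = f"_{mate}" if mate is not None else ""
    parts.append(f"{run_accession}{suffix}.fastq.gz")
    return "/".join(parts)


print(ena_fastq_url("SRR23630177", mate=1))
# http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR236/077/SRR23630177/SRR23630177_1.fastq.gz
```

If the rebuilt URL resolves while the metadata columns are still NA, the files themselves are present and the problem is more likely in how the URLs are looked up.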

To Reproduce

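from IPython.display import display  # display() is only built in inside notebooks
from pysradb.sraweb import SRAweb

db = SRAweb()

# Map the GEO series to its SRA study accession(s)
gse_to_srp = db.gse_to_srp("GSE226189")
print("gse_to_srp shape:", gse_to_srp.shape)
display(gse_to_srp.head(2))

# Fetch detailed metadata; the ena_fastq_* columns come back as NA
metadata = db.sra_metadata(gse_to_srp["study_accession"].to_list(), detailed=True)
print(metadata.shape)
display(metadata.head(2))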

Thanks

saketkc commented 7 months ago

Thanks for the bug report. It is possible something has changed at the EBI end. I will try to check.

marcomoretto commented 6 months ago

I think the bug is related to the concurrent part here https://github.com/saketkc/pysradb/blob/99d0ef76f85f64659388c219e639c47324e7f213/pysradb/sraweb.py#L694

Calling fetch_ena_fastq directly works as expected and retrieves the correct URLs:

from pysradb.sraweb import SRAweb
db = SRAweb()
db.fetch_ena_fastq("SRP059263")
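One way to test the hypothesis that the concurrent path is at fault is to run the same fetch function sequentially and through a thread pool, then compare the results and surface any exception raised in a worker. The sketch below is generic and does not reproduce pysradb's internals. The helper names (`fetch_sequential`, `fetch_concurrent`, `compare_paths`) and the stand-in `fake_fetch` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_sequential(fetch, accessions):
    """Call fetch once per accession, one after another."""
    return {acc: fetch(acc) for acc in accessions}


def fetch_concurrent(fetch, accessions, max_workers=4):
    """Call fetch in a thread pool, keeping worker exceptions instead of hiding them."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {acc: pool.submit(fetch, acc) for acc in accessions}
        for acc, future in futures.items():
            try:
                results[acc] = future.result()
            except Exception as exc:  # record the failure rather than dropping it
                errors[acc] = exc
    return results, errors


def compare_paths(fetch, accessions, same=lambda a, b: a == b):
    """Return accessions whose concurrent result differs, plus worker errors."""
    sequential = fetch_sequential(fetch, accessions)
    concurrent, errors = fetch_concurrent(fetch, accessions)
    mismatched = [
        acc for acc in accessions
        if acc not in concurrent or not same(concurrent[acc], sequential[acc])
    ]
    return mismatched, errors


def fake_fetch(accession):
    """Stand-in for a real lookup; returns a placeholder string, not a real URL."""
    return f"urls-for-{accession}"


mismatched, errors = compare_paths(fake_fetch, ["SRP059263", "SRP000001"])
print(mismatched, errors)
```

To try this against pysradb, `fake_fetch` could be replaced with `db.fetch_ena_fastq`. If that call returns a pandas DataFrame, pass `same=lambda a, b: a.equals(b)`, since `==` on DataFrames is element-wise. Any entry in `errors` would point to a failure that only shows up under concurrency.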