saketkc / pysradb

Package for fetching metadata and downloading data from SRA/ENA/GEO
https://saketkc.github.io/pysradb
BSD 3-Clause "New" or "Revised" License
313 stars, 51 forks

[BUG] aspera #202

Closed NomiCentarix closed 1 year ago

NomiCentarix commented 1 year ago

Describe the bug

Hello, and thank you for the awesome package. I installed Aspera as described in the notebook "pysradb_ascp_multithreaded.ipynb". However, when I specify the argument "use_ascp" in either the Python API or the CLI, the download does not occur (the SRP/SRX folders are created but are empty).

To Reproduce


from pysradb.sraweb import SRAweb
SRA_OUR_DIR = "/data/NCBI_data/"
db = SRAweb()
gse_to_srp = db.gse_to_srp("GSE226189")
print("gse_to_srp shape:", gse_to_srp.shape)
display(gse_to_srp.head(2))

metadata = db.sra_metadata(gse_to_srp["study_accession"].to_list(), detailed=True)
print(metadata.shape)
display(metadata.head(1))

db.download(df=metadata.head(1), 
            url_col="ena_fastq_http",
            use_ascp=True,
            threads=8,
            skip_confirmation=True,  # don't ask for permission to download
            out_dir=SRA_OUR_DIR)  


Additional context

I guess the problem has to do with how Aspera is installed. When I type "aspera" in the terminal, I get the error:

"import apt_pkg ImportError: /usr/local/lib/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /usr/lib/python3/dist-packages/apt_pkg.cpython-310-x86_64-linux-gnu.so)"

But I cannot fix it.

Thank you very much, Nomi


saketkc commented 1 year ago

Thanks, this issue seems to come from a library version mismatch in your Python installation. I would recommend using conda/bioconda to create a separate environment for pysradb.
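Once the environment is set up, a quick way to check whether it actually exposes the Aspera client is to look for the binary from Python. This is only a sketch under assumptions: it assumes the client is installed under the name `ascp` (as far as I know, typing "aspera" in a shell is not expected to launch it), and the `~/.aspera/connect/bin` fallback directory is an assumed install location that may differ on your machine. `find_ascp` is a hypothetical helper, not part of pysradb.

```python
import os
import shutil
from pathlib import Path


def find_ascp(names=("ascp",), extra_dirs=("~/.aspera/connect/bin",)):
    """Return the path of the first matching executable, or None.

    `names` and `extra_dirs` are assumptions about a typical install,
    not values taken from pysradb.
    """
    for name in names:
        # First look on the current PATH.
        found = shutil.which(name)
        if found:
            return found
        # Then try the assumed extra install directories.
        for directory in extra_dirs:
            candidate = Path(directory).expanduser() / name
            if candidate.is_file() and os.access(candidate, os.X_OK):
                return str(candidate)
    return None


if __name__ == "__main__":
    path = find_ascp()
    print(path if path else "ascp not found on PATH or in the assumed directories")
```

If this reports that nothing was found, the install itself is the first thing to look at, before any pysradb arguments.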

NomiCentarix commented 1 year ago

OK, I fixed the above problem, but I still don't get the FASTQ file, only empty folders, with the following code:

from pysradb.sraweb import SRAweb
SRA_OUR_DIR = "/data/NCBI_data/"
db = SRAweb()
gse_to_srp = db.gse_to_srp("GSE226189")
print("gse_to_srp shape:", gse_to_srp.shape)
display(gse_to_srp.head(2))

metadata = db.sra_metadata(gse_to_srp["study_accession"].to_list(), detailed=True)
print(metadata.shape)
display(metadata.head(2))

db.download(df=metadata.head(1), 
            url_col="ena_fastq_http_1",
            use_ascp=True,
            #threads=8,
            skip_confirmation=True,  # don't ask for permission to download
            out_dir=SRA_OUR_DIR)  

When url_col is left at its default, I do get the .sra files. The link in the column "ena_fastq_http_1" seems fine (http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR236/077/SRR23630177/SRR23630177_1.fastq.gz).
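To rule out the link itself, it can be inspected and probed without pysradb or Aspera. The sketch below uses only the standard library; `describe_url` and `http_status` are hypothetical helper names, and the HEAD request assumes the server answers HEAD for this path. The column holds a plain HTTP URL, so this checks HTTP reachability only and says nothing about how "use_ascp" treats such a URL.

```python
import posixpath
from urllib.parse import urlparse
from urllib.request import Request, urlopen


def describe_url(url):
    """Split a download URL into scheme, host, and file name."""
    parts = urlparse(url)
    return {
        "scheme": parts.scheme,
        "host": parts.netloc,
        "filename": posixpath.basename(parts.path),
    }


def http_status(url, timeout=30):
    """Send a HEAD request and return the HTTP status code (needs network)."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return response.status


# The link quoted from the "ena_fastq_http_1" column above.
URL = "http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR236/077/SRR23630177/SRR23630177_1.fastq.gz"

info = describe_url(URL)
print(info["scheme"], info["host"], info["filename"])
# prints: http ftp.sra.ebi.ac.uk SRR23630177_1.fastq.gz
```

Calling `http_status(URL)` requires network access; a 200 response would suggest the file is reachable over HTTP, which would point the investigation back at the Aspera side rather than at the metadata.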

NomiCentarix commented 1 year ago

Hi, can my new comment be seen now that the issue has been closed?