saketkc / pysradb

Package for fetching metadata and downloading data from SRA/ENA/GEO
https://saketkc.github.io/pysradb
BSD 3-Clause "New" or "Revised" License

[QUESTION] SRAdb up to date? #157

Closed: snayfach closed this issue 2 years ago

snayfach commented 2 years ago

I recall that SRAdb is no longer maintained, and I noticed that some recent SRA accessions (e.g., ERR6349793) are missing from the sqlite3 database: https://s3.amazonaws.com/starbuck1/sradb/SRAmetadb.sqlite.gz. Is this still the source of SRA metadata used by pysradb?

saketkc commented 2 years ago

No, pysradb uses NCBI's web API. It is a bit unfortunate in hindsight that the name of this package was derived from SRAdb, to which it now has absolutely no relation or dependence.
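
For anyone who wants to check whether an accession such as ERR6349793 is visible through NCBI's web API, here is a minimal sketch against the E-utilities `esearch` endpoint. It is not pysradb's own code: the helper names (`build_esearch_url`, `count_hits`, `accession_is_indexed`) are made up for illustration, and it assumes the JSON response reports the number of hits under `esearchresult` / `count`.

```python
import json
import urllib.parse
import urllib.request

# NCBI E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(accession: str) -> str:
    """Build an esearch URL that looks up one accession in the SRA database."""
    params = {"db": "sra", "term": accession, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urllib.parse.urlencode(params)}"


def count_hits(response_text: str) -> int:
    """Extract the hit count from an esearch JSON response body.

    Assumption: the count is stored as a string under esearchresult.count.
    """
    return int(json.loads(response_text)["esearchresult"]["count"])


def accession_is_indexed(accession: str, timeout: float = 30.0) -> bool:
    """Return True if the search reports at least one hit (needs network access)."""
    with urllib.request.urlopen(build_esearch_url(accession), timeout=timeout) as resp:
        return count_hits(resp.read().decode("utf-8")) > 0
```

Calling `accession_is_indexed("ERR6349793")` makes a single request to NCBI. For more than an occasional lookup, NCBI's usage guidelines on request rates and API keys apply.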

snayfach commented 2 years ago

Thanks for the quick response! I had read that in your F1000 article: https://f1000research.com/articles/8-532

Is it possible to use pysradb to get metadata for all runs in the SRA? Any recommendations for that?

saketkc commented 2 years ago

I was considering an update to the article to reflect some of the new functionalities we added over the last couple of years, particularly with @bscrow's search functionality.

It is currently not possible to get all the metadata, since NCBI's API will time out. There are dumps available here that might be helpful: https://ftp.ncbi.nlm.nih.gov/sra/reports/Metadata/

snayfach commented 2 years ago

I'm aware of those, but it's a big pain since there are millions of XML files. Thanks for the responses.

saketkc commented 2 years ago

Right, I know it's frustrating. There seems to be some code available to ease the process (that I have not tested): https://github.com/linsalrob/SRA_Metadata
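
One way to avoid unpacking millions of XML files is to stream the dump archive and parse each file in memory. The sketch below is untested against a real dump and rests on several assumptions: that the dump is a gzipped tar archive, that run-level files end in `.run.xml`, and that each `RUN` element carries an `accession` attribute and an optional `EXPERIMENT_REF` child. The function name `iter_runs` is hypothetical.

```python
import tarfile
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Tuple


def iter_runs(archive: IO[bytes]) -> Iterator[Tuple[str, str]]:
    """Yield (run_accession, experiment_accession) pairs from a gzipped tar dump.

    Assumptions (not verified against a real dump): run metadata lives in
    members named "*.run.xml", inside <RUN accession="..."> elements that may
    contain an <EXPERIMENT_REF accession="..."/> child.
    """
    # "r|gz" reads the archive as a forward-only stream, so nothing is
    # extracted to disk and the archive is never loaded in full.
    with tarfile.open(fileobj=archive, mode="r|gz") as tar:
        for member in tar:
            if not (member.isfile() and member.name.endswith(".run.xml")):
                continue
            handle = tar.extractfile(member)
            if handle is None:
                continue
            # iterparse emits each element once it is complete, which keeps
            # memory bounded even for large run files.
            for _, elem in ET.iterparse(handle, events=("end",)):
                if elem.tag == "RUN":
                    ref = elem.find("EXPERIMENT_REF")
                    experiment = ref.get("accession", "") if ref is not None else ""
                    yield elem.get("accession", ""), experiment
                    elem.clear()  # free the subtree that was just reported
```

To use it, open a downloaded archive in binary mode (for example `open("path/to/dump.tar.gz", "rb")`, where the path is a placeholder) and iterate over `iter_runs(fh)`, writing the pairs to a TSV file or a database as they arrive.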