saketkc / pysradb

Package for fetching metadata and downloading data from SRA/ENA/GEO
https://saketkc.github.io/pysradb
BSD 3-Clause "New" or "Revised" License

[ENH] Filter by publicly available datasets #164

Closed markziemann closed 2 years ago

markziemann commented 2 years ago

I'm running a search for all available RNA-seq data for an organism. pysradb returns every matching entry in SRA, but I'm only interested in the ones that are publicly available. (For humans, 185012 runs are "controlled access" and 1071087 are "public".) Is there a way to exclude the entries that aren't public, or to filter them out afterwards?

It would be great if there were an option to return only "public" entries, something like:

pysradb search --organism=hsapiens --publication-date 01-01-2007:31-12-2007 --source="transcriptomic" --access="public"

Thanks for the great package. Mark

saketkc commented 2 years ago

Adding a flag is a great suggestion. For now, you could filter public datasets by appending "Public[Access]", something like:

pysradb search --organism="Homo sapiens" --publication-date 01-01-2015:01-01-2021 --source="transcriptomic" --query="Public[Access]"
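
If the same search is scripted, the filter can be attached when the command is assembled. The following is only a minimal sketch: `build_search_command` and its parameters are hypothetical names made up for illustration and are not part of pysradb, and combining an extra free-text term with `AND` assumes the `--query` value is passed through as an Entrez-style query. It only builds the argument list for the command above and does not run it.

```python
import shlex

# Access filter term taken from the workaround above.
PUBLIC_FILTER = "Public[Access]"


def build_search_command(organism, date_range, source, extra_query=None):
    """Return the argv list for a `pysradb search` call limited to public data.

    Hypothetical helper for illustration only; it is not part of pysradb.
    """
    if extra_query:
        # Assumption: extra terms can be combined with the filter using AND.
        query = f"({extra_query}) AND {PUBLIC_FILTER}"
    else:
        query = PUBLIC_FILTER
    return [
        "pysradb", "search",
        "--organism", organism,
        "--publication-date", date_range,
        "--source", source,
        "--query", query,
    ]


cmd = build_search_command(
    "Homo sapiens", "01-01-2015:01-01-2021", "transcriptomic"
)
# Print a copy-pasteable, shell-quoted version of the command.
print(shlex.join(cmd))
```

The resulting list could then be handed to `subprocess.run(cmd, check=True)` on a machine where pysradb is installed.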

markziemann commented 2 years ago

It works! Thank you so much!