saketkc / pysradb

Package for fetching metadata and downloading data from SRA/ENA/GEO
https://saketkc.github.io/pysradb
BSD 3-Clause "New" or "Revised" License

[ENH] exclude dbgap samples when using search #112

Open Maarten-vd-Sande opened 3 years ago

Maarten-vd-Sande commented 3 years ago

Is your feature request related to a problem? Please describe.

Not sure if this is already implemented, but would it be possible to exclude private/dbgap samples when using the search functionality?

If it is not yet supported, it would be very useful functionality for me :)

saketkc commented 3 years ago

This is a good idea! Paging @bscrow in case he is willing to help; otherwise I will add it to my list of ToDos.

bscrow commented 3 years ago

Based on the current implementation, we can exclude private samples from the search by adding "AND Public[Access]" to the search text. For example, "bacterium AND nanopore" becomes "bacterium AND nanopore AND Public[Access]". dbgap samples can be excluded by appending "NOT cluster dbgap[Properties]".

If this is a frequently used feature, maybe we can implement a separate flag for this functionality.
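Until such a flag exists, the string manipulation described above could be wrapped in a small helper. This is only a rough sketch: `add_access_filters` and its two keyword arguments are hypothetical names, not part of pysradb, and the filter strings are copied verbatim from this comment rather than checked against the SRA field list.

```python
# Filter strings quoted from the comment above; the helper itself is hypothetical.
PUBLIC_ONLY = "AND Public[Access]"
NO_DBGAP = "NOT cluster dbgap[Properties]"


def add_access_filters(query, public_only=False, exclude_dbgap=False):
    """Append the requested access filters to an SRA search string."""
    parts = [query.strip()]
    if public_only:
        # Keep only entries whose data is publicly available.
        parts.append(PUBLIC_ONLY)
    if exclude_dbgap:
        # Drop dbgap samples, including those classified as public.
        parts.append(NO_DBGAP)
    return " ".join(parts)


print(add_access_filters("bacterium AND nanopore", public_only=True))
# prints: bacterium AND nanopore AND Public[Access]
```

The returned string would then be passed as the search text, for example as the `query` argument of `SraSearch`.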

Maarten-vd-Sande commented 3 years ago

My guess would be that this is something many people would be interested in. Does this then already work with the Python API? Something like:

from pysradb.search import SraSearch

instance = SraSearch(query="h3k27ac AND Public[Access]", selection="chip", publication_date="01-01-2015:01-01-2021", platform="illumina", organism="Homo sapiens", verbosity=2, return_max=1_000_000)
instance.search()

When I use this query I get fewer experiments, which sort of suggests it works?

P.S. So far the search functionality works really nicely, thanks for the feature @bscrow

bscrow commented 3 years ago

Yep, that will work: it keeps only entries with publicly available data. From what I can see on SRA, this is slightly different from excluding dbgap samples, as some dbgap samples are classified as public.

I'm really glad that you like the search functionality! I'll try to see if I can implement the option to exclude dbgap samples from a search.