saketkc / pysradb

Package for fetching metadata and downloading data from SRA/ENA/GEO
https://saketkc.github.io/pysradb
BSD 3-Clause "New" or "Revised" License

Filtering results by instrument type #196

Closed: mlw22 closed this issue 1 year ago

mlw22 commented 1 year ago

I was passing multiple sample_accession IDs to get the run accession IDs, but I ran into the issue of multiple run IDs per sample accession. I want only the run accessions that used an ILLUMINA instrument. For example, I was using this command: pysradb metadata SRS1840572. I was isolating the SRR ID by looking for the SRR pattern. Is there already a way to filter the results?

saketkc commented 1 year ago

Can you elaborate on what your question is? SRS1840572 is indeed a single sample, but it was sequenced on two different machines for two different projects that are both part of this BioProject (resulting in two SRRs): https://www.ncbi.nlm.nih.gov/bioproject/PRJNA646837

Once you get the metadata, you should be able to filter for the instrument using the instrument column (in Python, R, or any language that supports importing a TSV).
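
As a rough illustration of that kind of post-hoc filtering, here is a minimal sketch using only the Python standard library. It assumes the metadata has been saved as tab-separated text with `instrument` and `run_accession` columns; check the header of your own output, since column names can differ. The function name `filter_runs_by_instrument` and the rows in `SAMPLE_TSV` are hypothetical placeholders, not real output for SRS1840572.

```python
import csv
import io

# Placeholder rows for illustration only: these accessions and instrument
# values are made up and are NOT the real records for SRS1840572.
SAMPLE_TSV = (
    "run_accession\tsample_accession\tinstrument\n"
    "SRR0000001\tSRS0000001\tIllumina HiSeq 2500\n"
    "SRR0000002\tSRS0000001\tPacBio RS II\n"
)


def filter_runs_by_instrument(
    tsv_text,
    keyword="illumina",
    instrument_column="instrument",  # assumed column name
    run_column="run_accession",      # assumed column name
):
    """Return run accessions whose instrument field contains `keyword`.

    The match is a case-insensitive substring test, so "illumina" matches
    "Illumina HiSeq 2500" as well as "ILLUMINA".
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    runs = []
    for row in reader:
        # Treat a missing or empty instrument field as a non-match.
        value = row.get(instrument_column) or ""
        if keyword.lower() in value.lower():
            runs.append(row[run_column])
    return runs


if __name__ == "__main__":
    print(filter_runs_by_instrument(SAMPLE_TSV))
```

To use it on real output, read the saved TSV file into a string (for example with `open(path).read()`) and pass that in place of `SAMPLE_TSV`. Selecting on the instrument column avoids pattern-matching for SRR IDs in the raw text, which cannot tell the two runs of one sample apart.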

mlw22 commented 1 year ago

I was wondering if there was an additional argument I could add to the metadata command to filter by instrument, but I will try filtering after I get the metadata, thanks!

saketkc commented 1 year ago

This notebook might give you a rough sketch of how you can use the Python API to filter: https://github.com/saketkc/pysradb/blob/develop/notebooks/01.Python-API_demo.ipynb