saketkc / pysradb

Package for fetching metadata and downloading data from SRA/ENA/GEO
https://saketkc.github.io/pysradb
BSD 3-Clause "New" or "Revised" License

[BUG] gsm_to_srx false positives (reanalysis) #165

Closed Maarten-vd-Sande closed 1 year ago

Maarten-vd-Sande commented 2 years ago

Describe the bug

I'm not sure whether this is a bug on the pysradb side or the SRA side, but querying a single GSM ID returns rows for other GSM IDs as well (false positives):

import pysradb
db_sra = pysradb.SRAweb()
db_sra.gsm_to_srx(["GSM1155957"])

  experiment_alias experiment_accession
3       GSM1621354            SRX893751
4       GSM1621353            SRX893750
5       GSM1621352            SRX893749
6       GSM1621351            SRX893748
7       GSM1155957            SRX298000

Could this be because the other GSM records are re-analyses of the one I queried? https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM1155957

saketkc commented 2 years ago

That is correct. The results reflect what we see on the search page: https://www.ncbi.nlm.nih.gov/gds/?term=GSM1155957
We could handle this internally, but for now I would recommend subsetting the returned table to the rows whose experiment_alias exactly matches the queried GSM ID.
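
A minimal sketch of that exact-match subsetting, assuming the result is a pandas DataFrame with the experiment_alias and experiment_accession columns shown in the output above. The helper name keep_exact_matches is hypothetical and not part of pysradb, and the table is rebuilt by hand here so the snippet runs without a network call:

```python
import pandas as pd


def keep_exact_matches(df, gsm_ids, column="experiment_alias"):
    """Keep only rows whose alias is exactly one of the queried GSM IDs.

    Hypothetical helper for illustration; not part of pysradb.
    """
    return df[df[column].isin(gsm_ids)].reset_index(drop=True)


# Stand-in for the table returned by db_sra.gsm_to_srx(["GSM1155957"]),
# copied from the output reported in this issue.
results = pd.DataFrame(
    {
        "experiment_alias": [
            "GSM1621354",
            "GSM1621353",
            "GSM1621352",
            "GSM1621351",
            "GSM1155957",
        ],
        "experiment_accession": [
            "SRX893751",
            "SRX893750",
            "SRX893749",
            "SRX893748",
            "SRX298000",
        ],
    },
    index=[3, 4, 5, 6, 7],
)

queried = ["GSM1155957"]
filtered = keep_exact_matches(results, queried)
print(filtered)
```

With a live query, the same filter would be applied to the DataFrame returned by gsm_to_srx, passing the same list of GSM IDs that was used for the query.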