saketkc / pysradb

Package for fetching metadata and downloading data from SRA/ENA/GEO
https://saketkc.github.io/pysradb
BSD 3-Clause "New" or "Revised" License

[ENH] publication_date parameter #221

Open schmucr1 opened 1 week ago


Hello!

Just a question that I'd like to ask here, as I cannot use Slack.

How long after publication (hours, days) do GEO studies become searchable with the pysradb Python API?

When I set the publication date range to start at today - 1 day, no results are found:

from pysradb.search import GeoSearch

instance = GeoSearch(publication_date="05-09-2024:06-09-2024", return_max=100, verbosity=3)
instance.search()
df = instance.get_df()
print(df)
print(df['study_alias'].unique())

No results found for the following search query:
SRA: {'query': 'sra gds[Filter]', 'accession': None, 'organism': None, 'layout': None, 'mbases': None, 'publication_date': '05-09-2024:06-09-2024', 'platform': None, 'selection': None, 'source': None, 'strategy': None, 'title': None}
GEO DataSets: {'query': 'gds sra[Filter]', 'dataset_type': None, 'entry_type': None, 'publication_date': '05-09-2024:06-09-2024', 'organism': None}
Empty DataFrame
Columns: []
Index: []
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/app_user/.local/lib/python3.10/site-packages/pandas/core/frame.py", line 3761, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/app_user/.local/lib/python3.10/site-packages/pandas/core/indexes/range.py", line 349, in get_loc
    raise KeyError(key)
KeyError: 'study_alias'
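As a side note, the KeyError at the end of that output is raised because the returned DataFrame is empty and has no columns, not by the search itself. A minimal sketch of a guard, using only pandas; the helper name `unique_study_aliases` is hypothetical and not part of pysradb:

```python
import pandas as pd


def unique_study_aliases(df: pd.DataFrame, column: str = "study_alias") -> list:
    """Return the unique values of `column`, or [] if the frame is empty
    or does not contain that column (hypothetical helper, not pysradb API)."""
    if df.empty or column not in df.columns:
        return []
    # unique() keeps the order of first appearance
    return df[column].dropna().unique().tolist()


# Stand-in for an empty search result like the one printed above
empty_df = pd.DataFrame()
print(unique_study_aliases(empty_df))

# Made-up frame for illustration only; the accessions are just sample strings
hits = pd.DataFrame({"study_alias": ["GSE262126", "GSE248977", "GSE262126"]})
print(unique_study_aliases(hits))
```

With a guard like this, an empty result yields an empty list instead of a traceback, which makes it easier to tell "no hits" apart from a real error.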

However, there are bulk RNA-seq studies published on Sep 5, e.g., GSE274586 and GSE262282.

A study published on Sep 4, GSE260817, is also not found when the publication_date range starts at today - 2 days:

instance = GeoSearch(publication_date="04-09-2024:06-09-2024", return_max=1000, verbosity=3)
instance.search()
df = instance.get_df()
print(df['study_alias'].unique())
sum(df['study_alias'].unique() == 'GSE260817')

100%|███████████████████████████████████████████████████████████████████████████████████████████| 468/468 [00:23<00:00, 19.67it/s]
['GSE262126' 'GSE248977' 'GSE227333' 'GSE244392' 'GSE248058' 'GSE272252'
 'GSE248629' 'GSE227216' 'GSE233998' 'GSE275863' 'GSE261310' 'GSE240264'
 'GSE253407' 'GSE249182' 'GSE227338' 'GSE264212' 'GSE262125' 'GSE272635'
 'GSE276038' 'GSE271653' 'GSE276204' 'GSE267342' 'GSE232874' 'GSE267343'
 'GSE276065' 'GSE271667' 'GSE276245' 'GSE248627' 'GSE276037' 'GSE276206'
 'GSE248628']
0
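For reference, the "today - N days" ranges in these calls can be built programmatically instead of typed by hand. A minimal sketch using only the standard library, assuming the DD-MM-YYYY:DD-MM-YYYY form used in the calls above; `publication_window` is a hypothetical helper name, not a pysradb function:

```python
from datetime import date, timedelta
from typing import Optional


def publication_window(days_back: int, end: Optional[date] = None) -> str:
    """Build a 'DD-MM-YYYY:DD-MM-YYYY' range from (end - days_back) to end.

    Hypothetical helper; `end` defaults to today's date.
    """
    if days_back < 0:
        raise ValueError("days_back must be non-negative")
    end = end or date.today()
    start = end - timedelta(days=days_back)
    fmt = "%d-%m-%Y"
    return f"{start.strftime(fmt)}:{end.strftime(fmt)}"


# Reproduces the two ranges used in this report (end date fixed for clarity)
print(publication_window(1, end=date(2024, 9, 6)))  # 05-09-2024:06-09-2024
print(publication_window(2, end=date(2024, 9, 6)))  # 04-09-2024:06-09-2024
```

The returned string can then be passed as the publication_date argument, which makes it easy to widen the window step by step and see at which age a study starts to show up.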

What are the constraints?

Thank you and best regards, Roland