
uc-bd2k / GREIN

GREIN : GEO RNA-seq Experiments Interactive Navigator

https://shiny.ilincs.org/grein

GNU General Public License v2.0

48 stars, 19 forks

new datasets #20

Closed: mfazel closed this issue 8 months ago

mfazel commented 2 years ago

Hi guys, I was wondering why the number of in-progress datasets is larger than the number of processed ones (plot on the first page). I suspect something has changed and the pipeline can no longer download data from GEO. I checked several recently published datasets on GEO and tried to analyze them. Either someone else had already tried and the attempt had failed, or, when I submitted the dataset for analysis myself, it failed at the download step shortly after metadata extraction (e.g., GSE125422, GSE159067...). It seems many dataset names and metadata have been added to the database list but failed to download and process. Any idea?

michalkouril commented 2 years ago

Hello, thanks for the note. We are checking on the datasets you listed. Michal

michalkouril commented 2 years ago

Hello, as for the number of datasets in progress: this number covers all available datasets in GEO that we haven't had a chance to process yet, and we are processing them as fast as we can. They sit in a lower-priority queue, so user-submitted datasets get priority. Regarding the pipeline, it should be working. Your particular examples might not have data in SRA; let us know if you think otherwise and we'll dig deeper into what might be going on. Thanks, Michal
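To illustrate why a large "in progress" count can coexist with prompt handling of user submissions, here is a minimal two-level priority queue sketch. It is not GREIN's actual scheduler: the class, priority constants and placeholder accessions (`GSE_BACKGROUND_1`, `GSE_BACKGROUND_2`) are hypothetical, and the only assumption taken from the thread is that user-submitted datasets are served before the background GEO backlog.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical priority levels: a lower number is served first.
USER_SUBMITTED = 0
BACKGROUND_GEO = 1


@dataclass(order=True)
class Job:
    priority: int
    seq: int  # tie-breaker that keeps FIFO order within one priority level
    accession: str = field(compare=False)


class ProcessingQueue:
    """Hypothetical queue: user-submitted jobs always run before background GEO jobs."""

    def __init__(self) -> None:
        self._heap: List[Job] = []
        self._counter = itertools.count()

    def submit(self, accession: str, user_submitted: bool) -> None:
        priority = USER_SUBMITTED if user_submitted else BACKGROUND_GEO
        heapq.heappush(self._heap, Job(priority, next(self._counter), accession))

    def next_job(self) -> Optional[str]:
        # Pop the highest-priority (lowest number), oldest job, or None if empty.
        if not self._heap:
            return None
        return heapq.heappop(self._heap).accession

    def __len__(self) -> int:
        return len(self._heap)


q = ProcessingQueue()
q.submit("GSE_BACKGROUND_1", user_submitted=False)  # placeholder accession
q.submit("GSE_BACKGROUND_2", user_submitted=False)  # placeholder accession
q.submit("GSE125422", user_submitted=True)
print([q.next_job() for _ in range(3)])
# → ['GSE125422', 'GSE_BACKGROUND_1', 'GSE_BACKGROUND_2']
```

Under this scheme the background backlog only shrinks when no user submissions are waiting, so its size says little about whether the download step itself is working.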

mfazel commented 2 years ago

Hi Michal, I checked a few other datasets, and, as with the examples above, none of the reasons given when a dataset submitted through the queue fails turned out to be true. I have a possible explanation: as far as I know, GREIN uses a list of datasets from GEO, and any dataset that is not in that list won't be processed. That list seems not to have been updated for at least the past year, or its updates are not automated. One way to check this is to see which datasets, if any, are being processed right now and look up their release dates in GEO; probably none are from 2022 or even 2021. This is just my guess, but it is likely to be true. Thanks
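The check proposed above boils down to comparing the newest release date among the datasets currently in the pipeline against a cutoff. A minimal sketch, assuming the release dates have already been looked up on GEO by hand; the function names, the placeholder accessions (`GSE_EXAMPLE_1`, `GSE_EXAMPLE_2`) and their dates are hypothetical and purely illustrative, not real GREIN or GEO data.

```python
from datetime import date
from typing import Dict, Optional


def newest_release(release_dates: Dict[str, date]) -> Optional[date]:
    """Return the most recent GEO release date among the given accessions, or None."""
    return max(release_dates.values(), default=None)


def list_looks_stale(release_dates: Dict[str, date], cutoff: date) -> bool:
    """True if no dataset in the list was released on or after `cutoff`."""
    newest = newest_release(release_dates)
    return newest is None or newest < cutoff


# Hypothetical accessions and dates, for illustration only.
in_progress = {
    "GSE_EXAMPLE_1": date(2019, 5, 2),
    "GSE_EXAMPLE_2": date(2020, 11, 17),
}
print(list_looks_stale(in_progress, cutoff=date(2021, 1, 1)))  # → True
```

A `True` result would be consistent with, but not proof of, a dataset list that stopped being refreshed.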