Incomplete Dataset and New Datasets

Light2053 commented 9 months ago

Respected developers and authors,

I sincerely thank you for hosting and maintaining this website, which analyzes and provides raw data for tons of RNA-seq studies in GEO. I also recently used the processing request to get some datasets analyzed, and it's great to see such a quick turnaround on your end.

I have some queries regarding certain human datasets I am investigating that are hosted on GEO. The details with GSE IDs are as follows:

GSE135743 - It seems the dataset was analyzed for only 20 samples, while the total number of samples in GEO is 59. Can it be redone on your end with all the samples?

GSE144254 and GSE78928 - It seems these datasets are not available on your backend. Last I checked GSE144254 has 42 samples of only bulk mRNA-seq data.

GSE78928 has bulk mRNA-seq samples from both Homo sapiens and Mus musculus. It also has ncRNA-seq data. I guess the pipeline cannot differentiate multiple organisms/RNA library types in a single dataset?

At least GEO shows that raw SRA data are available for both of these datasets. Is there a way to request that the authors/developers of GREIN add datasets and specific samples to your backend for analysis?

GSE182866 - This dataset has no accessible raw SRA in the GEO database. So I assume this cannot be processed?

Thank you.

Mario-Medvedovic commented 9 months ago

Hello, Happy to hear that you find GREIN useful. I took a look at the datasets you are missing and here is what we can and cannot do right now.

GSE135743: Not sure why only 20 samples were processed. I will try re-processing.

GSE144254: We attempted processing of this dataset a couple of years ago, but the download failed. I will try re-processing.

GSE78928: This dataset contains a mix of mRNA and snRNA samples and it probably confused our pipeline. It is possible that our latest version would correctly process mRNA samples and drop the snRNA samples, so I will try re-processing.

I should be able to handle these within a week.

As for the other two datasets, you are correct. Our pipeline does not work with mixed species datasets. This is something we will try solving in the future, but it requires structural changes of our UI, so it will not happen soon. As for the datasets without raw data, we have no plans on processing them.
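The limits described above (no raw data, mixed species, mixed library types) can be illustrated with a small pre-check over sample metadata. This is only a minimal sketch and not GREIN's actual pipeline code: the function `triage_samples`, the field names (`gsm`, `organism`, `library_type`, `sra_accession`), and the sample records are all hypothetical placeholders.

```python
from collections import defaultdict


def triage_samples(samples):
    """Group samples by (organism, library type); set aside those without raw data.

    `samples` is a list of dicts with hypothetical keys:
    gsm, organism, library_type, sra_accession.
    """
    groups = defaultdict(list)
    no_raw = []
    for sample in samples:
        if not sample.get("sra_accession"):
            # No raw reads to download, so the sample cannot be processed.
            no_raw.append(sample["gsm"])
            continue
        groups[(sample["organism"], sample["library_type"])].append(sample["gsm"])
    return dict(groups), no_raw


# Placeholder records for illustration only (not real GEO accessions).
samples = [
    {"gsm": "GSM_A", "organism": "Homo sapiens", "library_type": "mRNA-seq", "sra_accession": "SRX_A"},
    {"gsm": "GSM_B", "organism": "Mus musculus", "library_type": "mRNA-seq", "sra_accession": "SRX_B"},
    {"gsm": "GSM_C", "organism": "Homo sapiens", "library_type": "ncRNA-seq", "sra_accession": "SRX_C"},
    {"gsm": "GSM_D", "organism": "Homo sapiens", "library_type": "mRNA-seq", "sra_accession": None},
]

groups, no_raw = triage_samples(samples)

# A dataset is "mixed" here if its processable samples span more than one organism.
is_mixed_species = len({organism for organism, _ in groups}) > 1
```

Grouping on the pair (organism, library type) would let each homogeneous subset be handled separately, while samples without an SRA accession are reported rather than silently dropped.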

Hope this helps, Mario

Light2053 commented 9 months ago

Dear Mario,

Thank you for your swift response and resolution. I am glad to hear that analysis will be attempted on these datasets. I will be waiting for the processing.

Have a good year ahead. Regards Sushanth

Mario-Medvedovic commented 8 months ago

The datasets were now re-processed:

GSE135743: Now has all 59 samples.

GSE144254: Processed fine this time.

GSE78928: Failed again. It failed early in the metadata retrieval and it will take some work to troubleshoot.

Hope this is somewhat helpful. Mario

Light2053 commented 8 months ago

Thank you, Mario. Much appreciated.

Regards Sushanth

uc-bd2k / GREIN

Incomplete Dataset and New Datasets #25