alex-d13 closed 2 years ago
Regarding point 4: I looked at the datasets, and only one is completely available (Darmanis et al., Cell Reports: http://gbmseq.org/). I am not sure whether its annotation is detailed enough. For the other 3:
Here's the dataset with ERCC spike-ins we discussed today (Travaglini et al 2020):
Will send you a download-link to the preprocessed files via email.
Current status of datasets: https://docs.google.com/document/d/1a8uu0-GclIa9yy2Hs_AkoSJnMEOHLZkQdmCzwWlueoQ/edit#
Test-cases suggested by Francesca:
Mouse data: this would be a completely new application that we should develop as soon as possible, and there are already bulk RNA-seq + FACS data for independent validation that we could start in Innsbruck next September/October. In this respect, the Tabula Muris data seem a great dataset to start with, as they include both Smart-seq2 and 10x data (correct?). GregorSturm, what do you think about the quality/resolution of the available annotation? Are the raw data easily accessible?
Human TIL data: Gregor has already done quite some work on the Zemin Zhang dataset, a collaborator of mine could help us with the validation, and it would be a nice way to test the ability of the tools to disentangle closely related cell types. 10x data are also available for building the signatures (see Szabo et al. and Cano-Gomez et al.).
Human lung-cancer data: raw data from the Maynard study are readily available, and Gregor has already done quite some work on the annotation.
Human glioblastoma (or glioma or brain cancer) data: some Smart-seq data are available (see Table 1 in this paper; Alex, could you check for raw data availability?), and I have a collaborator who could help us with the validation.