openproblems-bio / openproblems

Formalizing and benchmarking open problems in single-cell genomics
MIT License

Add scArches #9

scottgigante closed 1 year ago

scottgigante commented 4 years ago

@M0hammadL Malte suggested adding scArches, at a minimum, so that this task has three methods.

M0hammadL commented 4 years ago

@scottgigante I don't think scArches will be possible here, since the method does not perform classification by itself. It aligns the query data to the reference, after which we can use a simple kNN classifier to carry labels from the reference to the query. Therefore I am not sure whether it can be considered a classification method or not!

dburkhardt commented 4 years ago

I think that having scArches + kNN classifier would be a great baseline to have. Thumbing through the preprint, I think that these results are compelling:

> Building upon the query-reference embedding, we investigated the transfer of cell-type labels from the reference dataset. We approached this classification problem by first training a simple kNN classifier on the latent space representation of the reference TS. Then each cell in the query TM was annotated using its closest neighbors in the reference dataset. Additionally, our classification pipeline provides an uncertainty score for each cell while reporting cells with more than 50 % uncertainty as unknown (see Methods). Our model transferred the labels from the reference atlas to the query atlas with ≈ 89% accuracy for all the tissues except tracheal cells (Figure 3d). Moreover, all misclassified cells and cells from the out-of-distribution tissue received high uncertainty scores (Figure 3e-f). Overall, the classification results across tissues indicated a robust prediction accuracy across most tissues (Figure 3g) while highlighting which cells were not mappable to the reference. The robust performance of a simple KNN classifier on the integrated latent space demonstrates that scArches can successfully merge large and complex query datasets into reference atlases.

I understand you would typically include some manual fine-tuning, but I would love to see these results added to Open Problems.
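
For reference, the kNN label-transfer step from the quoted passage could look roughly like the sketch below. It is not the scArches pipeline: the latent coordinates are made-up 2-D points standing in for an integrated embedding, `transfer_labels` and its parameters are hypothetical names, and the uncertainty score (one minus the fraction of neighbors that agree with the majority label) is an assumption, since the preprint defines its own score in Methods. It uses only the Python standard library; in practice a library kNN classifier would likely be used instead.

```python
from collections import Counter
from math import dist


def transfer_labels(ref_latent, ref_labels, query_latent, k=3, max_uncertainty=0.5):
    """Annotate each query cell from its k nearest reference cells.

    Returns one (label, uncertainty) pair per query cell. The uncertainty
    used here, 1 - (majority votes / k), is an assumed stand-in for the
    score described in the preprint.
    """
    results = []
    for q in query_latent:
        # Indices of the k reference cells closest to this query cell
        # (Euclidean distance in the latent space).
        nearest = sorted(
            range(len(ref_latent)), key=lambda i: dist(q, ref_latent[i])
        )[:k]
        votes = Counter(ref_labels[i] for i in nearest)
        label, count = votes.most_common(1)[0]
        uncertainty = 1.0 - count / len(nearest)
        # Cells above the uncertainty threshold are reported as unknown.
        if uncertainty > max_uncertainty:
            label = "unknown"
        results.append((label, uncertainty))
    return results


# Toy reference "latent space" with three cell types (illustrative values only).
ref_latent = [(0, 0), (-1, 0), (4, 0), (5, 0), (2, 3), (2, 4)]
ref_labels = ["T cell", "T cell", "B cell", "B cell", "NK cell", "NK cell"]

# First query sits next to the T cells; second sits between all three types.
query_latent = [(0, 0.2), (2, 1)]

annotations = transfer_labels(ref_latent, ref_labels, query_latent, k=3)
for label, uncertainty in annotations:
    print(f"{label}\tuncertainty={uncertainty:.2f}")
```

The second query cell has one neighbor from each cell type, so no label reaches a majority and it is reported as `unknown`, which mirrors the 50 % uncertainty cutoff mentioned in the preprint.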