jsxlei / SCALEX

Online single-cell data integration through projecting heterogeneous datasets into a common cell-embedding space
BSD 3-Clause "New" or "Revised" License

Projection onto new dataset requires shared latent space #15

Closed ryeking2010 closed 1 year ago

ryeking2010 commented 1 year ago

Hello,

I am really excited about SCALEX, and it seems to do a beautiful job of integrating the data. The Nature Communications paper promotes SCALEX as being able to project a new dataset without requiring retraining, and reports a projection F1 score of 0.925 (Fig. 3c).

However, would it be possible to include the projection steps in the tutorial as well?

Otherwise, I see the API has scalex.label_transfer(ref, query, rep='latent', label='celltype'). While this seems fine, there is a major issue with the rep argument: both the reference and the query must already have this reduced-dimensionality representation. How do you get the query into the same latent space without retraining on the query data, as advertised in the paper? Is there a function or method to do this? Again, it would be nice to have this in the tutorial, as it feels like the main point of the paper.

Thanks! Ryan

jsxlei commented 1 year ago

Hi, projection is based on a trained model: the new data does not need to be retrained and will be projected into the same cell-embedding space. The projection tutorial is already in the documentation: https://scalex.readthedocs.io/en/latest/tutorial/Projection_pancreas.html. We also provide three atlases as references.
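
To illustrate the general idea discussed in this thread (fit an encoder once on the reference, reuse it unchanged on the query, then transfer labels in the shared space), here is a minimal NumPy-only sketch. It is not SCALEX's implementation or API: the PCA encoder stands in for the trained model, and the function names fit_encoder, project, and knn_label_transfer, as well as the synthetic data, are hypothetical and exist only for illustration. The actual SCALEX workflow is the one in the linked tutorial.

```python
import numpy as np


def fit_encoder(ref, n_components=2):
    """Fit a linear encoder (PCA via SVD) on the reference data only."""
    mean = ref.mean(axis=0)
    _, _, vt = np.linalg.svd(ref - mean, full_matrices=False)
    return mean, vt[:n_components].T


def project(x, encoder):
    """Apply an already-fitted encoder; nothing is refit here."""
    mean, components = encoder
    return (x - mean) @ components


def knn_label_transfer(ref_latent, ref_labels, query_latent, k=5):
    """Give each query cell the majority label of its k nearest reference cells."""
    diffs = query_latent[:, None, :] - ref_latent[None, :, :]
    dists = (diffs ** 2).sum(axis=2)  # squared Euclidean distances
    nearest = np.argsort(dists, axis=1)[:, :k]
    predicted = []
    for idx in nearest:
        values, counts = np.unique(ref_labels[idx], return_counts=True)
        predicted.append(values[np.argmax(counts)])
    return np.array(predicted)


# Synthetic stand-in data: two well-separated "cell types" with 10 features.
rng = np.random.default_rng(0)
centers = np.array([[0.0] * 10, [5.0] * 10])
ref = np.vstack([c + rng.normal(size=(50, 10)) for c in centers])
ref_labels = np.repeat(np.array(["alpha", "beta"]), 50)
query = np.vstack([c + rng.normal(size=(10, 10)) for c in centers])
query_truth = np.repeat(np.array(["alpha", "beta"]), 10)

encoder = fit_encoder(ref)             # "training" sees the reference only
ref_latent = project(ref, encoder)     # reference embedding
query_latent = project(query, encoder) # query embedding, same space, no retraining

predicted = knn_label_transfer(ref_latent, ref_labels, query_latent)
accuracy = float((predicted == query_truth).mean())
print(f"label transfer accuracy: {accuracy:.2f}")
```

The point of the sketch is that the query never influences the encoder: both embeddings come from the same fitted parameters, which is what makes a shared representation (the rep argument in the question above) meaningful for label transfer.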