Do we need to correct the batch effects of the given datasets?

HelloWorldLTY commented 2 months ago

Hi, thanks for your great work. I wonder whether we need to correct for batch effects in these spatial transcriptomics data. Thanks a lot!

guillaumejaume commented 2 months ago

Hi, it depends on what you want to do with HEST data. What's your use case?

HelloWorldLTY commented 2 months ago

I am interested in the Visium data only. Thanks.

guillaumejaume commented 2 months ago

Visium data integrated into HEST-1k are very diverse: two species (mouse and human), multiple diseases, and multiple organs. Batch effect correction should be done whenever there is some guarantee that it won't significantly alter the biological signal.

To give a better answer, I need a better understanding of your problem statement, e.g., multimodal representation learning, ST prediction from H&E, characterization of morphological correlates of expression changes, etc.

If you want to explore batch effects, we implemented two core functions:

Batch effect visualization, here, which plots a UMAP of the expression of housekeeping genes (i.e., stable genes) in the stromal region. The function takes as input the set of Visium samples that you want to compare.
Batch effect correction, here, which can correct batch effects using MNN, Harmony, or ComBat. The output of each method is different; e.g., Harmony creates a new latent space, so its output can no longer be interpreted as gene counts (this may or may not be an issue for your problem statement).
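
To make the idea concrete, here is a minimal, self-contained sketch of the "diagnose on housekeeping genes, then correct" workflow. It is not the HEST implementation and does not call any HEST function. The toy counts, the sample names, and the helpers `log_normalize`, `batch_gap`, and `center_per_batch` are all hypothetical, and per-batch mean centering is only a simple stand-in for MNN, Harmony, or ComBat.

```python
import math
from collections import defaultdict
from statistics import mean

# Hypothetical toy counts: rows are spots, columns are three housekeeping
# genes (e.g. ACTB, GAPDH, B2M). Real data would come from Visium samples.
counts = [
    [120, 80, 40], [100, 90, 50], [110, 70, 45],   # spots from sample_A
    [60, 150, 40], [50, 160, 35], [55, 140, 45],   # spots from sample_B
]
batches = ["sample_A"] * 3 + ["sample_B"] * 3


def log_normalize(row, target_sum=1e4):
    """Scale one spot to a fixed library size, then apply log1p."""
    total = sum(row)
    return [math.log1p(c / total * target_sum) for c in row]


def batch_means(expr, batches):
    """Per-gene mean expression within each batch."""
    grouped = defaultdict(list)
    for row, b in zip(expr, batches):
        grouped[b].append(row)
    return {b: [mean(col) for col in zip(*rows)] for b, rows in grouped.items()}


def batch_gap(expr, batches):
    """Largest per-gene spread between batch means (0 means no mean shift)."""
    means = batch_means(expr, batches)
    return max(max(col) - min(col) for col in zip(*means.values()))


def center_per_batch(expr, batches):
    """Subtract each batch's per-gene mean and add back the overall mean.

    Assumption: batches have comparable biological composition. If they do
    not, this removes biological signal along with the batch effect.
    """
    means = batch_means(expr, batches)
    overall = [mean(col) for col in zip(*expr)]
    return [
        [x - m + o for x, m, o in zip(row, means[b], overall)]
        for row, b in zip(expr, batches)
    ]


expr = [log_normalize(row) for row in counts]
corrected = center_per_batch(expr, batches)

print(f"gap before correction: {batch_gap(expr, batches):.3f}")
print(f"gap after correction:  {batch_gap(corrected, batches):.3f}")
```

Two things carry over to the real methods. First, the corrected values are shifted, log-scale numbers rather than counts, which is the same caveat as for Harmony's latent space. Second, the correction is only safe under the stated assumption about comparable composition, which is rarely guaranteed across different organs, diseases, or species.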

HelloWorldLTY commented 2 months ago

Thanks! I will take a look at it!

guillaumejaume commented 2 months ago

@HelloWorldLTY, feel free to document any findings on this GitHub issue.