snap-stanford / SATURN

MIT License

Vertebrate-level brain atlas, questions about evolutionarily distant integration #62

Open DiracZhu1998 opened 5 months ago

DiracZhu1998 commented 5 months ago

Dear authors,

Thank you for such a wonderful toolkit! If I want to build an atlas across evolutionarily distant species, should I use major cell type level annotations as the "cell_type" label? Are there any other parameters you would recommend tuning to improve the atlas? With the default settings, the species do not integrate well.

Thank you for your help!

Best wishes, Yuanzhen

[Screenshot: 2024-06-17 at 09 04 39]

DiracZhu1998 commented 5 months ago

In addition, I couldn't find any code or parameter settings for the frog-zebrafish integration in your paper. The Jupyter notebook you provided does not seem to be the version you used for the paper: the plots and integration in your paper look great, but the frog-zebrafish integration with default parameters is not as good.

Yanay1 commented 5 months ago

The Jupyter notebook is the version used in the paper (same hyperparameters; the random seed will be slightly different, but this shouldn't make a huge difference). How were the results different?

How are you judging how well the species are integrated? You should try transferring labels between species and measuring accuracy.
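
One way to turn the label-transfer suggestion into a number is a nearest-neighbor vote in the integrated embedding space. The sketch below is only an illustration, not code from the SATURN repository: the function name `knn_transfer_accuracy`, the choice of `k`, and the toy data are all assumptions, and the arrays stand in for per-cell embeddings and shared coarse labels that you would extract from your own output.

```python
import numpy as np


def knn_transfer_accuracy(ref_emb, ref_labels, query_emb, query_labels, k=5):
    """Transfer labels from reference to query cells by kNN majority vote.

    Returns the fraction of query cells whose transferred label matches
    their own annotation. (Hypothetical helper, not part of SATURN.)
    """
    ref_emb = np.asarray(ref_emb, dtype=float)
    query_emb = np.asarray(query_emb, dtype=float)
    ref_labels = np.asarray(ref_labels)
    query_labels = np.asarray(query_labels)

    # Squared Euclidean distances, shape (n_query, n_ref).
    d2 = (
        (query_emb ** 2).sum(axis=1)[:, None]
        - 2.0 * query_emb @ ref_emb.T
        + (ref_emb ** 2).sum(axis=1)[None, :]
    )
    # Indices of the k closest reference cells for every query cell.
    nearest = np.argsort(d2, axis=1)[:, :k]

    predicted = []
    for neighbor_labels in ref_labels[nearest]:
        values, counts = np.unique(neighbor_labels, return_counts=True)
        predicted.append(values[np.argmax(counts)])
    return float((np.array(predicted) == query_labels).mean())


# Toy data standing in for embeddings of two species with shared labels.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
labels = np.repeat(["neuron", "glia"], 50)


def simulate_species(offset):
    # 50 cells around each center, optionally displaced by `offset`.
    return np.vstack([c + offset + rng.normal(size=(50, 2)) for c in centers])


ref = simulate_species(0.0)
aligned = simulate_species(0.0)   # query species that overlaps the reference
shifted = simulate_species(10.0)  # query species displaced in embedding space

acc_aligned = knn_transfer_accuracy(ref, labels, aligned, labels)
acc_shifted = knn_transfer_accuracy(ref, labels, shifted, labels)
print(f"aligned: {acc_aligned:.2f}, shifted: {acc_shifted:.2f}")
```

The dense distance matrix is fine for a toy example; for a full atlas, an approximate nearest-neighbor index would likely be needed. A score like this is also easier to compare across runs than two UMAPs side by side.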

DiracZhu1998 commented 5 months ago

Hi Yanay, thank you for your quick response! You are probably right: I only compared them by eye, so my judgment is not very precise, but the result looks quite different from the figure in your paper. I assumed that the same major clusters (cell types) from different species should sit close to each other rather than apart. I also tested the human and mouse whole-brain atlases, and they do not integrate well either.

[Screenshots: 2024-06-17 at 20 37 06, 2024-06-17 at 20 36 14, 2024-06-17 at 20 41 42]

DiracZhu1998 commented 5 months ago

I checked the distances between the protein embeddings you generated and mine. Corresponding genes had the lowest distance, so the protein embedding step is not the problem. The problem seems to be related to mixing scRNA and snRNA datasets: once I removed the human snRNA dataset and integrated only mouse and lizard (both scRNA datasets), they integrated much better than before. Do you have any recommendations for putting more "force" on the integration so that the human snRNA data integrates better with the scRNA datasets, for example by increasing the amount of pretraining? Many thanks!
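
For anyone who wants to repeat the embedding check, here is a rough sketch of the idea. It assumes both embedding sets are available as plain gene-to-vector dictionaries and uses cosine similarity; the metric, the function name `fraction_self_nearest`, and the toy vectors are assumptions for illustration, not the exact procedure or the real embedding files.

```python
import numpy as np


def fraction_self_nearest(emb_a, emb_b):
    """Fraction of shared genes whose nearest neighbor in emb_b is the same gene.

    emb_a, emb_b: dict mapping gene name -> 1D embedding vector.
    Nearest is measured by cosine similarity (an assumption).
    """
    genes = sorted(set(emb_a) & set(emb_b))
    a = np.stack([np.asarray(emb_a[g], dtype=float) for g in genes])
    b = np.stack([np.asarray(emb_b[g], dtype=float) for g in genes])

    # Normalize rows so the dot product equals cosine similarity.
    a /= np.linalg.norm(a, axis=1, keepdims=True)
    b /= np.linalg.norm(b, axis=1, keepdims=True)

    similarity = a @ b.T                  # shape (n_genes, n_genes)
    nearest = similarity.argmax(axis=1)   # best match in emb_b for each gene
    return float((nearest == np.arange(len(genes))).mean())


# Toy stand-ins: "regenerated" is "published" plus a little noise.
rng = np.random.default_rng(0)
genes = [f"gene_{i}" for i in range(20)]
published = {g: rng.normal(size=16) for g in genes}
regenerated = {g: v + rng.normal(scale=0.01, size=16) for g, v in published.items()}

frac = fraction_self_nearest(regenerated, published)
print(f"genes matching themselves: {frac:.2f}")
```

A value close to 1 suggests the two embedding sets agree gene by gene, which is consistent with ruling out the protein embedding step.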

[Screenshot: 2024-06-19 at 20 15 16]

Yanay1 commented 3 weeks ago

The UMAP for frog and zebrafish looks pretty similar to the one in the paper. The actual UMAP will not be exactly the same because of the random seed and differences in hardware and package versions. Another difference is that in the UMAP you generated, the points look smaller and/or are drawn in a random order. That makes the clusters look more mixed and the structure harder to see.
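
To compare two UMAPs like for like, it can help to control the draw order and point size explicitly. The sketch below only illustrates that idea: `draw_order` is a hypothetical helper, the species names and coordinates are toy values, and it makes no claim about how the paper's figure was actually plotted.

```python
import numpy as np


def draw_order(species, mode="random", seed=0):
    """Return an index order in which to draw points.

    mode="random":  shuffle cells so no species is systematically on top.
    mode="grouped": draw one species after another (later ones on top).
    """
    species = np.asarray(species)
    if mode == "random":
        return np.random.default_rng(seed).permutation(len(species))
    if mode == "grouped":
        return np.argsort(species, kind="stable")
    raise ValueError(f"unknown mode: {mode!r}")


# Toy coordinates and species labels standing in for a real UMAP.
coords = np.array([[0.0, 0.0], [1.0, 1.0], [0.1, 0.1], [1.1, 0.9]])
species = np.array(["frog", "zebrafish", "frog", "zebrafish"])

grouped = draw_order(species, mode="grouped")
shuffled = draw_order(species, mode="random", seed=0)

# Reordered arrays, ready to hand to a scatter plot with a fixed point size.
coords_grouped, species_grouped = coords[grouped], species[grouped]
coords_shuffled, species_shuffled = coords[shuffled], species[shuffled]
print(grouped, shuffled)
```

Passing the reordered coordinates to the same plotting call with the same point size for both embeddings removes those two sources of visual difference from the comparison.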