snap-stanford / SATURN


What are seeds? What is the optimal number of seeds to run? #54

Closed vishnums007 closed 6 months ago

vishnums007 commented 6 months ago

Hi,

I am trying to integrate our zebrafish and mouse datasets. My analysis results are present here

Then, I read your vignette that "To replicate SATURN results for frog and zebrafish embryogenesis you need to run SATURN 30 times with different seeds."

Now I am wondering what these seeds are and how they affect my cross-species integration. What do you think is the optimal number of seeds to run in my analysis?

Which seed should I select at the end?

Please help me.

Thanks in advance, Vishnu

Yanay1 commented 6 months ago

The seed is just used to initialize the centroid creation and the dataloader; it is the seed used to generate random numbers.
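As a rough illustration of what a seed does in general, here is a minimal sketch using Python's standard `random` module. It is not SATURN's own code, and the `draw` helper is a hypothetical name used only for this example. The same seed reproduces the same sequence of random numbers, and a different seed gives a different sequence.

```python
import random


def draw(seed, n=3):
    """Return n pseudo-random floats from a generator initialized with `seed`."""
    rng = random.Random(seed)  # independent generator, does not touch global state
    return [rng.random() for _ in range(n)]


same_a = draw(0)
same_b = draw(0)
other = draw(1)

print(same_a == same_b)  # same seed, same sequence
print(same_a == other)   # different seed, different sequence
```

In a training run, anything that depends on those random numbers (such as initialization or batch order) would follow the seed in the same way.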

You don't need to run multiple seeds; however, it can be useful for getting a sense of the variance in integration results. You could then choose the seed that got the highest integration score if you want.
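A minimal sketch of that idea, assuming you have already collected one integration score per seed into a dictionary. The `summarize_seeds` function and the numbers in `example_scores` are placeholders for illustration, not SATURN output or part of its API.

```python
import statistics


def summarize_seeds(scores):
    """Summarize a mapping of seed -> integration score.

    Returns the best seed, its score, and the mean and sample standard
    deviation across seeds as a rough measure of run-to-run variance.
    """
    values = list(scores.values())
    best_seed = max(scores, key=scores.get)
    return {
        "best_seed": best_seed,
        "best_score": scores[best_seed],
        "mean": statistics.mean(values),
        # stdev needs at least two values
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }


# Placeholder scores for three seeds (made up for this example).
example_scores = {0: 0.80, 1: 0.85, 2: 0.82}
summary = summarize_seeds(example_scores)
print(summary)
```

If the spread across seeds is small relative to the differences you care about, a single seed is probably enough.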

vishnums007 commented 6 months ago

Okay, thank you for the swift response!

vishnums007 commented 6 months ago

@Yanay1, I have a naive question about picking the highest integration score after running multiple seeds. In your vignette, fz_multi_seeds_scores.csv provides values for logistic and balanced regression. I am assuming the best integration seed is the one with the highest logistic and balanced regression scores? If so, is seed #19 the one selected for your frog_zebrafish_embryogenesis analysis? Seed #19 has logistic regression = 0.867293 and balanced regression = 0.545929, which is the highest of the 30 different seeds.

Thank you for your answer.

Yanay1 commented 6 months ago

We did use the highest scoring seed for some downstream analysis, based on 30 seeds (seeds 0-29).

It might not have been seed 19, though. The multi-seed run in the vignette also uses 30 seeds (0-29), but its results might differ from the 0-29 run I did originally: I re-ran it before pushing to GitHub, and things like running on a different machine or updating some packages can cause the results for a given seed to change a bit.