Closed: vishnums007 closed this 6 months ago
The seed is only used to initialize centroid creation and the dataloader; it is the seed for the random number generator.
You don't need to run multiple seeds, but doing so can be useful to get a sense of the variance in integration results. You could then choose the seed with the highest integration score if you want.
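As a rough sketch of the "run several seeds, look at the variance, pick the best" idea, the snippet below summarizes a per-seed score table and returns the top seed. The column names (`seed`, `logistic`, `balanced`), the toy values, and the tie-breaking rule are all assumptions for illustration; they are not taken from SATURN or from the vignette's output file.

```python
import csv
import io
import statistics

# Toy stand-in for a multi-seed score file. The column names and every value
# here are made up for illustration; they are NOT SATURN results.
TOY_SCORES_CSV = """seed,logistic,balanced
0,0.80,0.50
1,0.85,0.52
2,0.82,0.55
"""


def load_scores(text):
    """Parse CSV text into a list of dicts with numeric values."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "seed": int(row["seed"]),
            "logistic": float(row["logistic"]),
            "balanced": float(row["balanced"]),
        }
        for row in reader
    ]


def summarize(rows, metric):
    """Return (mean, sample standard deviation) of one metric across seeds."""
    values = [row[metric] for row in rows]
    return statistics.mean(values), statistics.stdev(values)


def best_seed(rows, metric="logistic", tie_breaker="balanced"):
    """Return the seed with the highest `metric`; ties fall back to `tie_breaker`."""
    return max(rows, key=lambda row: (row[metric], row[tie_breaker]))["seed"]


rows = load_scores(TOY_SCORES_CSV)
mean, spread = summarize(rows, "logistic")
print(f"logistic: mean={mean:.3f}, stdev={spread:.3f}")
print("best seed by logistic score:", best_seed(rows))
```

If the two metrics disagree about which seed is best, as they do in the toy data, which one to prioritize is a judgment call; swapping the `metric` and `tie_breaker` arguments shows the alternative ranking.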
"Okay, thank you for the swift response!"
@Yanay1, I have a naive question about choosing the highest integration score after running multiple seeds. In your vignette, fz_multi_seeds_scores.csv provides values for logistic and balanced regression. I am assuming the best integration seed is the one with the highest logistic and balanced regression scores? If so, is seed 19 the one you selected for your frog_zebrafish_embryogenesis analysis? Seed 19 has logistic regression = 0.867293 and balanced regression = 0.545929, which is the highest across the 30 seeds.
Thank you for your answer.
We did use the highest scoring seed for some downstream analysis, based on 30 seeds (seeds 0-29).
It might not have been seed 19, though. The multi-seed run in the vignette also uses 30 seeds (0-29), but its results might differ from the 0-29 run I did originally, since I re-ran it before pushing to GitHub, and things like running on a different machine or updating some packages can cause the results for a given seed to change a bit.
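To illustrate what a seed does in general, here is a minimal sketch using only Python's standard `random` module. It is not SATURN's seeding code, and the helper `draw` is hypothetical: the same seed reproduces the same sequence in the same environment, while a different seed gives a different sequence.

```python
import random


def draw(seed, n=3):
    """Return n pseudo-random floats from a generator initialized with `seed`."""
    rng = random.Random(seed)  # independent generator, does not touch global state
    return [rng.random() for _ in range(n)]


same_a = draw(0)
same_b = draw(0)
other = draw(1)

print("seed 0 twice gives identical draws:", same_a == same_b)
print("seed 0 vs seed 1 gives identical draws:", same_a == other)
```

This only shows the within-environment behavior; as noted above, results for a given seed can still shift when the machine or package versions change.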
Hi,
I am trying to integrate our zebrafish and mouse datasets. My analysis results are present here
Then I read in your vignette that "To replicate SATURN results for frog and zebrafish embryogenesis you need to run SATURN 30 times with different seeds."
Now I am wondering what these seeds are and how they affect my cross-species integration. What do you think is the optimal number of seeds to run in my analysis?
Which seed should I select at the end?
Please help me.
Thanks in advance, Vishnu