Hi,
If several SNPs are close to each other, the scansnp-random.f script randomly selects one to keep. I am trying to compare CNV results across multiple tumors from the same patient and would like the same SNPs to be selected for all samples. I know that setting a seed makes the results reproducible within a sample, and probably across samples sequenced on the same panel (I haven't checked this thoroughly). However, different SNPs are selected between panel versions (such as IM5 and IM6) even when I set the same seed, because the number of random draws differs when the target regions differ.

I wonder if it would be better to always keep the first SNP in each bin, to make the results more comparable between samples.
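
To illustrate the idea, here is a minimal Python sketch. It is not the actual logic of scansnp-random.f: the fixed-width binning, the function names (`bin_snps`, `pick_random`, `pick_first`), the bin width of 250, and the toy positions are all assumptions made for the example. It shows that an extra region in one panel changes how many random draws are consumed, while a first-SNP-per-bin rule needs no random numbers at all.

```python
import random
from collections import defaultdict

def bin_snps(positions, bin_width):
    """Group SNP positions into fixed-width bins (a simplification)."""
    bins = defaultdict(list)
    for pos in sorted(positions):
        bins[pos // bin_width].append(pos)
    return dict(sorted(bins.items()))

def pick_random(positions, bin_width, seed):
    """Keep one random SNP per bin; also count how many draws were used."""
    rng = random.Random(seed)
    kept, n_draws = {}, 0
    for b, snps in bin_snps(positions, bin_width).items():
        if len(snps) > 1:
            kept[b] = rng.choice(snps)  # consumes RNG state
            n_draws += 1
        else:
            kept[b] = snps[0]
    return kept, n_draws

def pick_first(positions, bin_width):
    """Keep the first (lowest-position) SNP per bin; no RNG involved."""
    return {b: snps[0] for b, snps in bin_snps(positions, bin_width).items()}

# Hypothetical panels: panel_b covers one extra region (positions 300 and 320).
panel_a = [100, 150, 1000, 1040, 1090, 5000]
panel_b = panel_a + [300, 320]

_, draws_a = pick_random(panel_a, 250, seed=1)
_, draws_b = pick_random(panel_b, 250, seed=1)
print(draws_a, draws_b)  # 2 3: the extra region shifts every later draw

first_a = pick_first(panel_a, 250)
first_b = pick_first(panel_b, 250)
shared_bins = first_a.keys() & first_b.keys()
print(all(first_a[b] == first_b[b] for b in shared_bins))  # True
```

In this toy case the first-SNP rule agrees on every shared bin because the two panels contain the same SNPs in those bins. If a newer panel added a SNP at the start of a shared bin, the selection in that bin would still change, so the rule would make results more comparable rather than guarantee identical selections.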
Thanks, Teng