stschiff / msmc-tools

Tools and Utilities for msmc and msmc2

Concerns regarding using MSMC2 with unequal haplotype counts between populations #53

Closed Luker121 closed 1 year ago

Luker121 commented 1 year ago

Hi, I am currently using MSMC2 to infer cross-coalescence rates between two populations, and one population has a significantly larger number of haplotypes than the other. I am concerned about potential biases arising from the unequal sample sizes. Have you already addressed this issue somewhere, or do you have any recommendations for using MSMC2 in such scenarios? Or would you always recommend using an equal number of haplotypes for both populations when analysing two populations?

I look forward to hearing back from you.

stschiff commented 1 year ago

Hi @Luker121. No, I have not assessed this. You can run simulations to test for any biases. Alternatively, create multiple down-samplings of the larger group and run on equal sizes; at least then you get multiple results you can compare. You could then also run it on the complete dataset and see whether the resulting curve differs from the downsampled ones.
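For illustration, the down-sampling idea could be sketched roughly like this in Python. This is only a minimal sketch under some assumptions: the haplotype indices (`small`, `large`) are made up, the helper names `downsample_haplotypes` and `cross_pairs` are hypothetical, and the `a-b` pair format is assumed to match what your msmc2 version expects for `-I`, so please check that against the guide before using it.

```python
import itertools
import random


def downsample_haplotypes(small_pop, large_pop, n_replicates, seed=0):
    """Draw n_replicates random subsets of large_pop, each the size of small_pop."""
    rng = random.Random(seed)  # fixed seed so the replicates are reproducible
    k = len(small_pop)
    # random.sample draws without replacement within each replicate
    return [sorted(rng.sample(large_pop, k)) for _ in range(n_replicates)]


def cross_pairs(pop_a, pop_b):
    """Build a comma-separated list of all cross-population haplotype pairs.

    The "a-b" format is an assumption about the msmc2 -I argument.
    """
    return ",".join(f"{a}-{b}" for a, b in itertools.product(pop_a, pop_b))


# Hypothetical example: 4 haplotypes in the small group, 12 in the large one
small = [0, 1, 2, 3]
large = list(range(4, 16))

replicates = downsample_haplotypes(small, large, n_replicates=5, seed=42)
for i, subset in enumerate(replicates):
    print(f"replicate {i}: haplotypes {subset}")
    print(f"  cross pairs: {cross_pairs(small, subset)}")
```

Each replicate then gives one equal-size run, and the resulting curves can be compared with each other and with the run on the complete dataset. Note that this samples haplotypes independently; if you prefer to keep both haplotypes of an individual together, sample individuals instead and expand them to haplotype indices afterwards.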

Luker121 commented 1 year ago

Ok, thanks for the information. I will run some simulations and will let you know.