stschiff / msmc2

GNU General Public License v3.0

Question about input vcf in each population #57

Closed: hungweichen0327 closed this issue 1 year ago

hungweichen0327 commented 1 year ago

Dear @stschiff,

I have a question about the input VCF for each population: should the VCF file for each population contain the same number of variant sites?

For example, suppose there are two populations (population A and population B), each with one individual. In the VCF file for population A, the genotype at position chr1:500 is "0|1". The VCF file for population B has no record at chr1:500, because the individual in population B is 0|0 there. In other words, the two VCF files contain different numbers of SNPs. Is it okay to run MSMC2 on such files? Will MSMC2 skip the non-overlapping sites, or will it treat the genotype at the missing sites as 0|0?

Thank you for the help. I am grateful for the clear manual and for MSMC2, which has been valuable for my demography-related analyses.

stschiff commented 1 year ago

MSMC doesn't run on VCFs directly, but on a joint input file for all individuals. The VCF files are merged by my tool generate_multihetsep.py from MSMC-tools. As far as I remember, that tool iterates through all VCFs in an ordered fashion and marks sites that are present in one file but absent from the others as missing.
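To make the "ordered iteration" idea above concrete, here is a minimal sketch of merging several position-sorted per-sample site lists and marking a sample as missing wherever it has no record. It is NOT the code of generate_multihetsep.py and does not reproduce that tool's options, mask handling, or output format; the function name `merge_sites`, the tuple layout `(chrom, pos, genotype)`, and the `"?"` missing symbol are all hypothetical choices for illustration. It assumes each input is already sorted by `(chrom, pos)` with a consistent chromosome ordering.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record layout: (chromosome, position, phased genotype), e.g. ("chr1", 500, "0|1")
Site = Tuple[str, int, str]


def _tag(idx: int, sites: Iterable[Site]) -> Iterator[Tuple[str, int, int, str]]:
    """Attach the sample index to every record so the merged stream knows its origin."""
    for chrom, pos, gt in sites:
        yield (chrom, pos, idx, gt)


def merge_sites(per_sample: List[Iterable[Site]], missing: str = "?") -> Iterator[Tuple[str, int, List[str]]]:
    """Walk all sorted inputs in lock-step (k-way merge) and emit one row per position.

    Samples that have no record at a position get `missing` in that row,
    i.e. an absent site is NOT assumed to be homozygous reference.
    """
    streams = [_tag(i, sites) for i, sites in enumerate(per_sample)]
    current_key = None
    row: List[str] = []
    for chrom, pos, idx, gt in heapq.merge(*streams):
        key = (chrom, pos)
        if key != current_key:
            if current_key is not None:
                yield (*current_key, row)  # flush the finished position
            current_key = key
            row = [missing] * len(per_sample)  # start every sample as missing
        row[idx] = gt
    if current_key is not None:
        yield (*current_key, row)
```

Applied to a case like the one in the question, a site that is absent from population B's VCF because it is homozygous reference comes out as missing for B, not as 0|0:

```python
a = [("chr1", 500, "0|1"), ("chr1", 900, "1|1")]
b = [("chr1", 700, "0|1"), ("chr1", 900, "0|1")]
for row in merge_sites([a, b]):
    print(row)
# ('chr1', 500, ['0|1', '?'])
# ('chr1', 700, ['?', '0|1'])
# ('chr1', 900, ['1|1', '0|1'])
```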

hungweichen0327 commented 1 year ago

Thank you for the explanation. I got it!