hungweichen0327 closed 1 year ago
MSMC doesn't run on VCFs, but on a joint input file for all individuals. The VCF files are merged by my tool generate_multihetsep.py from MSMC-tools. As far as I remember, that tool iterates through all VCFs in an ordered fashion and then marks the sites that are present in one file but missing in others as missing.
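To illustrate the ordered iteration described above, here is a minimal Python sketch. It is not the code of generate_multihetsep.py: the names `tag_sites`, `merge_sorted_sites`, and `MISSING`, as well as the `(position, genotype)` tuple input, are assumptions made for illustration, and the sketch ignores masks, chromosomes, and the multihetsep output format.

```python
import heapq
from itertools import groupby

MISSING = "."  # hypothetical marker for "no record in this file"


def tag_sites(sample_index, sites):
    """Attach the sample index to every (position, genotype) record."""
    for position, genotype in sites:
        yield position, sample_index, genotype


def merge_sorted_sites(samples):
    """Walk several position-sorted site lists in one ordered pass.

    `samples` holds one list of (position, genotype) tuples per VCF.
    Yields (position, genotypes), where a sample without a record at
    that position gets MISSING.
    """
    streams = [tag_sites(i, sites) for i, sites in enumerate(samples)]
    merged = heapq.merge(*streams)  # ordered by position, then sample index
    for position, records in groupby(merged, key=lambda rec: rec[0]):
        row = [MISSING] * len(samples)
        for _, sample_index, genotype in records:
            row[sample_index] = genotype
        yield position, row


# Toy data: sample A has a record at 500 that sample B lacks.
sample_a = [(500, "0|1"), (900, "1|1")]
sample_b = [(900, "0|1"), (1200, "0|1")]
merged_rows = list(merge_sorted_sites([sample_a, sample_b]))
for position, row in merged_rows:
    print(position, row)
```

In this toy run, position 500 appears only in the first list, so the second sample is filled with the `MISSING` marker at that site rather than being dropped from the output.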
Thank you for the explanation. I got it!
Dear @stschiff,
I would like to ask about the input VCF for each population. I want to know whether the VCF files of all populations need to contain the same variant sites.
For example, suppose there are two populations (population A and population B), each with one individual. In the VCF file of population A, the genotype at position chr1:500 is "0|1". The VCF file of population B has no record at chr1:500, because the individual in population B has the genotype 0|0 there. In other words, the number of SNPs differs between the VCF files of population A and population B. Is it okay to run MSMC2 on such input? Will MSMC2 skip the non-overlapping sites or treat the genotype at the missing sites as 0|0?
Thank you for the help. I am grateful for the clear manual and for the MSMC2 software, which is valuable for demography-related analysis.