popgenmethods / SINGER

Sampling and inference of genealogies with recombination
MIT License

Extra samples silently appended to haploid VCFs #2

Open nspope opened 5 months ago

nspope commented 5 months ago

It looks like haploid VCFs are accepted, but "extra" haplotypes are appended to make the samples diploid:

printf '
import msprime
from sys import stdout
ts = msprime.sim_ancestry(
  samples=10, ploidy=1, population_size=1e4, recombination_rate=1e-8, sequence_length=1e6,
)
ts = msprime.sim_mutations(ts, 1e-8)
ts.write_vcf(stdout)
' | python >hap.vcf

$SINGER/releases/singer-0.1.6-beta-linux-x86_64/singer_master \
  -Ne 1e4 -m 1e-8 -n 1 -thin 1 -polar 0.0 -output hap -vcf hap -start 0 -end 1e6

sort hap_nodes_0.txt | uniq -c | head
#     20 0  <--- should be 10 nodes at time 0?
#      1 1003.8213071229687
#      1 1007.864086180477
#      1 10094.114331480905
# <snip>

Guessing that's an issue with VCF parsing and not intended behavior?
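
The 20 nodes at time 0 for 10 haploid samples would be consistent with each sample being read as diploid. One way to confirm what the VCF itself contains is to count the alleles in each GT field. The following is a minimal sketch, assuming an uncompressed, tab-delimited VCF with a GT entry in the FORMAT column; `gt_ploidies` is a hypothetical helper written for illustration and is not part of SINGER or msprime.

```python
def gt_ploidies(vcf_lines):
    """Return the set of ploidies observed in the GT field of every sample."""
    ploidies = set()
    for line in vcf_lines:
        # Skip meta lines, the #CHROM header, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_idx = fmt.index("GT")
        for sample in fields[9:]:
            gt = sample.split(":")[gt_idx]
            # Alleles are separated by "|" (phased) or "/" (unphased).
            ploidies.add(len(gt.replace("|", "/").split("/")))
    return ploidies


if __name__ == "__main__":
    import os

    # "hap.vcf" is the file written by the reproduction above.
    if os.path.exists("hap.vcf"):
        with open("hap.vcf") as fh:
            print(gt_ploidies(fh))
```

A result of `{1}` would indicate a purely haploid VCF, `{2}` a diploid one, and a set with several values a file of mixed ploidy.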

YunDeng98 commented 5 months ago

Hi @nspope, SINGER currently only supports diploid VCF files, so it won't parse haploid VCF files correctly. I will add a remark in the documentation, thanks for pointing this out!
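
Since only diploid VCFs are supported, one possible workaround is to merge adjacent haploid sample columns into phased pseudo-diploid genotypes before running SINGER. The sketch below is untested against SINGER and rests on several assumptions: that a phased diploid genotype such as `0|1` is read as two haplotypes, that FORMAT contains only GT (as in the msprime output above), and that the number of haploid samples is even. `pair_haploid_samples` is a hypothetical helper, not part of any library, and the pairing of samples is arbitrary.

```python
def pair_haploid_samples(vcf_lines):
    """Yield VCF lines with adjacent haploid columns merged into phased diploids."""
    for line in vcf_lines:
        line = line.rstrip("\n")
        # Pass meta lines and blank lines through unchanged.
        if line.startswith("##") or not line:
            yield line
            continue
        fields = line.split("\t")
        fixed, samples = fields[:9], fields[9:]
        if len(samples) % 2:
            raise ValueError("odd number of haploid samples; cannot pair them")
        pairs = list(zip(samples[::2], samples[1::2]))
        if line.startswith("#CHROM"):
            # Name each pseudo-diploid after the two samples it combines.
            merged = [f"{a}_{b}" for a, b in pairs]
        else:
            if fixed[8] != "GT":
                raise ValueError("this sketch only handles FORMAT == 'GT'")
            for gt in samples:
                if "|" in gt or "/" in gt:
                    raise ValueError(f"genotype {gt!r} is not haploid")
            merged = [f"{a}|{b}" for a, b in pairs]
        yield "\t".join(fixed + merged)


if __name__ == "__main__":
    import os

    # File names follow the reproduction above; "hap_diploid.vcf" is made up.
    if os.path.exists("hap.vcf"):
        with open("hap.vcf") as src, open("hap_diploid.vcf", "w") as dst:
            for out_line in pair_haploid_samples(src):
                dst.write(out_line + "\n")
```

With the 10 haploid samples from the reproduction this would produce 5 pseudo-diploid individuals, so 10 nodes at time 0 would be expected if the assumptions hold.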