lh3 / dipcall

Reference-based variant calling pipeline for a pair of phased haplotype assemblies
MIT License

Running dipcall for partially-phased assemblies? #9

Closed by elcortegano 2 years ago

elcortegano commented 2 years ago

Hi, I have a pair of partially-phased assemblies for a sample, generated with minimap2, and I wonder whether they can be used as input for dipcall.

I ran dipcall on these two assembly files, arbitrarily treating one as paternal and the other as maternal (since they are not really phased). I wonder whether this is why the resulting VCF file is empty.

run-dipcall  prefix  reference.fa   assembly.bp.hap1.c_tg.fa assembly.bp.hap2.c_tg.fa  > prefix.mak
make -j2 -f prefix.mak

I am also assuming here that the sample is female, but it is not. I am not sure whether that could have an effect as well. What is the PAR.bed input file for males? I cannot find this in the documentation.

Thanks

elcortegano commented 2 years ago

Apparently, a prefix.pair.vcf.gz file is generated with several calls, which is not mentioned in the README (only prefix.dip.vcf.gz is). What is this file? Is it also a valid final VCF output? How does it differ from what prefix.dip.vcf.gz should be?

lh3 commented 2 years ago

dipcall works with partially phased assemblies; it is just that the phasing in the output will not be correct. To run a male sample, you need a PAR.bed for your reference genome. If you do not have one, treat all samples as female. You won't be able to make correct calls in pseudoautosomal regions, though. The final prefix.dip.vcf.gz is generated from prefix.pair.vcf.gz. Most of the time you should look at prefix.dip.vcf.gz.
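
To see which of the two VCFs actually contains calls, one option is to count the non-header records in each. The following is only a minimal sketch: it assumes the output prefix `prefix` from the commands earlier in the thread, and `count_vcf_records` is a hypothetical helper written for illustration, not part of dipcall. The commented `-x` line reflects the male example as it appears in the dipcall README, to the best of my understanding; treat it as an assumption and check it against your installed version.

```sh
#!/bin/sh
# Hypothetical helper (not part of dipcall): print the number of
# non-header, non-empty records in a gzip-compressed VCF.
count_vcf_records() {
    gzip -dc "$1" | awk '!/^#/ && NF {n++} END {print n+0}'
}

# Compare the intermediate and final outputs, if they exist.
for f in prefix.pair.vcf.gz prefix.dip.vcf.gz; do
    if [ -f "$f" ]; then
        printf '%s\t%s\n' "$f" "$(count_vcf_records "$f")"
    fi
done

# Assumption, based on the README's male example: PAR regions are passed
# as a BED file with -x. hs38.PAR.bed is specific to GRCh38; another
# reference needs its own BED file.
#   run-dipcall -x hs38.PAR.bed prefix reference.fa hap1.fa hap2.fa > prefix.mak
#   make -j2 -f prefix.mak
```

If prefix.pair.vcf.gz has records but prefix.dip.vcf.gz has none, the calls are being lost in the step that derives the latter from the former, which narrows down where to look.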