lh3 / dipcall

Reference-based variant calling pipeline for a pair of phased haplotype assemblies
MIT License
98 stars 10 forks source link

Can dipcall extend to the polyploidy plant genome ? #6

Open baozg opened 3 years ago

baozg commented 3 years ago

Hi, @lh3

Dipcall is designed for the diploidy, can I use the rule in the makefile for handling a plant tetraploidy genome?


# mapping
for hap in `seq 1 4`
do
    dipcall/dipcall.kit/minimap2 -c --paf-no-hit -xasm20 --cs -t16 DM4.chr.fasta query.${hap}.fa 2> query.${hap}.paf.gz.log | gzip > query.${hap}.paf.gz
    dipcall/dipcall.kit/minimap2 -a -xasm20 --cs -t16 ref.fasta query.${hap}.fa 2> query.${hap}.sam.gz.log | gzip > query.${hap}.sam.gz
    gzip -dc query.${hap}.paf.gz | sort -k6,6 -k8,8n | dipcall/dipcall.kit/k8 dipcall/dipcall.kit/paftools.js call - 2> ${hap}.${hap}.var.gz.vst | gzip > query.${hap}.
var.gz
    dipcall/dipcall.kit/k8 dipcall/dipcall.kit/dipcall-aux.js samflt query.${hap}.sam.gz | dipcall/dipcall.kit/samtools sort -m4G --threads 4 -o query.${hap}.bam -
    gzip -dc query.${hap}.var.gz | grep ^R | cut -f2- > query.${hap}.bed
done

# joint calling
dipcall/dipcall.kit/bedtk isec -m query.hap1.bed query.hap2.bed query.hap3.bed query.hap4.bed > query.dip.bed
dipcall/dipcall.kit/htsbox pileup -q5 -evcf ref.fasta query.hap1.bam query.hap2.bam query.hap3.bam query.hap4.bam | dipcall/dipcall.kit/htsbox bgzip > query.pair.vcf.gz

I do get a vcf with four haplotype variation. Would you mind give some advice for this usage? Is it invalid in any step?

image