Closed lijl459 closed 3 months ago
In both cases, the nchrs
parameter should be set to 22. Because there should be two sets of haplotypes in allhaps.fa or p_utg.fa.
Additionally, what is the difference between this approach and directly using
p_utg.fa
?
This question is complicated. You first need to understand the principle of how hifiasm uses Hi-C data to phase the assembly graph. Please refer to this paper of hifiasm. In short, the hap.p_ctg sequences are fully phased and should have a higher contig N50 than that of p_utg. Although hifiasm may have some problems in dealing with autopolyploid genomes. However, in the case of a diploid genome, hifiasm is expected to work well. Hence, I would recommend using the hap.p_ctg rather than the p_utg.
Please note that the scaffolding strategy could be different depending on the heterozygosity level. If the heterozygosity is high and there are still enough Hi-C reads after filtering with MAPQ>=1, you can use your first command to run HapHiC. If the heterozygosity is too low and there are not enough Hi-C reads after MAPQ>=1 filtering, it is suggested to align and scaffold the two sets of haplotypes separately.
Thanks very much!
Hi, zengxiaofei
If I use this command:
haphic pipeline allhaps.fa HiC.filtered.bam nchrs --gfa "hap1.p_ctg.gfa,hap2.p_ctg.gfa"
Is HiC.filtered.bam generated by allhap.fa(cat hap1.fa hap2.fa)?Like:
bwa mem -5SP -t 28 allhap.fa read1_fq.gz read2_fq.gz | samblaster | samtools view - -@ 14 -S -h -b -F 3340 -o HiC.bam
HapHiC/utils/filter_bam HiC.bam 1 --nm 3 --threads 14 | samtools view - -b -@ 14 -o HiC.filtered.bam
Best wishes.
@DayTimeMouse Yes.
“Please note that the scaffolding strategy could be different depending on the heterozygosity level. If the heterozygosity is high and there are still enough Hi-C reads after filtering with MAPQ>=1, you can use your first command to run HapHiC. If the heterozygosity is too low and there are not enough Hi-C reads after MAPQ>=1 filtering, it is suggested to align and scaffold the two sets of haplotypes separately.” @zengxiaofei Hi, Zeng. For the filtered BAM file, around how much left (like how much percent?) would be considered as enough and indicated high heterozygosity?
Hello,
I have a question regarding the use of haphic for genome assembly and processing results with hifiasm.
The genome I'm working with is diploid, and the haploid chromosome number is 11. In the command:
haphic pipeline allhaps.fa HiC.filtered.bam nchrs --gfa hap1.p_ctg.gfa,hap2.p_ctg.gfa
should I set
nchrs
to 11 or 22?Additionally, what is the difference between this approach and directly using
p_utg.fa
? Will I obtain haplotype-phased assemblies if I run:haphic pipeline p_utg.fa HiC.filtered.bam nchrs --gfa p_utg.gfa
?And in this case, should
nchrs
be set to 11 or 22?Thank you!