Open psur9757 opened 3 years ago
You should be able to use bcftools consensus (http://samtools.github.io/bcftools/bcftools.html#consensus) to generate FASTA files for each haplotype. The output VCF file has an identifier for each phased variant specifying which block it belongs to.
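As a rough illustration of the block identifier mentioned above, here is a minimal sketch that groups phased variants by block. It assumes the identifier is stored in the `PS` (phase set) FORMAT field defined by the VCF specification and that the file has a single sample column. The records and the helper name `group_by_block` are made up for this example; this is not HapCUT2 code.

```python
from collections import defaultdict

# Made-up single-sample VCF records (tab-separated), for illustration only.
VCF_TEXT = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1
contig1\t100\t.\tA\tG\t50\tPASS\t.\tGT:PS\t0|1:100
contig1\t250\t.\tC\tT\t50\tPASS\t.\tGT:PS\t1|0:100
contig1\t900\t.\tG\tA\t50\tPASS\t.\tGT:PS\t0|1:900
contig1\t950\t.\tT\tC\t50\tPASS\t.\tGT\t0/1
"""


def group_by_block(vcf_text):
    """Map (contig, phase set) -> list of (pos, haplotype-1 allele, haplotype-2 allele)."""
    blocks = defaultdict(list)
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip header and empty lines
        fields = line.split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        # Pair FORMAT keys with the sample's values, e.g. {"GT": "0|1", "PS": "100"}.
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        gt = sample.get("GT", "")
        ps = sample.get("PS")
        if "|" not in gt or ps in (None, "."):
            continue  # unphased record: it belongs to no block
        # Allele index 0 is REF, 1 is the first ALT, 2 the second ALT, and so on.
        alleles = [ref] + alt.split(",")
        hap1, hap2 = (alleles[int(i)] for i in gt.split("|"))
        blocks[(chrom, ps)].append((pos, hap1, hap2))
    return dict(blocks)


blocks = group_by_block(VCF_TEXT)
for key, variants in blocks.items():
    print(key, variants)
```

Under the VCF specification, phase is only defined within a phase set, so the first allele of one block is not necessarily on the same chromosome copy as the first allele of another block. For the FASTA step itself, `bcftools consensus -f <reference.fa> -H 1 <phased.vcf.gz>` (and `-H 2`) applies the first or second allele of each phased genotype; the VCF must be bgzip-compressed and indexed.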
@vibansal I think I explained it badly. Since each contig is processed in parallel, how does HapCUT2 know which blocks within a contig belong together?
Let's say the draft assembly has 4 contigs representing two copies of a chromosome. Since HapCUT2 analysed each contig in parallel, how does it know which blocks (of a contig) belong together? How does it provide a recipe to create the two copies correctly, especially in terms of the ordering of blocks in the chromosome? I think I understand the phasing part.
Sorry for the confusing question.
HapCUT2 is designed to reconstruct haplotypes for a diploid genome using reads mapped to a haploid consensus. For each group of variants that can be linked together by the reads, it outputs two haplotype sequences at heterozygous variant sites. I don't completely understand your objective, but I don't think that HapCUT2 is designed to do that.
Hi, I tried to generate a consensus sequence from the VCF and noticed that some SNPs with genotype 1/2 in phased blocks are converted to 0 or 1 in the FASTA. It seems that allele 2 is not included in the .vcf? Thank you.
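One way to narrow down a report like this is to list the sites whose genotype uses allele index 2 or higher in the input VCF and check whether they still do in the phased VCF. The sketch below does that on made-up records. It assumes single-sample VCF text, and the helper name `multiallelic_sites` is hypothetical; it is not part of HapCUT2 or bcftools.

```python
# Made-up records for illustration: the site at position 300 is 1/2 in the
# input but appears as 0|1 in the phased output.
INPUT_VCF = """\
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1
contig1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1
contig1\t300\t.\tA\tC,T\t50\tPASS\t.\tGT\t1/2
"""

PHASED_VCF = """\
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1
contig1\t100\t.\tA\tG\t50\tPASS\t.\tGT:PS\t0|1:100
contig1\t300\t.\tA\tC,T\t50\tPASS\t.\tGT:PS\t0|1:100
"""


def multiallelic_sites(vcf_text):
    """Return {(contig, pos): GT} for records whose genotype uses allele index >= 2."""
    sites = {}
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip header and empty lines
        fields = line.split("\t")
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        gt = sample.get("GT", "")
        # Split on either separator and ignore missing alleles (".").
        indices = [a for a in gt.replace("|", "/").split("/") if a.isdigit()]
        if any(int(a) >= 2 for a in indices):
            sites[(fields[0], int(fields[1]))] = gt
    return sites


before = multiallelic_sites(INPUT_VCF)
after = multiallelic_sites(PHASED_VCF)
# Sites that used a second alternate allele in the input but not in the output.
lost = sorted(set(before) - set(after))
print(lost)
```

Any site printed here had a genotype such as 1/2 before phasing and only alleles 0 and 1 afterwards, which would explain a consensus FASTA that never contains the second alternate allele.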
Thank you for reporting this, we will fix this soon.
Input: PacBio long read, HiC and Illumina short read data
Assembly: Canu v2.1.1, then run through purgeHaplotigs
Variants: FreeBayes
I processed the HiC and PacBio files as recommended in the HiC_longread recipe. My question is what to do next to get a phased haplotype FASTA file. How do I know which blocks belong together?
Thank you.