marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

splitHaplotype #2344

Closed DayTimeMouse closed 1 month ago

DayTimeMouse commented 1 month ago

Hi,

Thanks for developing this tool.

I'd like to ask how splitHaplotype classifies reads into different haplotypes. My run script is:

canu-2.2/bin/splitHaplotype -cl 1000 -memory 48 -threads 24 -H hap1.only.meryl 1 hap1_hifi.fa.gz -H hap2.only.meryl 1 hap2_hifi.fa.gz -A ambiguous.fa.gz -R hifi.fq.gz

hap1.only.meryl and hap2.only.meryl are haplotype-specific k-mer sets. If hap1.fa and hap2.fa are not real parental genomes, is splitHaplotype still recommended for classifying reads into different haplotypes?

Thanks in advance.

skoren commented 1 month ago

splitHaplotype doesn't care where the markers come from; it just splits reads into two bins, putting each read into the haplotype with more markers. We've done binning based on non-parent but same species with species crosses and also with Hi-C or StrandSeq binned HiFi reads, though not typically with an assembly. The assembly tends to be noisier since each k-mer is represented once, so if the assembly mis-bins something, the corresponding reads are guaranteed to be mis-binned too. So really the question is how confident you are in markers that weren't generated from a trio.
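
To make the "haplotype with more markers" idea concrete, here is a toy Python sketch. It is not splitHaplotype's actual implementation: the function names, the in-memory marker sets, and the rule that sends ties to an "ambiguous" bin are all assumptions made for illustration. It also ignores reverse complements and any count normalization the real tool may apply.

```python
from typing import Iterator, Set


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every k-mer of seq (forward strand only, for simplicity)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_markers(read: str, markers: Set[str], k: int) -> int:
    """Count how many k-mers in the read are in the haplotype-specific marker set."""
    return sum(1 for km in kmers(read, k) if km in markers)


def bin_read(read: str, hap1_markers: Set[str], hap2_markers: Set[str], k: int) -> str:
    """Assign a read to whichever haplotype contributes more marker hits.

    Ties (including zero hits on both sides) go to 'ambiguous'; that tie
    rule is an assumption of this sketch, not documented tool behavior.
    """
    c1 = count_markers(read, hap1_markers, k)
    c2 = count_markers(read, hap2_markers, k)
    if c1 > c2:
        return "hap1"
    if c2 > c1:
        return "hap2"
    return "ambiguous"


if __name__ == "__main__":
    # Hypothetical 3-mer marker sets; real marker databases use much larger k.
    hap1 = {"AAA"}
    hap2 = {"CCC"}
    for read in ["AAAAT", "CCCG", "GTGT"]:
        print(read, bin_read(read, hap1, hap2, k=3))
```

The sketch also shows why marker quality matters: the function has no way to tell whether a k-mer in `hap1` truly belongs to haplotype 1. If a marker was placed in the wrong set, every read carrying it is pulled toward the wrong bin.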