vpc-ccg / haslr

A fast tool for hybrid genome assembly of long and short reads
GNU General Public License v3.0

HASLR on heterozygous genomes #1

Closed: dcopetti closed this issue 4 years ago

dcopetti commented 4 years ago

Hi,

I wonder if you have any idea how your assembler would handle highly heterozygous genomes: do you think it would be able to recognize allelic long reads and maintain the phasing within a read? My plant genome is ~2.5 Gb, but when I assemble it I get an assembly of almost 4 Gb, i.e. many sites are so divergent that they don't collapse into a single contig per locus but instead yield one contig per allele. Do you think it is worth giving it a try, or will HASLR smash both alleles together anyway? Thanks,

Dario

haghshenas commented 4 years ago

Hi there,

Thanks for your interest in our assembler. The current version of HASLR is not heterozygosity-aware, as it performs a single consensus call over all long-read subsequences between two nodes of the backbone graph. However, it is possible to cluster the long-read subsequences into two (or more) groups and perform a separate consensus call for each group. I am working on this line of improvement and will integrate it into HASLR in the future. So stay tuned :)
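To make the clustering idea concrete, here is a toy sketch. It is not HASLR's actual code, and it is an assumption that a real implementation would look anything like this. It assumes the long-read subsequences between two backbone nodes have already been aligned to equal length. It then seeds two groups with the two most dissimilar reads, assigns each read to the nearer seed by Hamming distance, and takes a column-wise majority vote within each group. The function names and the example reads are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def consensus(seqs):
    """Column-wise majority vote over equal-length (pre-aligned) sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def split_into_two_groups(seqs):
    """Seed two groups with the most distant pair, then assign by nearest seed."""
    pairs = [
        (hamming(a, b), i, j)
        for i, a in enumerate(seqs)
        for j, b in enumerate(seqs)
        if i < j
    ]
    _, i, j = max(pairs)  # indices of the two most dissimilar reads
    groups = ([], [])
    for s in seqs:
        nearer = 0 if hamming(s, seqs[i]) <= hamming(s, seqs[j]) else 1
        groups[nearer].append(s)
    return groups


# Hypothetical reads: two alleles differing at two sites, plus one
# sequencing error in each group.
reads = [
    "ACGTACGT", "ACGTACGA", "ACGTACGT",  # allele A (second read has an error)
    "ATGTACCT", "ATGTTCCT", "ATGTACCT",  # allele B (second read has an error)
]
group_a, group_b = split_into_two_groups(reads)
allele_consensi = [consensus(group_a), consensus(group_b)]
```

Real long reads have indels and much higher error rates, so a practical version would need a proper alignment step and a more robust clustering criterion than plain Hamming distance.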

xiekunwhy commented 3 years ago

Hi,

Any progress on assembling highly heterozygous (>2%) genomes with HASLR? And could you keep this issue open until the heterozygosity problem is solved?

Best, Kun