mikolmogorov / Flye

De novo assembler for single molecule sequencing reads using repeat graphs

keep-haplotypes option is ploidy unaware? #622

Closed alexvasilikop closed 1 year ago

alexvasilikop commented 1 year ago

Hello,

I would like to know whether the --keep-haplotypes option makes any assumption about ploidy (e.g., diploid or otherwise). I have a genome of unknown ploidy, and I would like to use the distribution of alternative contigs produced by Flye to estimate its ploidy level.

cheers Alex

mikolmogorov commented 1 year ago

Hi Alex,

Flye in "regular" mode assumes a diploid genome, but you can run it with --meta --keep-haplotypes, which makes no ploidy assumptions. Note that --keep-haplotypes is good at preserving structural variations of fairly large size (500 bp+), but it will not reconstruct alleles that differ by only a few SNPs or small variants, unless the sequence divergence is very high.

Hope this helps, Misha
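As an illustration of the suggestion above, here is a minimal sketch that assembles such a Flye command line in Python. Only --meta and --keep-haplotypes come from the thread. The read-type flag (--nano-raw), the file and directory names, and the thread count are assumptions chosen for the example, and the helper build_flye_command is hypothetical.

```python
import shlex


def build_flye_command(reads, out_dir, read_type="--nano-raw", threads=8):
    """Return an argument list for a Flye run without ploidy assumptions.

    read_type, reads, out_dir and threads are illustrative assumptions;
    only --meta and --keep-haplotypes are taken from the discussion.
    """
    return [
        "flye",
        read_type, reads,          # input reads (assumed raw ONT here)
        "--out-dir", out_dir,      # output directory (hypothetical name)
        "--threads", str(threads),
        "--meta",                  # uneven-coverage mode, no ploidy assumption
        "--keep-haplotypes",       # keep large structural variants uncollapsed
    ]


cmd = build_flye_command("reads.fastq.gz", "flye_meta_out")
print(shlex.join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) once the paths point at real data; it is only printed here so the sketch runs without Flye installed.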

alexvasilikop commented 1 year ago

Many thanks, Misha, and sorry for the delayed reply. Do you have any quantitative estimates, or have you run any tests, of how heterozygous a genome has to be for Flye to accurately recover all of its haplotypes?

My genomes have an estimated heterozygosity of 1-4%. Note that this was estimated under a diploid assumption using a sliding-window approach: the percentage of heterozygous SNPs in 80 kb chromosome windows, after mapping the reads to the haploid genome and calling SNPs with GATK.

Thanks again Alex
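The windowed heterozygosity estimate described above can be sketched roughly as follows. This is a simplified illustration, not the poster's actual pipeline: it assumes 1-based positions of heterozygous SNPs already extracted from a VCF for one chromosome, and it uses non-overlapping 80 kb windows rather than a true sliding window. The function name and the toy data are hypothetical.

```python
from collections import Counter


def window_heterozygosity(het_snp_positions, chrom_length, window=80_000):
    """Percent of heterozygous SNP sites in each non-overlapping window.

    het_snp_positions: 1-based positions of heterozygous SNPs (assumed input).
    The final window may be shorter than `window`; its own length is used
    as the denominator.
    """
    # Count SNPs per window index (convert to 0-based before dividing).
    counts = Counter((pos - 1) // window for pos in het_snp_positions)
    n_windows = -(-chrom_length // window)  # ceiling division
    percentages = []
    for i in range(n_windows):
        start = i * window
        end = min(start + window, chrom_length)
        percentages.append(100.0 * counts.get(i, 0) / (end - start))
    return percentages


# Toy data: one SNP every 100 bp in the first window, none in the second,
# one every 25 bp in the final (40 kb) window.
positions = list(range(1, 80_001, 100)) + list(range(160_001, 200_001, 25))
print(window_heterozygosity(positions, chrom_length=200_000))
# → [1.0, 0.0, 4.0]
```

Because the SNP calls themselves assume a diploid genotype model, values produced this way inherit that assumption, as noted in the comment above.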

mikolmogorov commented 1 year ago

Hi Alex,

If you are interested in reconstructing complete haplotypes, --keep-haplotypes won't do that. It is really meant for keeping large structural variants; small heterozygous variants will likely remain collapsed.

For a diploid genome, you can try Hapdup (https://github.com/KolmogorovLab/hapdup), but note that it does assume diploidy. Assuming that your data is ONT, I am not aware of any method that can assemble a polyploid genome into (uncollapsed) haplotypes.