Assembly of complex genomes with high rate of genome rearrangements

Alteroldis commented 6 months ago

Dear Dr Kolmogorov, I am working with a very unusual sea cucumber genome and cannot obtain a mosaic haploid genome assembly. I tried several options (--meta, --keep-haplotypes, --no-alt-contigs, and the tweaks from the ONT recommendations for v14 chemistry), but every run produced an excellent haplotype-resolved assembly. I also cannot split the resulting contigs into haplotypes with the tools I tested (purge_dups, kmer_dedup, purge_haplotigs); the only effect was a loss of about 30% of genes, according to the BUSCO report. Can you advise which settings I should check? Some information about the species and data: diploid species; genome size about 1.4-1.6 Gb (assembled size 2.8-3.2 Gb); ONT v14 chemistry; about 40x coverage (based on a 1.4 Gb genome size); read N50 14,100 bp.

mikolmogorov commented 6 months ago

Hello,

Sorry for the late response! This looks like the opposite of the problem most users run into. If purge_haplotigs did not collapse the assembly, the haplotypes must be quite divergent, to the point that purge_haplotigs treats this as one big haploid assembly.

You can try to experiment with increasing --read-error to higher values (e.g. 5-10%) or trying --nano-raw instead of --nano-hq. But this may not help. The right way to approach this is probably to self-align the assembly, identify pairs of homologous contigs and separate them into two bins. But this may not be trivial either since it may not be a 1:1 relationship.
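The self-alignment approach could be sketched roughly as follows. This is a minimal illustration, not something Flye or purge_haplotigs provides: it assumes the assembly has already been aligned against itself and the result saved in PAF format (for example with minimap2). The function names `homologous_pairs` and `split_into_bins` and the thresholds `min_cov` and `min_identity` are hypothetical and would need tuning for a real genome.

```python
from collections import defaultdict, deque


def homologous_pairs(paf_lines, min_cov=0.5, min_identity=0.8):
    """Collect contig pairs whose alignments cover enough of the query contig.

    Thresholds are illustrative assumptions, not recommended values.
    """
    aligned = defaultdict(int)  # (query, target) -> summed aligned query bases
    qlens = {}
    for line in paf_lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 12:  # PAF has 12 mandatory columns
            continue
        q, qlen, qs, qe = f[0], int(f[1]), int(f[2]), int(f[3])
        t = f[5]
        nmatch, alnlen = int(f[9]), int(f[10])
        # Skip self-hits and low-identity alignments.
        if q == t or alnlen == 0 or nmatch / alnlen < min_identity:
            continue
        # Simplification: overlapping alignments are counted more than once.
        aligned[(q, t)] += qe - qs
        qlens[q] = qlen
    return {
        frozenset((q, t))
        for (q, t), bases in aligned.items()
        if bases / qlens[q] >= min_cov
    }


def split_into_bins(pairs):
    """Two-colour the homology graph so that paired contigs land in different bins.

    Returns (bins, conflicts): bins maps contig -> 0 or 1; conflicts holds
    contigs whose relationships are not a clean 1:1 pairing.
    """
    graph = defaultdict(set)
    for pair in pairs:
        a, b = sorted(pair)
        graph[a].add(b)
        graph[b].add(a)
    bins, conflicts = {}, set()
    for start in sorted(graph):
        if start in bins:
            continue
        bins[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in sorted(graph[node]):
                if nb not in bins:
                    bins[nb] = 1 - bins[node]
                    queue.append(nb)
                elif bins[nb] == bins[node]:
                    # Odd cycle: these contigs cannot be cleanly separated.
                    conflicts.update((node, nb))
    return bins, conflicts
```

Contigs reported in `conflicts` correspond to the non-1:1 case mentioned above and would probably need manual inspection. Contigs that never appear in `bins` have no detected homologous partner, so they may belong in both haplotype sets.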

mikolmogorov commented 4 months ago

Assuming that this has been answered. Feel free to follow up if you have more questions!

mikolmogorov / Flye

Assembly of complex genomes with high rate of genome rearrangements #686