sanger-tol / genomeassembly

Implementation of ToL genome assembly workflows
https://pipelines.tol.sanger.ac.uk/genomeassembly
MIT License

purge_dups and hifiasm? #35

Open nikostr opened 5 months ago

nikostr commented 5 months ago

Description of the bug

Hifiasm has built-in purging of haplotigs, and its authors seem to claim that purge_dups purges too aggressively (https://github.com/chhylp123/hifiasm/issues/162). Have you compared the purging done by hifiasm with that done by purge_dups? Would it make sense to let users disable the purge_dups step, or to let them set the hifiasm purging parameters?

Command used and terminal output

No response

Relevant files

No response

System information

No response

ksenia-krasheninnikova commented 5 months ago

Hi @nikostr,

In datasets with a fair level of heterozygosity, the hifiasm primary assembly contains a noticeable amount of retained haplotype. In these cases purge_dups has performed stably in balancing the pri and alt assemblies. However, in cases with low heterozygosity, hifiasm's own purging may be sufficient, or no purging may be required at all. The workflow in the latest release is the basic implementation of the Sanger Tree of Life assembly pipeline. We are working on covering different use cases in future releases.

nikostr commented 5 months ago

Thank you! Is this true even for hifiasm's more aggressive purging settings? Or is the upside of purge_dups that it adapts how aggressive the purging is to the genome without needing to set this manually?

And that sounds super reasonable! Looking forward to seeing how this progresses! :)
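
For readers unfamiliar with the settings in question: hifiasm exposes its purging level through the `-l` option (0 = no purging, 1 = light, 2/3 = aggressive, per its help text). The snippet below is only a minimal sketch of how one might build such a command line for comparison runs. The helper name `hifiasm_command`, the file names, and the thread count are made up for illustration and are not part of this pipeline.

```python
import shlex

# Purge levels as described in hifiasm's help text for the -l option.
HIFIASM_PURGE_LEVELS = {
    0: "no purging",
    1: "light purging",
    2: "aggressive purging",
    3: "aggressive purging",
}


def hifiasm_command(reads, prefix, purge_level=3, threads=16):
    """Build (but do not run) a hifiasm argv list with an explicit purge level.

    `reads`, `prefix` and `threads` are hypothetical placeholders.
    """
    if purge_level not in HIFIASM_PURGE_LEVELS:
        raise ValueError(f"purge_level must be one of {sorted(HIFIASM_PURGE_LEVELS)}")
    return [
        "hifiasm",
        "-o", prefix,             # output prefix
        "-t", str(threads),       # number of threads
        "-l", str(purge_level),   # built-in haplotig purging level
        reads,
    ]


if __name__ == "__main__":
    # Print one command per purge level so the resulting assemblies can be compared.
    for level, meaning in HIFIASM_PURGE_LEVELS.items():
        cmd = hifiasm_command("hifi_reads.fq.gz", f"asm_l{level}", purge_level=level)
        print(f"# {meaning}")
        print(shlex.join(cmd))
```

Running the same reads at several levels and comparing primary assembly size and duplicated BUSCO counts is one way to see how much each setting removes.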

ksenia-krasheninnikova commented 5 months ago

hifiasm uses a graph-based approach for purging, while purge_dups relies on read mapping and one-to-one contig alignment. For our assemblies, we've got the best results when the two are run in combination. In our experience, hifiasm's more aggressive purging settings carry a risk of over-purging. These are our best practices so far, but it's fine to look at each case and adapt the purging strategy.
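
To make the "read mapping plus contig alignment" part concrete, here is a rough sketch of the step sequence described in the purge_dups README, written as a small helper that only assembles the shell commands. It is not necessarily what this pipeline runs internally. The function name, the input file names, and the `map-hifi` minimap2 preset (the README itself uses `map-pb`) are assumptions for illustration.

```python
import shlex


def purge_dups_steps(primary_fa, reads, preset="map-hifi"):
    """Return the purge_dups steps as shell command strings (nothing is executed).

    Follows the outline in the purge_dups README; all file names are hypothetical.
    """
    q = shlex.quote
    split_fa = f"{primary_fa}.split"
    self_paf = f"{split_fa}.self.paf.gz"
    return [
        # 1. Map reads to the primary assembly to measure per-base read depth.
        f"minimap2 -x {preset} {q(primary_fa)} {q(reads)} | gzip -c - > reads.paf.gz",
        # 2. Summarise coverage (writes PB.stat and PB.base.cov).
        "pbcstat reads.paf.gz",
        # 3. Derive coverage cutoffs from the depth histogram.
        "calcuts PB.stat > cutoffs 2> calcuts.log",
        # 4. Split contigs and align the assembly against itself.
        f"split_fa {q(primary_fa)} > {q(split_fa)}",
        f"minimap2 -x asm5 -DP {q(split_fa)} {q(split_fa)} | gzip -c - > {q(self_paf)}",
        # 5. Combine coverage and self-alignments to flag duplicated regions.
        f"purge_dups -2 -T cutoffs -c PB.base.cov {q(self_paf)} > dups.bed 2> purge_dups.log",
        # 6. Write the purged primary sequences and the removed haplotigs.
        f"get_seqs -e dups.bed {q(primary_fa)}",
    ]


if __name__ == "__main__":
    for step in purge_dups_steps("asm.p_ctg.fa", "hifi_reads.fq.gz"):
        print(step)
```

Because the cutoffs in step 3 come from the read-depth distribution of the assembly at hand, they can be inspected and adjusted per genome before step 5 when the defaults look too aggressive or too lenient.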