paoloshasta / shasta

De novo assembly from Oxford Nanopore reads.
https://paoloshasta.github.io/shasta/

Haplotype partitioning using phased SNPs #16

Closed: ekg closed this issue 5 months ago

ekg commented 11 months ago

@danrdanny and I are thinking about how to do haplotype resolved assembly for the 1000G-ONT project.

In these genomes we already have highly accurate phased haplotypes over SNPs.

Can we use this information in Shasta to separate reads or assembly graph elements by haplotype?

A hack would be to separate the reads before assembly. There is probably a way to script this out for R&D.
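As a rough illustration of the "separate the reads before assembly" idea, here is a minimal Python sketch. It assumes a hypothetical tab-separated table mapping each read name to a haplotype label (for example, one derived by comparing reads against the phased SNPs), and a plain four-line-per-record FASTQ. The function names, the table layout, and the `unassigned` label are all assumptions made for illustration; none of this is part of Shasta.

```python
def load_assignments(tsv_lines):
    """Map read name -> haplotype label from 'read<TAB>label' lines (assumed format)."""
    assignments = {}
    for line in tsv_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split("\t")
        assignments[fields[0]] = fields[1]
    return assignments


def partition_fastq(fastq_lines, assignments, unassigned="unassigned"):
    """Group four-line FASTQ records by the haplotype label of each read."""
    bins = {}
    it = iter(fastq_lines)
    for header in it:
        if not header.strip():
            continue  # tolerate stray blank lines between records
        seq, plus, qual = next(it), next(it), next(it)
        # The read name is the first whitespace-delimited token after '@'.
        name = header[1:].split()[0]
        label = assignments.get(name, unassigned)
        bins.setdefault(label, []).append(header + seq + plus + qual)
    return bins
```

Each bin could then be written to its own FASTQ file and assembled separately. Reads that overlap no heterozygous SNP end up in the `unassigned` bin; one option would be to add them to both haplotype bins, but that choice would need to be validated.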

paoloshasta commented 11 months ago

Shasta can do phased diploid assembly without having to assign reads to haplotypes. See here for more information and for a description of Shasta output created in a phased diploid assembly.

The algorithm does not assign reads to haplotypes. An earlier implementation that assigned reads to haplotypes proved to be fragile.

If you have R10 reads, you can run a phased diploid assembly using --config Nanopore-Phased-R10-Fast-Nov2022. There are also assembly configurations for phased assembly using R9 reads.
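For reference, a minimal Python sketch of how such a run might be launched from a script. The executable name `shasta`, the input file name `reads.fastq`, and the helper `shasta_command` are placeholders assumed for illustration; only the configuration name comes from the comment above. Check `--input`, `--config`, and `--assemblyDirectory` against the Shasta documentation for your release.

```python
import shlex


def shasta_command(reads, config="Nanopore-Phased-R10-Fast-Nov2022",
                   executable="shasta", assembly_directory=None):
    """Build the argument list for a Shasta run (hypothetical helper)."""
    # 'executable' is a placeholder; release binaries usually carry a version suffix.
    cmd = [executable, "--input", *reads, "--config", config]
    if assembly_directory is not None:
        cmd += ["--assemblyDirectory", assembly_directory]
    return cmd


# Print the command line for inspection; 'reads.fastq' is a placeholder file name.
print(shlex.join(shasta_command(["reads.fastq"])))
```

To actually start the assembly, the returned list can be passed to `subprocess.run(cmd, check=True)`.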

These assemblies usually achieve phased lengths of multiple Mb for non-UL reads and tens of Mb for UL reads. Phased lengths can also be increased using GFAse, a tool developed by @rlorigro that can be used to postprocess a Shasta phased diploid assembly.

Alternatively, if you already separated your reads by haplotype you could use regular Shasta haploid assemblies to assemble each haplotype separately.

benedictpaten commented 11 months ago

If this is R9, I'd definitely recommend looking at the NAPU pipeline, which builds upon Shasta (non-phased mode). That said, it would be cool to investigate using the R9 diploid Shasta phasing model and comparing it to NAPU. I suspect they will end up being fairly similar.

benedictpaten commented 11 months ago

I should clarify: NAPU works well for R9 and R10, but I think the new, unreleased modes of Shasta phasing will only work with R10, leaving you only with the released "mode 2" phasing for R9. I think mode 2 R9 phasing will end up similar to Shasta+hapdup. Hopefully that makes sense.

danrdanny commented 11 months ago

Thanks all! The current workflow uses the CARD pipeline, so we may be hitting some corner cases where the phased assemblies are breaking in unexpected places. I can drop a couple of examples here if that is helpful.

paoloshasta commented 11 months ago

"unreleased modes of Shasta phasing will only work with R10"

@benedictpaten refers here to ongoing development work that will provide more general and powerful assembly methods for phased assembly in Shasta, and will only work with R10 reads.

In my first post, I was talking about the existing and released Shasta phased diploid assembly. That works with both R9 and R10 reads and can also be used in conjunction with GFAse.

@danrdanny the napu pipeline used in CARD runs a Shasta haploid assembly followed by hapdup, which turns the haploid assembly into a phased assembly. If you are having phasing problems in that pipeline after the Shasta assembly is complete, you should report those to the napu developers.

paoloshasta commented 5 months ago

I am closing this due to lack of discussion. Feel free to create a new issue if additional discussion topics emerge.