Shasta long read assembler

De novo assembler for long reads, optimized for Oxford Nanopore (ONT) reads.

🆕 Mode 3 assembly: presentation of assembly results

Shasta development continues in this fork.

New releases will appear on the Releases page of this repository. Previous releases (up to 0.10.0) are available from the Releases page of the pre-fork repository chanzuckerberg/shasta.

The complete user documentation is available here.

For quick start information see here.

The main paper describing Shasta and its methods and results is Shafin et al., Nature Biotechnology 2020. Reads from this paper are available here. The assembly results are here.

Requests for help: please file GitHub issues to report problems, request help, or ask questions. Please keep each issue on a single topic when possible.

Main features of the Shasta long read assembler:

Optimized to rapidly produce accurate assembled sequence using DNA reads generated by Oxford Nanopore flow cells as input.
High performance (a few hours for a human assembly using a single machine of appropriate size).
Haploid or phased diploid assembly.

Computational methods used by the Shasta assembler include:

Using a run-length representation of the read sequence. This makes the assembly process more resilient to errors in homopolymer repeat counts, which are the most common type of errors in Oxford Nanopore reads.
Using, in most phases of the computation, a representation of the read sequence based on markers, a fixed subset of short k-mers (k ≈ 10).
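
To illustrate the run-length representation described above, here is a minimal Python sketch. It is not Shasta's actual implementation (Shasta is not written in Python), and the function names `run_length_encode` and `run_length_decode` are hypothetical. The idea shown is that a read is stored as a sequence of bases with no consecutive repeats plus a repeat count for each base, so two reads that differ only in a homopolymer length share the same base sequence.

```python
from itertools import groupby


def run_length_encode(seq):
    """Split a read into (bases, counts).

    bases  -- the read with each homopolymer run collapsed to one base
    counts -- the length of each collapsed run, in the same order
    """
    bases = []
    counts = []
    for base, run in groupby(seq):
        bases.append(base)
        counts.append(len(list(run)))
    return "".join(bases), counts


def run_length_decode(bases, counts):
    """Rebuild the original read from its run-length representation."""
    return "".join(base * count for base, count in zip(bases, counts))


# Two illustrative reads that differ only in the length of a T run,
# the kind of error the text above calls most common in ONT reads.
read_a = "GATTTTACA"
read_b = "GATTTACA"

bases_a, counts_a = run_length_encode(read_a)
bases_b, counts_b = run_length_encode(read_b)

print(bases_a, counts_a)
print(bases_b, counts_b)
print(bases_a == bases_b)  # the collapsed base sequences agree
```

Because the collapsed base sequences are identical, any comparison done on them is unaffected by the repeat-count error; the difference is confined to the counts.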

See this documentation page for more information on computational methods.
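
The marker representation mentioned above can be sketched in Python as follows. This is an illustration under stated assumptions, not Shasta's code: the hash-based rule in `is_marker`, the `fraction` parameter, and the function names are all hypothetical stand-ins for whatever rule fixes the subset of k-mers. The only property taken from the text is that the marker subset is fixed, so the same k-mer is treated the same way in every read.

```python
import hashlib


def is_marker(kmer, fraction=0.1):
    """Decide whether a k-mer belongs to the fixed marker subset.

    Assumption: the subset is defined by hashing the k-mer and keeping
    roughly `fraction` of all possible k-mers. The decision depends only
    on the k-mer itself, so it is identical for every read.
    """
    digest = hashlib.sha256(kmer.encode("ascii")).digest()
    value = int.from_bytes(digest[:8], "big")  # integer in [0, 2**64)
    return value < fraction * 2**64


def find_markers(seq, k=10, fraction=0.1):
    """Return (position, k-mer) for every marker occurrence in seq."""
    markers = []
    for pos in range(len(seq) - k + 1):
        kmer = seq[pos:pos + k]
        if is_marker(kmer, fraction):
            markers.append((pos, kmer))
    return markers


# Illustrative read; a small k keeps the example easy to follow.
read = "ACGTTGCAACGTTGCA"
for pos, kmer in find_markers(read, k=4, fraction=0.5):
    print(pos, kmer)
```

A read is then represented by its ordered list of marker occurrences rather than by every base, which is a much shorter sequence to compare between reads.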

Acknowledgments

The Shasta software uses various external software packages. See here for more information.

