Katuali
is a flexible consensus pipeline implemented in
Snakemake to basecall, assemble, and polish
Oxford Nanopore Technologies' sequencing data.
Documentation can be found at https://nanoporetech.github.io/katuali/.
A Makefile is provided to create a virtual environment into which all necessary dependencies will be installed. Katuali has been tested on Linux (specifically Ubuntu 16) and MacOS.
To setup the environment run:
git clone https://github.com/nanoporetech/katuali.git
cd katuali
make install
source ./venv/bin/activate
A conda package is forthcoming.
Katuali makes use of a number of tools to do basecalling, assembly, and polishing that need to be installed to perform such steps of a pipeline. Katuali does not enforce that all possible tools are present; only those required to run a user's pipeline need be installed. If Katuali cannot find a tool, it will alert the user.
The list below indicates possible options for pipeline steps and the default filesystem location where Katuali expects to find them. These locations can be changed by the user, see the documentation for more information. Please refer to the documentation of each of these tools for installation instructions.
General Tools
Pomoxis is used as a general utility throughout much of katuali, its installation is required for most functionality.
Basecalling Tools
Guppy is recommended, both scrappie and flappie are research prototypes.
Assembly Tools
In the context of Katuali, pomoxis functions as a wrapper to miniasm. Canu is utilised in the standard workflow, while pomoxis is used in the fast workflow.
Consensus Tools
Medaka is recommended in combination with the latest "flip-flop" algorithm in the guppy basecaller, and for rapid consensus. Nanopolish may still be preferred if using older basecallers.
The Katuali
tests contain examples of how to basecall, assemble, and polish
a small dataset that comes bundled with Katuali
. These tests can be run with:
make test
from the Katuali
source directory. Running the tests requires installation of
scrappie
, pomoxis
, medaka
, and nanopolish
. See the
documentation
for more information on the outputs of running the above.
To run with other data, start by creating a directory of reads (which could contain subdirectories of reads) within a run directory:
mkdir -p run1
cd run1
ln -s /path/to/fast5 reads # create a softlink to the fast5 data
cd ..
Then make a copy of the katuali config into your working directory,
katuali_config my_config.yaml
and update the katuali config to reflect your data:
DATA:
'run1':
'GENOME_SIZE': '4.0M' # for canu we need to specify genome size
There are two predefined pipelines.
To perform basecalling, a quick assembly with miniasm, and consensus with racon and medaka it is sufficient to run:
katuali --configfile my_config.yaml fast_assm_polish
Alternatively to assemble with canu run:
katuali --configfile my_config.yaml standard_assm_polish
The documentation describes how to create a pipeline with any combination of basecallers, assemblers and polishers.
It is possible to train medaka models starting from folders of fast5s in a single command once the config has been modified to reflect your input data (fast5s and genomes for each run as well as training and evaluation region definitions).
Running:
katuali all_medaka_train_features --keep-going
will:
Running:
katuali medaka_train_replicates --keep-going
will do all the tasks of all_medaka_train_features
and additionally launch
multiple medaka model-training replicates.
If some of your input runs have insufficient coverage-depth for some of the
training regions, some of the training feature files will not be made. In this
case the config flag USE_ONLY_EXISTING_MEDAKA_FEAT
can be set to true to allow
katuali to train using only those features which exist already:
USE_ONLY_EXISTING_MEDAKA_FEAT: true
Refer to comments in the config (katuali/config.yaml) to see how this process can be controlled.
Licence and Copyright
© 2018 Oxford Nanopore Technologies Ltd.
katuali
is distributed under the terms of the Mozilla Public License 2.0.
Research Release
Research releases are provided as technology demonstrators to provide early access to features or stimulate Community development of tools. Support for this software will be minimal and is only provided directly by the developers. Feature requests, improvements, and discussions are welcome and can be implemented by forking and pull requests. However much as we would like to rectify every issue and piece of feedback users may have, the developers may have limited resource for support of this software. Research releases may be unstable and subject to rapid iteration by Oxford Nanopore Technologies.