Code for the paper Causalcall: nanopore basecalling using a temporal convolutional network.
Extract training and validation data (.tfrecords) from resquiggled fast5 files:
python raw.py -i training/validation_fast5_files_folder -o tfrecords_files_folder -f train.tfrecords/validate.tfrecords
Train the model with default parameters:
CUDA_VISIBLE_DEVICES=cuda_id python train.py -i tfrecords_files_folder -o model_folder -m model_name
Make sure that train.tfrecords and validate.tfrecords are both in the tfrecords_directory.
If you want to change parameters, use train.py -h
for more details.
CUDA_VISIBLE_DEVICES=cuda_id python basecall.py -i fast5_files_folder -o results_folder -m path_to_model
Path to default model: ./model/DNAmodel/
We thank Chiron for providing the source code. Causalcall is developed on the basic framework of Chiron's code. (The parts of preprocessing input data and converting outputs of the TCN-based model to base sequences are revised based on Chiron's code following MPL 2.0.)