haotianteng / Chiron

A basecaller for Oxford Nanopore Technologies' sequencers

Bad Results for Basecalling after Training own Model #116

Closed goevea closed 9 months ago

goevea commented 2 years ago

Hi!

Describe the bug

After training my own model with about 5000 reads, I get the following results when I use the model for basecalling: none of the resulting FASTA files gives a useful result.

image

To Reproduce

I am using the data from https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1727-y#Bib1 and take about 5000 raw fast5 reads for training. To prepare them for training, I first run tombo preprocess annotate_raw_with_fastqs with the FASTQ file created by Guppy, and then tombo resquiggle with the reference genome for labeling. Then I run chiron export, followed by chiron train with the following command:

nohup chiron train -i data/Klebsiella_pneumoniae_INF032_fast5s_chiron/test -o model -m DNA --retrain

DNA is a copy of the DNA_default model, but I also tried training a completely new model, with the same results. Training finishes with the following message: Model model/DNA saved.
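For anyone trying to reproduce this, the preparation and training steps described above can be written down as one explicit command sequence. The following is only a minimal sketch that builds the command lines without running anything. The `tombo` options `--fast5-basedir` and `--fastq-filenames` and the `chiron train` options are taken from the respective tools as used in this post; the `-i`/`-o` options for `chiron export` and every path below (`fast5s`, `reads.fastq`, `reference.fasta`, `fast5s_chiron`) are assumptions and placeholders, not the actual values used here.

```python
import shlex
from pathlib import Path


def build_pipeline(fast5_dir, fastq, reference, export_dir, model_dir, model_name):
    """Return the labeling/export/training commands as argv lists (nothing is executed)."""
    fast5_dir = str(Path(fast5_dir))
    export_dir = str(Path(export_dir))
    return [
        # 1. Attach the Guppy basecalls (FASTQ) to the raw fast5 reads.
        ["tombo", "preprocess", "annotate_raw_with_fastqs",
         "--fast5-basedir", fast5_dir, "--fastq-filenames", str(fastq)],
        # 2. Align signal to the reference genome to obtain the labels.
        ["tombo", "resquiggle", fast5_dir, str(reference)],
        # 3. Export signal/label pairs for Chiron (the -i/-o flags are an assumption).
        ["chiron", "export", "-i", fast5_dir, "-o", export_dir],
        # 4. Train, using the same options as the command in this post.
        ["chiron", "train", "-i", export_dir, "-o", str(model_dir),
         "-m", model_name, "--retrain"],
    ]


if __name__ == "__main__":
    # All paths here are hypothetical placeholders.
    for argv in build_pipeline("fast5s", "reads.fastq", "reference.fasta",
                               "fast5s_chiron", "model", "DNA"):
        print(shlex.join(argv))
```

Printing the commands first makes it easy to confirm that the directory given to `chiron train` is the one `chiron export` wrote to before anything is run.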

Then I run the basecall:

nohup chiron call -i /mnt/data2/bmestu/goessv/data/Klebsiella_pneumoniae_INF032_fast5s_chiron -o /mnt/data2/bmestu/goessv/chiron_output_train -e fasta -m model/DNA --batch_size 1000

I have already tried this with different numbers of reads (100, 1000, 5000) and see no improvement. I checked that Tombo uses the correct bases for labeling, and the raw folder in the results contains the same signal values that I see in the fast5 files.
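To make "no useful result" more concrete, it may help to summarize the basecalled FASTA files numerically. The sketch below is a hypothetical diagnostic, not part of Chiron: it reports read length, base composition, and the longest homopolymer run per record. The `*.fasta` file pattern and the directory passed to `summarize_dir` are assumptions about how the output is laid out.

```python
from collections import Counter
from pathlib import Path


def read_fasta(text):
    """Parse FASTA text into a {header: sequence} dict."""
    records, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def longest_run(seq):
    """Length of the longest stretch of one repeated character."""
    best = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        prev = base
        best = max(best, run)
    return best


def summarize(seq):
    """Length, longest homopolymer, and A/C/G/T fractions of one sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    n = len(seq)
    composition = {b: counts[b] / n for b in "ACGT"} if n else {}
    return {"length": n, "longest_homopolymer": longest_run(seq),
            "composition": composition}


def summarize_dir(result_dir, pattern="*.fasta"):
    """Yield (file name, header, summary) for every record under result_dir."""
    for path in sorted(Path(result_dir).glob(pattern)):
        for header, seq in read_fasta(path.read_text()).items():
            yield path.name, header, summarize(seq)
```

Very short reads, a composition dominated by a single base, or very long homopolymer runs would each point to a different kind of failure than reads of plausible length that simply do not align.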

Am I making an obvious mistake? Or have you ever seen something like this? If you need more information, do not hesitate to ask.

Thanks in advance!
Veronika

Environment (please complete the following information):