Open CDieterich opened 4 years ago
Hi @CDieterich
I have not trained an RNA model yet; I will update this issue if things change.
Regards
Excellent.
BTW, is there a manual for doing the training myself?
I'd also be interested in any documentation on using bonito train. Is the process similar to taiyaki? From what I understood from Clive's talk at the Nanopore Community Meeting, the structure is simpler?
I think it should be straightforward; if not, let me know.
First, make sure you have the training data downloaded.
$ bonito download --training
Then run bonito train and give it an output directory.
$ bonito train model-train-dir
[loading data]
[loading model]
[990000/990000]: 100%|#########################################| [1:23:46, loss=0.2546]
[epoch 1] directory=model-train-dir loss=0.2496 mean_acc=92.351% median_acc=93.035%
[990000/990000]: 100%|#########################################| [1:23:40, loss=0.2010]
[epoch 2] directory=model-train-dir loss=0.2201 mean_acc=93.310% median_acc=94.000%
[990000/990000]: 100%|#########################################| [1:23:41, loss=0.2255]
[epoch 3] directory=model-train-dir loss=0.2038 mean_acc=93.847% median_acc=94.527%
[990000/990000]: 100%|#########################################| [1:23:40, loss=0.2018]
[epoch 4] directory=model-train-dir loss=0.1964 mean_acc=94.090% median_acc=94.608%
[990000/990000]: 100%|#########################################| [1:23:32, loss=0.2001]
[epoch 5] directory=model-train-dir loss=0.1899 mean_acc=94.318% median_acc=95.025%
[990000/990000]: 100%|#########################################| [1:23:32, loss=0.1862]
[epoch 6] directory=model-train-dir loss=0.1871 mean_acc=94.383% median_acc=95.025%
[990000/990000]: 100%|#########################################| [1:23:31, loss=0.1678]
[epoch 7] directory=model-train-dir loss=0.1813 mean_acc=94.583% median_acc=95.098%
[990000/990000]: 100%|#########################################| [1:23:41, loss=0.1916]
[epoch 8] directory=model-train-dir loss=0.1793 mean_acc=94.634% median_acc=95.396%
[990000/990000]: 100%|#########################################| [1:23:34, loss=0.1865]
[epoch 9] directory=model-train-dir loss=0.1764 mean_acc=94.755% median_acc=95.500%
[990000/990000]: 100%|#########################################| [1:23:32, loss=0.1565]
[epoch 10] directory=model-train-dir loss=0.1763 mean_acc=94.737% median_acc=95.500%
[990000/990000]: 100%|#########################################| [1:23:32, loss=0.1580]
[epoch 11] directory=model-train-dir loss=0.1739 mean_acc=94.836% median_acc=95.522%
[125184/990000]: 13%|####### | [10:35, loss=0.1572]
By default, training uses 1 million chunks with a 1% validation split, which leaves 990,000 training examples. The progress bar for each epoch counts through those examples, with the training loss updating after each batch. At the end of each epoch, the validation loss and accuracy are reported.
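To make the numbers above concrete, here is a minimal sketch that reproduces the 990,000 / 10,000 split and pulls the validation metrics out of the `[epoch N]` summary lines. It is not part of bonito: the function names and the regular expression are hypothetical, and the parser assumes the log format is exactly what is pasted in this thread.

```python
import re

# Assumed format, taken from the log pasted above, e.g.
# [epoch 1] directory=model-train-dir loss=0.2496 mean_acc=92.351% median_acc=93.035%
EPOCH_RE = re.compile(
    r"\[epoch (?P<epoch>\d+)\] directory=(?P<directory>\S+) "
    r"loss=(?P<loss>[\d.]+) mean_acc=(?P<mean_acc>[\d.]+)% "
    r"median_acc=(?P<median_acc>[\d.]+)%"
)


def split_sizes(total_chunks=1_000_000, validation_fraction=0.01):
    """Return (n_train, n_valid) for a simple fractional validation split."""
    n_valid = int(round(total_chunks * validation_fraction))
    return total_chunks - n_valid, n_valid


def parse_epoch_line(line):
    """Parse one '[epoch N] ...' summary line; return None if it does not match."""
    match = EPOCH_RE.search(line)
    if match is None:
        return None
    return {
        "epoch": int(match["epoch"]),
        "directory": match["directory"],
        "loss": float(match["loss"]),
        "mean_acc": float(match["mean_acc"]),
        "median_acc": float(match["median_acc"]),
    }


if __name__ == "__main__":
    print(split_sizes())
    print(parse_epoch_line(
        "[epoch 11] directory=model-train-dir loss=0.1739 "
        "mean_acc=94.836% median_acc=95.522%"
    ))
```

Feeding every line of the training output through `parse_epoch_line` and keeping the non-`None` results gives one record per epoch, which is handy for plotting how the validation loss flattens out.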
I am just wondering: does the 'bonito download --training' command download all the training data needed to train a satisfactory model? Many thanks!
@snower2010 bonito download --training will give you the full training set for dna_r9.4.1 that is used to train the model shipped with bonito. I'm currently only focusing on a single condition.
HTH,
Chris.
Got it! Many thanks! By the way, could you also tell me the configuration and the time cost for this specific training run? Thanks!
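On the time cost, the log posted earlier in this thread already gives a rough figure: each completed epoch's progress bar ends at roughly 1 hour 23 minutes. The hardware configuration is not stated anywhere in the thread, so the sketch below only adds up wall-clock time. The helper names are hypothetical, and the regular expression assumes the progress lines look exactly like the ones pasted above.

```python
import re

# Assumed format, taken from the log in this thread, e.g.
# [990000/990000]: 100%|####| [1:23:46, loss=0.2546]
PROGRESS_RE = re.compile(
    r"\[(?P<done>\d+)/(?P<total>\d+)\]:.*"
    r"\[(?P<elapsed>[\d:]+), loss=(?P<loss>[\d.]+)\]"
)


def elapsed_to_seconds(elapsed):
    """Convert 'H:MM:SS' or 'MM:SS' into a number of seconds."""
    seconds = 0
    for part in elapsed.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def parse_progress_line(line):
    """Return (done, total, elapsed_seconds), or None if the line does not match."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    return (
        int(match["done"]),
        int(match["total"]),
        elapsed_to_seconds(match["elapsed"]),
    )


def completed_epoch_seconds(lines):
    """Collect the elapsed time of every progress bar that reached its total."""
    durations = []
    for line in lines:
        parsed = parse_progress_line(line)
        if parsed is not None and parsed[0] == parsed[1]:
            durations.append(parsed[2])
    return durations


if __name__ == "__main__":
    log = [
        "[990000/990000]: 100%|####| [1:23:46, loss=0.2546]",
        "[990000/990000]: 100%|####| [1:23:40, loss=0.2010]",
        "[125184/990000]: 13%|####### | [10:35, loss=0.1572]",
    ]
    durations = completed_epoch_seconds(log)
    print(sum(durations) / len(durations) / 60, "minutes per epoch")
```

The unfinished bar (13%) is skipped on purpose, so the average reflects full epochs only.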
OK, getting back to this now.
Any developments on this aspect (RNA modifications), @iiSeymour?
I would be happy to do it myself, provided there is some documentation for training from scratch or from a pretrained model.
Thank you
Continuing this thread: would it be worth it if a few of us here put our heads together and attempted training a [direct] RNA model? I know there are boatloads of direct RNA and cDNA data from the human NA12878 runs (https://github.com/nanopore-wgs-consortium/NA12878/tree/master/nanopore-human-transcriptome).
Dear developers,
would you be able to provide an RNA model for bonito somewhere, so that a call like the following works?
bonito basecaller rna_r9.4.1 /data/reads > basecalls.fasta
Thank you,
Christoph