nanoporetech / bonito

A PyTorch Basecaller for Oxford Nanopore Reads
https://nanoporetech.com/

Help train #241

Open simonbrd opened 2 years ago

simonbrd commented 2 years ago

Hello, I have a problem using bonito to build my own methylation detection model on microalgae data. My workflow is as follows:

$ git clone https://github.com/nanoporetech/bonito.git  # or fork first and clone that
$ cd bonito
$ python3 -m venv venv3
$ source venv3/bin/activate
(venv3) $ pip install --upgrade pip
(venv3) $ pip install -r requirements.txt
(venv3) $ python setup.py develop

bonito basecaller dna_r10.4_e8.1_fast@v3.4 --reference ../PrymneGenomeV1.fasta ../fast5/ --device cuda:0 > basecalls.bam

[image]

Thank you in advance. For reference, here is my configuration:

[image]

iiSeymour commented 2 years ago

@simonbrd in your screenshot you have set --device cpu, not the GPU.

simonbrd commented 2 years ago

Yes, that's true, sorry. But I still get an error with the GPU:

[image: bonito cuda]

iiSeymour commented 2 years ago

The default batch size for the fast model is 1536, which uses about 10 GB of GPU memory, so try reducing the batch size to fit your GPU memory capacity:

bonito basecaller dna_r10.4_e8.1_fast@v3.4 --batchsize 512 ....
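As a rough way to pick a starting value, the two figures quoted above (1536 chunks for about 10 GB) can be scaled to the memory your GPU reports as free. This is only a sketch: it assumes memory use grows roughly linearly with batch size, which is a simplification and not documented bonito behaviour, and `suggest_batchsize` is a hypothetical helper name, not part of bonito.

```shell
# Hypothetical helper: suggest a starting --batchsize from free GPU memory in MiB.
# Assumption: memory scales linearly from 1536 chunks at ~10 GB (taken as 10240 MiB).
suggest_batchsize() {
    sb_free_mib=$1
    sb_size=$(( sb_free_mib * 1536 / 10240 ))
    # Round down to a multiple of 32 and never go below 32.
    sb_size=$(( sb_size / 32 * 32 ))
    if [ "$sb_size" -lt 32 ]; then
        sb_size=32
    fi
    echo "$sb_size"
}

# Example: a GPU reporting 4096 MiB free.
suggest_batchsize 4096   # prints 608

# On a machine with an NVIDIA driver, the free memory can be read with:
#   nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits
```

Treat the result as an upper bound to try first. If the out-of-memory error persists, keep lowering the value.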

simonbrd commented 2 years ago

Thank you very much, it works!

simonbrd commented 2 years ago

Hello, I have a new problem, can you help me?

[image: train_bonito]

For reference, this is the command I ran beforehand:

bonito basecaller dna_r10.4_e8.1_fast@v3.4 --batchsize 200 --reference ../PrymneGenomeV1.fasta ../fast5/ --device cuda:0 > data/train/basecalls_sans_ctc.bam

iiSeymour commented 2 years ago

@simonbrd to save data in a training format you need to add --save-ctc when basecalling. The error message you are getting is because no training data is present in data/train.
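A quick way to confirm that training data was actually written is to look for the array files in the output directory before starting training. The sketch below assumes that `--save-ctc` writes `chunks.npy`, `references.npy` and `reference_lengths.npy` next to the redirected output file; check these names against your bonito version. `check_ctc_dir` is a hypothetical helper, not a bonito command.

```shell
# Hypothetical helper: report which CTC training files are missing from a directory.
# Assumption: --save-ctc produces these three .npy files alongside the basecalls.
check_ctc_dir() {
    cc_dir=$1
    cc_missing=0
    for cc_file in chunks.npy references.npy reference_lengths.npy; do
        if [ ! -f "$cc_dir/$cc_file" ]; then
            echo "missing: $cc_dir/$cc_file"
            cc_missing=1
        fi
    done
    # Returns 0 only when all three files are present.
    return "$cc_missing"
}

# Intended use, based on the commands in this thread (not run here):
#   bonito basecaller dna_r10.4_e8.1_fast@v3.4 --save-ctc --batchsize 200 \
#       --reference ../PrymneGenomeV1.fasta ../fast5/ --device cuda:0 > data/train/basecalls.bam
#   check_ctc_dir data/train
```

If the check reports missing files, the basecalling run was most likely done without `--save-ctc`, and training on that directory will fail with the same error.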

simonbrd commented 2 years ago

OK, thank you. But I don't understand: the basecaller output contains only 2 files and no training data for bonito train?

[images: bonito res restrain]