nanoporetech / bonito

A PyTorch Basecaller for Oxford Nanopore Reads
https://nanoporetech.com/

bonito mod-calling with remora model #278

Open hd2326 opened 2 years ago

hd2326 commented 2 years ago

Greetings!

I have a trained Remora model in ONNX format, and I am wondering whether it would be possible to convert it to the tar+toml format for bonito mod-calling. Thank you very much in advance for your help!

iiSeymour commented 2 years ago

Hello @hd2326, you can pass your custom-trained Remora ONNX model to bonito with bonito basecaller --modified-base-model custom.onnx (it doesn't need converting).

hd2326 commented 2 years ago

@iiSeymour Thank you very much for the quick reply!

So mod-calling will be a two-step process: 1) basecalling with the bonito tar+toml model, and then 2) mod-calling on those basecalls with the Remora ONNX model, right?

iiSeymour commented 2 years ago

You need both models (a bonito basecalling model [tar+toml] and a remora modbase [onnx] model) but it's one command -

bonito basecaller dna_r10.4.1_e8.2_hac@v3.5.1 /data/reads --modified-base-model custom.onnx > calls.bam

hd2326 commented 2 years ago

Got it! Thank you so much for the explanation!

hd2326 commented 2 years ago

Greetings!

I am running bonito as @iiSeymour suggested:

bonito basecaller $bonito_models/dna_r9.4.1_e8_hac@v3.3/ $rawdata --modified-base-model $remora_model/model_best.onnx --modified-bases $mod --reference $genome

I got the following error:

remora.RemoraError: No trained Remora models for /bonito_models/dna_r9.4.1_e8. Options: dna_r9.4.1_e8, dna_r9.4.1_e8.1, dna_r10.4_e8.1

It seems that the remora model I provided cannot be recognized. Any insights on the issue? Thank you very much!

marcus1487 commented 2 years ago

The --modified-bases argument tells bonito to look up the corresponding pretrained Remora model. You have also specified the path to a Remora model with the --modified-base-model argument, and that model already determines which modified bases are called. Removing the --modified-bases argument from the call should work.
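A minimal sketch of that rule, in case it helps when scripting runs. The helper build_basecaller_command is hypothetical and not part of bonito; it only uses the flags that appear in this thread, assumes --modified-bases takes one or more names, and refuses to combine the two modified-base options. The file names in the usage lines are placeholders.

```python
import shlex


def build_basecaller_command(model, reads, modified_base_model=None,
                             modified_bases=None, reference=None):
    """Assemble a `bonito basecaller` argument list.

    Hypothetical helper, not part of bonito. It only knows the flags
    mentioned in this thread.
    """
    # A custom Remora model already defines which bases it calls, so
    # asking bonito to look up a pretrained model as well is rejected.
    if modified_base_model and modified_bases:
        raise ValueError(
            "use either --modified-base-model or --modified-bases, not both"
        )

    cmd = ["bonito", "basecaller", str(model), str(reads)]
    if modified_base_model:
        cmd += ["--modified-base-model", str(modified_base_model)]
    if modified_bases:
        # Assumption: a list of modified-base names, one argument each.
        cmd += ["--modified-bases", *modified_bases]
    if reference:
        cmd += ["--reference", str(reference)]
    return cmd


if __name__ == "__main__":
    # Placeholder paths; substitute your own model, reads and reference.
    cmd = build_basecaller_command(
        "dna_r9.4.1_e8_hac@v3.3",
        "reads/",
        modified_base_model="model_best.onnx",
        reference="genome.fa",
    )
    print(shlex.join(cmd))
```

The returned list can be handed to subprocess.run, with stdout redirected to a BAM file as in the command shown earlier in the thread.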

hd2326 commented 2 years ago

Awesome! As @marcus1487 suggested, removing --modified-bases solves the problem!

But another problem came up...

It seems that the BAM files produced by bonito are not compatible with samtools mpileup for modification analysis. I get the error "samtools mpileup: error reading from input file", which does not occur with guppy BAM files.

Specifically, the modification I am trying to analyze is uracil in DNA (I named it x and 5xT), and I trained the Remora model using the following workflow:

  1. I ran taiyaki prepare_mapped_reads.py with --mod x T 5xT to generate the hdf5 file.
  2. I ran remora dataset prepare with --motif T 0 to convert the hdf5 file to the npz file.
  3. I ran remora model train with the provided ConvLSTM_w_ref.py model template.

The MM tag in the bonito BAM files looks like MM:Z:['T']+x,-1,...,-1;, whereas the guppy BAM files contain something like Mm:Z:C+m,0,...,0;. I am not sure what the negative MM values mean. Could they be causing the incompatibility? Any insights on the issue? Thank you very much!
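For comparison, the SAM optional-fields specification describes the MM value as a series of entries, each made of a single base letter, a strand sign, a modification code, and a comma-separated list of non-negative skip counts, terminated by a semicolon. The sketch below checks a tag value against that grammar. The function names are my own, and the sample values are shortened stand-ins for the tags quoted above, not the real data. It only shows that the bonito-style value does not fit the specified pattern; whether that is what samtools mpileup trips over is not confirmed here.

```python
import re

# Per-entry header: base, strand, modification code, optional '.'/'?' flag.
_HEAD = r"[ACGTUN][-+](?:[a-z]+|[0-9]+)[.?]?"
# Full MM value: zero or more ';'-terminated entries with skip counts.
_MM_VALUE = re.compile(rf"(?:{_HEAD}(?:,[0-9]+)*;)*")


def is_valid_mm_value(value):
    """Return True if `value` (the part after 'MM:Z:') fits the grammar."""
    return _MM_VALUE.fullmatch(value) is not None


def describe_mm_problems(value):
    """List human-readable reasons why each entry fails the grammar."""
    problems = []
    for entry in filter(None, value.split(";")):
        head, *counts = entry.split(",")
        if not re.fullmatch(_HEAD, head):
            problems.append(f"bad base/strand/code field: {head!r}")
        bad = [c for c in counts if not re.fullmatch(r"[0-9]+", c)]
        if bad:
            problems.append(f"non-numeric or negative skip counts: {bad}")
    return problems


if __name__ == "__main__":
    # Shortened, illustrative values modelled on the tags in this comment.
    for sample in ("C+m,0,0;", "['T']+x,-1,-1;"):
        print(sample, is_valid_mm_value(sample), describe_mm_problems(sample))
```

With the illustrative bonito-style value, both the base field (a Python list representation instead of a single letter) and the negative counts are reported.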

najohink commented 2 years ago

Hi @hd2326,

I was wondering if you ended up getting bonito to basecall Us?

best, S

hd2326 commented 2 years ago

@najohink Actually no. Still the same error...