nanoporetech / bonito

A PyTorch Basecaller for Oxford Nanopore Reads
https://nanoporetech.com/

RNA004 Does not output any CTC data #379

Open VBHarrisN opened 7 months ago

VBHarrisN commented 7 months ago

Hello!

I am training an RNA-specific basecaller model and have been trying to use the RNA004 basecaller to generate the training data. However, this model does not seem to output the CTC data correctly. No matter what data I put in, the resulting chunks.npy always has shape 0 by 10000. To rule out my data, I ran the same RNA data through the DNA r_10 basecalling model and got a 59000 by 9996 NumPy array. Every output file from the RNA004 model is under 1 KB, which I believe means they are effectively empty. The console even prints "saving CTC data" when using the RNA004 model, which again suggests the data is not the problem. I believe this is a bug: the RNA004 model does not throw any errors, it simply does not save any data. I am unsure how to proceed, as I need the RNA CTC data to train my basecalling model.
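For anyone reproducing this, the shapes can be checked without loading the arrays. The following is a minimal sketch that reads only the header of each .npy file in an output directory; the helper names `npy_shape` and `summarize_ctc_dir` are made up for illustration and are not part of bonito.

```python
import ast
import struct
from pathlib import Path


def npy_shape(path):
    """Return (shape, dtype string) from a .npy header without loading the data."""
    with open(path, "rb") as fh:
        if fh.read(6) != b"\x93NUMPY":
            raise ValueError(f"{path} is not a .npy file")
        major, _minor = fh.read(2)
        # Format version 1.x stores the header length in 2 bytes, later versions in 4.
        if major == 1:
            (header_len,) = struct.unpack("<H", fh.read(2))
        else:
            (header_len,) = struct.unpack("<I", fh.read(4))
        # The header is a Python dict literal padded with spaces and a newline.
        header = ast.literal_eval(fh.read(header_len).decode("latin1").strip())
    return header["shape"], header["descr"]


def summarize_ctc_dir(directory):
    """Map each .npy file name in `directory` to its (shape, dtype string)."""
    return {p.name: npy_shape(p) for p in sorted(Path(directory).glob("*.npy"))}
```

Calling `summarize_ctc_dir` on the directory that holds chunks.npy returns a dictionary keyed by file name, so a leading dimension of 0 shows up immediately for every array that was written empty.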

Let me know if I can provide any more information to help diagnose/solve this problem!

iiSeymour commented 5 months ago

Only high quality chunks (>99% accuracy by default) are saved for training. You will want to change this filter with --min-accuracy-save-ctc to be in line with the distribution on your RNA calls.
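To illustrate how a threshold like this behaves, here is a minimal sketch. It is not bonito's implementation: it assumes each chunk gets an alignment accuracy expressed as a fraction between 0 and 1, and that a chunk with no alignment has no accuracy at all. The function names and the accuracy values are invented for the example; the real logic is in the file linked below.

```python
def keep_chunk(accuracy, min_accuracy=0.99):
    """Hypothetical filter: keep a chunk only if its accuracy meets the threshold.

    `accuracy` is None when the chunk did not align (an assumption of this sketch).
    """
    return accuracy is not None and accuracy >= min_accuracy


def count_kept(accuracies, min_accuracy):
    """Count how many chunks would survive the filter at a given threshold."""
    return sum(keep_chunk(a, min_accuracy) for a in accuracies)


# Made-up per-chunk accuracies; None marks a chunk that did not align.
accuracies = [0.97, 0.93, 0.88, None, 0.995]

for threshold in (15, 1, 0.99, 0.9, 0.2, 0):
    print(f"threshold={threshold}: kept {count_kept(accuracies, threshold)} chunks")
```

Under these assumptions, a threshold above 1 can never be met, and a chunk without an alignment is dropped at every threshold, including 0.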

https://github.com/nanoporetech/bonito/blob/master/bonito/cli/basecaller.py#L211

VBHarrisN commented 5 months ago

I had read about this in other GitHub issues. We tried setting the --min-accuracy-save-ctc flag to 15, 1, and 0.2, and no data was ever written to chunks.npy. Our data typically has an average quality score of 14. I don't fully understand how a chunk is judged to be high quality.
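For context on how a quality score relates to an accuracy figure, the standard Phred definition is Q = -10 * log10(error probability). The sketch below converts in both directions; the function names are illustrative. Note that a mean quality score is the basecaller's own estimate, so it only approximates the alignment accuracy that a reference-based filter would measure.

```python
import math


def phred_to_accuracy(q):
    """Convert a Phred quality score to the implied per-base accuracy (0 to 1)."""
    return 1.0 - 10 ** (-q / 10)


def accuracy_to_phred(accuracy):
    """Convert a per-base accuracy (0 to 1, exclusive of 1) to a Phred score."""
    return -10 * math.log10(1.0 - accuracy)


print(f"Q14 -> accuracy {phred_to_accuracy(14):.4f}")
print(f"accuracy 0.99 -> Q{accuracy_to_phred(0.99):.1f}")
```

By this conversion, Q14 corresponds to roughly 96% accuracy, while 99% accuracy corresponds to about Q20.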

Sgreenfield9 commented 5 months ago

We're in the same boat. I've actually dropped my --min-accuracy-save-ctc flag down to 0 but still nothing.