Open Phaired opened 3 weeks ago
This warning is quite normal in PyTorch codebases that haven't been updated very recently, as it was only added in PyTorch v2.4.0. Adding `weights_only=True` to the `torch.load` call should resolve it.
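A minimal sketch of that fix (the checkpoint filename here is hypothetical): saving a state dict and reloading it with `weights_only=True`, which restricts unpickling to tensors and plain containers and silences the warning.

```python
import torch

# Save a plain state dict, then reload it with weights_only=True
# (the filename "checkpoint.pt" is just for illustration).
state = {"weight": torch.zeros(2, 3)}
torch.save(state, "checkpoint.pt")

loaded = torch.load("checkpoint.pt", weights_only=True)
print(loaded["weight"].shape)  # torch.Size([2, 3])
```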
This warning shouldn't affect the accuracy of the output. I suspect something else is happening. Can you upload the model (in ONNX format) and a few examples of images it is trained to recognize?
Sorry, but I'm talking about this warning:
/root/.cache/pypoetry/virtualenvs/ocrs-models-PAoYRsOs-py3.10/lib/python3.10/site-packages/torch/onnx/symbolic_opset9.py:4545: UserWarning: Exporting a model to ONNX with a batch_size other than 1, with a variable length with GRU can cause an error when running the ONNX model with a different batch size. Make sure to save the model with a batch size of 1, or define the initial states (h0/c0) as inputs of the model.
warnings.warn(
The training went well: text-recognition.onnx.zip
Epoch 51 validation loss 0.006593015574736102 char error rate 0.0011795811587944627
Here are some images extracted from the training set (it's a synthetic dataset, so nothing more to see).
Did you modify the alphabet used for classification (`DEFAULT_ALPHABET`)? If so, can you post the alphabet you used? I notice that this model has a last output dimension of size 69 rather than 97. The ocrs library hard-codes the alphabet to match the models (see here). The library and ocrs CLI tool ought to have a setting so you can specify the alphabet, or there should be a way to embed it in the model, but that's currently missing.
Oh yes, I did:

```python
DEFAULT_ALPHABET = (
    " 0123456789"
    + "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzéèà-%?"
)
```
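As a quick sanity check, this custom alphabet has exactly 69 entries, which lines up with the last output dimension of 69 noted above (how ocrs maps output classes to alphabet entries, e.g. whether a blank class is included, is not spelled out here):

```python
# Count the characters in the custom alphabet from the comment above.
DEFAULT_ALPHABET = (
    " 0123456789"
    + "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzéèà-%?"
)
print(len(DEFAULT_ALPHABET))  # 69
```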
So how should I change it?
I tested [this pull request](https://github.com/robertknight/ocrs/pull/100) that I made using the same alphabet I used for training, but I'm still getting incorrect characters during OCR. Am I missing something, or could there be an issue with the exported model, even though the training seemed to go well?
```rust
let engine = OcrEngine::new(OcrEngineParams {
    detection_model: Some(detection_model),
    recognition_model: Some(recognition_model),
    alphabet: Some(" 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzéèà-%?".to_string()),
    ..Default::default()
})?;
```
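To see why a mismatched alphabet scrambles the output: a recognition model emits class indices, and decoding maps those indices back through whatever alphabet the library was configured with. A minimal greedy CTC-style decode sketch (assuming index 0 is the blank class; ocrs' actual convention may differ):

```python
import numpy as np

def greedy_ctc_decode(logits, alphabet):
    """Collapse repeated classes, drop blanks (index 0), map to characters."""
    ids = logits.argmax(axis=-1)
    out, prev = [], -1
    for i in ids:
        if i != prev and i != 0:
            out.append(alphabet[i - 1])  # shift past the blank class
        prev = i
    return "".join(out)

# The same logits decoded with two different alphabets give different text,
# which is the symptom described above.
logits = np.eye(5)[[1, 0, 2, 2, 0, 3]]  # one-hot class sequence 1,blank,2,2,blank,3
print(greedy_ctc_decode(logits, "abcd"))  # "abc"
print(greedy_ctc_decode(logits, "ABCD"))  # "ABC"
```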
> I tested https://github.com/robertknight/ocrs/pull/100 that I made using the same alphabet I used for training, but I'm still getting incorrect characters during OCR.
There might be an issue with the inputs being slightly different when using the ocrs library than what the model saw in training. You can export these inputs using `ocrs --text-line-images {image.png}`. This will create a folder called `lines` that contains the input images to the recognition step. You can then try using these images with the Python code to find out whether the problem is with differences in the input, or whether there is a problem with the exported model. The default models are naturally robust to input variation because they were trained on a wide variety of images. I have seen issues when training on highly homogeneous synthetic data, where the models can be overly sensitive to unimportant details (e.g. borders around the image).
I fine-tuned both of my models using additional images and made some modifications like adjusting font size, text offset, and tilt. While the text-detection model seems to work well, the recognition model isn't performing as expected. Below are the ONNX models and the image I tested them on.
```
remybarranco@MacBook-Pro-de-Remy examples % cargo run -p ocrs-cli -r -- l.png --detect-model text-detection.rten
    Finished `release` profile [optimized] target(s) in 0.03s
     Running `/Users/remybarranco/Developer/ocrs-fork/target/release/ocrs l.png --detect-model text-detection.rten`
Alphabet:  0123456789?%ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzéèà
251
21
41
7
7
7
7
401
7
7
remybarranco@MacBook-Pro-de-Remy examples % cargo run -p ocrs-cli -r -- l.png --detect-model text-detection.rten --rec-model text-recognition.rten
    Finished `release` profile [optimized] target(s) in 0.03s
     Running `/Users/remybarranco/Developer/ocrs-fork/target/release/ocrs l.png --detect-model text-detection.rten --rec-model text-recognition.rten`
Alphabet:  0123456789?%ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzéèà
0
```
Hey, I'm getting this warning when exporting the text recognition model, and I think it's causing the export to not work correctly. It seems like it's not exporting the last batch or something similar, because the results are very poor when using it with your OCR library.