robertknight / ocrs-models

PyTorch models for the ocrs OCR engine

warning when trying to export recognition to onnx #31

Open Phaired opened 3 weeks ago

Phaired commented 3 weeks ago

Hey, I'm getting this warning when exporting the text recognition model, and I think it's causing the export to not work correctly. It seems like it's not exporting the last batch or something similar, because the results are very poor when using it with your OCR library.

poetry run python -m ocrs_models.train_rec hiertext datasets/hiertext/ \
  --checkpoint text-rec-checkpoint.pt \
  --export text-recognition.onnx
Model param count 2412549
/workspace/ocrs-models/ocrs_models/train_detection.py:212: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  checkpoint = torch.load(filename, map_location=device)
/root/.cache/pypoetry/virtualenvs/ocrs-models-PAoYRsOs-py3.10/lib/python3.10/site-packages/torch/onnx/symbolic_opset9.py:4545: UserWarning: Exporting a model to ONNX with a batch_size other than 1, with a variable length with GRU can cause an error when running the ONNX model with a different batch size. Make sure to save the model with a batch size of 1, or define the initial states (h0/c0) as inputs of the model.
  warnings.warn(

robertknight commented 3 weeks ago

This warning is quite normal in PyTorch codebases that haven't been updated very recently, as it was only added in PyTorch v2.4.0. Adding weights_only=True to the torch.load command should resolve it.

This warning shouldn't affect the accuracy of the output. I suspect something else is happening. Can you upload the model (in ONNX format) and a few examples of images it is trained to recognize?

Phaired commented 3 weeks ago

Sorry, but I'm talking about this warning:

/root/.cache/pypoetry/virtualenvs/ocrs-models-PAoYRsOs-py3.10/lib/python3.10/site-packages/torch/onnx/symbolic_opset9.py:4545: UserWarning: Exporting a model to ONNX with a batch_size other than 1, with a variable length with GRU can cause an error when running the ONNX model with a different batch size. Make sure to save the model with a batch size of 1, or define the initial states (h0/c0) as inputs of the model.
  warnings.warn(

The training went well: text-recognition.onnx.zip

Epoch 51 validation loss 0.006593015574736102 char error rate 0.0011795811587944627

Here are some images extracted from the training set (it's a synthetic dataset, so there is nothing more to see): 195_229_477_253 125_230_151_248
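
For readers interpreting the "char error rate" figure in the log above, here is a minimal sketch of how such a metric is commonly computed: total edit distance between predictions and targets, divided by the total number of target characters. This is an assumption about the general technique, not a copy of what ocrs_models does internally; the function names `edit_distance` and `char_error_rate` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def char_error_rate(predictions, targets):
    """Total edit distance divided by total target length (assumed definition)."""
    total_errors = sum(edit_distance(p, t) for p, t in zip(predictions, targets))
    total_chars = sum(len(t) for t in targets)
    return total_errors / total_chars if total_chars else 0.0


if __name__ == "__main__":
    # One wrong character out of five target characters.
    print(char_error_rate(["1234?"], ["12345"]))
```

Under this definition, a rate near 0.001 means roughly one character error per thousand target characters on the validation set, so it says nothing about images that differ from that set.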

robertknight commented 3 weeks ago

Did you modify the alphabet used for classification (DEFAULT_ALPHABET)? If so, can you post the alphabet you used? I notice that the last output dimension of this model has size 69 rather than 97. The ocrs library hard-codes the alphabet to match the default models (see here). The library and the ocrs CLI tool ought to have a setting that lets you specify the alphabet, or there should be a way to embed it in the model, but that's currently missing.
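
To illustrate why the alphabet has to match, here is a minimal sketch of greedy CTC decoding in plain Python. It assumes class 0 is the CTC blank and class i maps to alphabet[i - 1]; that mapping, the helper names, and the two tiny alphabets are hypothetical and only serve the illustration. The same class indices decode to different text as soon as the decoding alphabet differs in content or order from the one used for training.

```python
BLANK = 0  # assumption: class 0 is the CTC blank label


def encode(text, alphabet):
    """Map each character to its class index (alphabet position + 1)."""
    return [alphabet.index(ch) + 1 for ch in text]


def ctc_greedy_decode(class_ids, alphabet):
    """Collapse repeated classes, drop blanks, then look up characters."""
    chars = []
    prev = BLANK
    for cid in class_ids:
        if cid != BLANK and cid != prev:
            chars.append(alphabet[cid - 1])
        prev = cid
    return "".join(chars)


def expected_num_classes(alphabet):
    """Under the blank assumption, the model needs len(alphabet) + 1 outputs."""
    return len(alphabet) + 1


# Hypothetical alphabets: same characters, different order.
trained_alphabet = " 01AB?"
runtime_alphabet = " 01?AB"

if __name__ == "__main__":
    ids = encode("A1?", trained_alphabet)
    print(ctc_greedy_decode(ids, trained_alphabet))  # decoded with the training alphabet
    print(ctc_greedy_decode(ids, runtime_alphabet))  # decoded with a reordered alphabet
    print(expected_num_classes(trained_alphabet))
```

Comparing `expected_num_classes(alphabet)` with the model's last output dimension is a quick sanity check, but a matching size alone does not guarantee that the character order matches.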

Phaired commented 3 weeks ago

Oh yes, I did:

DEFAULT_ALPHABET = (
    " 0123456789"
    + "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzéèà-%?"
)

So how should I change it?

Phaired commented 2 weeks ago

I tested this pull request that I made, using the same alphabet I used for training, but I'm still getting incorrect characters during OCR. Am I missing something? Or could there be an issue with the exported model, even though the training seemed to go well?

let engine = OcrEngine::new(OcrEngineParams {
    detection_model: Some(detection_model),
    recognition_model: Some(recognition_model),
    alphabet: Some(" 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzéèà-%?".to_string()),
    ..Default::default()
})?;

robertknight commented 2 weeks ago

> I tested https://github.com/robertknight/ocrs/pull/100 that I made using the same alphabet I used for training, but I'm still getting incorrect characters during OCR.

There might be an issue with the inputs being slightly different when using the ocrs library than what the model saw in training. You can export these inputs using ocrs --text-line-images {image.png}. This will create a folder called lines that contains the input images to the recognition step. You can then try using these images with the Python code to find out whether the problem is with differences in the input or with the exported model. The default models are naturally robust to input variation because they were trained on a wide variety of images. I have seen issues when training on highly homogeneous synthetic data, where the models can be overly sensitive to unimportant details (e.g. borders around the image).

Phaired commented 2 weeks ago

I fine-tuned both of my models using additional images and made some modifications like adjusting font size, text offset, and tilt. While the text-detection model seems to work well, the recognition model isn't performing as expected. Below are the ONNX models and the image I tested them on.

models.zip

l
remybarranco@MacBook-Pro-de-Remy examples % cargo run -p ocrs-cli -r -- l.png --detect-model text-detection.rten                                  
    Finished `release` profile [optimized] target(s) in 0.03s
     Running `/Users/remybarranco/Developer/ocrs-fork/target/release/ocrs l.png --detect-model text-detection.rten`
Alphabet:  0123456789?%ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzéèà
251
21
41
7
7
7
7
401
7
7
remybarranco@MacBook-Pro-de-Remy examples % cargo run -p ocrs-cli -r -- l.png --detect-model text-detection.rten --rec-model text-recognition.rten
    Finished `release` profile [optimized] target(s) in 0.03s
     Running `/Users/remybarranco/Developer/ocrs-fork/target/release/ocrs l.png --detect-model text-detection.rten --rec-model text-recognition.rten`
Alphabet:  0123456789?%ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzéèà
0