mindee / doctr

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
https://mindee.github.io/doctr/
Apache License 2.0

[BUG] Unexpected model output. #1439

decadance-dance closed this issue 10 months ago

decadance-dance commented 10 months ago

Bug description

First of all, good job guys. Keep it up.

My issue is that I tried the dbnet + master / sar_resnet31 combinations and got very strange results from the text recognizers: the output looks like a meaningless string of characters. I also got a warning during inference: WARNING:root:Invalid model URL, using default initialization.

Code snippet to reproduce the bug

from doctr.models import ocr_predictor
from doctr.io import DocumentFile
from doctr.utils.visualization import visualize_page
import matplotlib.pyplot as plt

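# Load the test image ("doc" avoids shadowing the built-in input())
doc = DocumentFile.from_images("./four.jpeg")
# Detection: db_resnet50, recognition: master, pretrained weights requested
model = ocr_predictor('db_resnet50', 'master', pretrained=True)

result = model(doc)
print(result)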

Input image: four

Error traceback

WARNING:root:Invalid model URL, using default initialization.
Document(
  (pages): [Page(
    dimensions=(1024, 768)
    (blocks): [
      Block(
        (lines): [Line(
          (words): [
            Word(value='ô]CwK{°&L`g3ÏÜSMCwK{°2&L`g3ÏÜSMCwK{°&L`g3ÏÜSMCwK{&', confidence=0.024),
            Word(value='ô]CwK{°&L`g3ÏÜSMCwK{°&L`g3ÏÜSMCwK{°&L`g3ÏÜSMCwK{°&', confidence=0.024),
            Word(value='ô]CwK{°&L`g3ÏÜSMCwK{°2&L`g3ÏÜSMCwK{°2&L`g3ÏÜSMCwK&', confidence=0.024),
            Word(value='ô]CwK{°&L`g3ÏÜSMCwK{°2&L`g3ÏÜSMCwK{°&L`g3ÏÜSMCwK{&', confidence=0.024),
            Word(value='ô]CwK{°&L`g3ÏÜSMCwK{°&L`g3ÏÜSMCwK{°&L`g3ÏÜSMCwK{°&', confidence=0.024),
            Word(value='ô]CwK{°&L`g3ÏÜSMCwK{°2&L`g3ÏÜSMCwK{°&L`g3ÏÜSMCwK{&', confidence=0.024),
            Word(value='ô]CwK{°&L`g3ÏÜSMCwK{°&L`g3ÏÜSMCwK{°&L`g3ÏÜSMCwK{°&', confidence=0.024),
          ]
        )]
        (artefacts): []
      ),
  ...
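
The output above has a recognizable pattern: every word is the same noise string with a confidence near 0.024, which is what a recognizer with randomly initialized weights tends to produce. A rough sanity check can be sketched as below. It assumes the nested dict layout returned by `result.export()` (pages, blocks, lines, words, each word carrying `value` and `confidence`); the helper names and the 0.1 threshold are hypothetical and chosen only for illustration.

```python
from statistics import fmean


def word_confidences(exported):
    """Collect every word confidence from an exported document dict."""
    return [
        word["confidence"]
        for page in exported.get("pages", [])
        for block in page.get("blocks", [])
        for line in block.get("lines", [])
        for word in line.get("words", [])
    ]


def looks_untrained(exported, threshold=0.1):
    """Return True when the mean word confidence is below `threshold`.

    The threshold is an arbitrary illustrative value, not a docTR default.
    An empty document returns False because there is nothing to judge.
    """
    confs = word_confidences(exported)
    return bool(confs) and fmean(confs) < threshold
```

With the predictor from the snippet, this would be called as `looks_untrained(result.export())`. A low mean confidence is only a hint, since genuinely hard images can also score poorly.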

Environment

Collecting environment information...

DocTR version: v0.7.0
TensorFlow version: N/A
PyTorch version: 1.12.1+cu113 (torchvision 0.13.1+cu113)
OpenCV version: 4.9.0
OS: Ubuntu 20.04.4 LTS
Python version: 3.10.13
Is CUDA available (TensorFlow): N/A
Is CUDA available (PyTorch): Yes
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: NVIDIA A30
Nvidia driver version: 465.19.01
cuDNN version: Probably one of the following:
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_adv_train.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_ops_train.so.8

Deep Learning backend

is_tf_available: False
is_torch_available: True

felixdittrich92 commented 10 months ago

Hey @decadance-dance 👋,

The pretrained PyTorch models for vitstr/parseq/master/sar/linknet are currently only available on the main branch (v0.8.0a) and will be published with the next release soon :) If you want to test them now, you need to install from the main branch :)

Best regards, Felix

decadance-dance commented 10 months ago

@felixdittrich92, I wasn't paying attention to this. Thanks.