mindee / doctr

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
https://mindee.github.io/doctr/
Apache License 2.0

Flipped text recognition prediction. #1455

Status: Open. decadance-dance opened this issue 5 months ago.

decadance-dance commented 5 months ago

Bug description

When I set the option assume_straight_pages=False, some of the predictions come out upside down. I tried db_resnet34 and db_resnet50 for detection combined with master and parseq for recognition, and observed this bug with every pair.

Code snippet to reproduce the bug

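from doctr.models import ocr_predictor
from doctr.io import DocumentFile

# Load the test image (renamed from `input` to avoid shadowing the built-in)
doc = DocumentFile.from_images("./gh.png")

# Detection: db_resnet50, recognition: parseq.
# assume_straight_pages=False makes the detector return rotated boxes (quads)
model = ocr_predictor(
    det_arch="db_resnet50",
    reco_arch="parseq",
    pretrained=True,
    assume_straight_pages=False,
).cuda().half()  # run on GPU in half precision

result = model(doc)
print(result)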

Error traceback

...
Line(
  (words): [
              Word(value='ster', confidence=1.0),
              Word(value='and', confidence=1.0),
              Word(value='Graham', confidence=1.0),
              Word(value='6661]', confidence=0.95),         <-- Should be '[1999'
              Word(value='and', confidence=1.0),
              Word(value='2012],', confidence=1.0),
              Word(value='Gamba', confidence=1.0),
              Word(value='and', confidence=1.0),
              Word(value='Graham', confidence=1.0),
              Word(value='[2018]', confidence=1.0),
              Word(value='and', confidence=1.0),
              Word(value='Axelrod', confidence=1.0),
              Word(value='[2018).', confidence=0.99),
            ]
),
...
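The misread word is consistent with the crop having been read upside down: rotating a line of text by 180° reverses the character order and turns each glyph into its rotated counterpart ('[' looks like ']', '9' looks like '6'). A minimal sketch of that transformation, using a small, hand-picked and approximate glyph table (an assumption for illustration, not something doctr uses):

```python
# Approximate 180-degree glyph counterparts (hand-picked, illustrative only)
ROTATED_GLYPHS = {
    "[": "]", "]": "[", "(": ")", ")": "(",
    "6": "9", "9": "6", "0": "0", "1": "1", "8": "8",
}


def read_upside_down(text: str) -> str:
    """Return how `text` would read after a 180-degree rotation.

    Characters missing from the table are kept unchanged, so the
    result is only meaningful for strings made of the glyphs above.
    """
    # Rotation reverses reading order and maps each glyph to its counterpart
    return "".join(ROTATED_GLYPHS.get(ch, ch) for ch in reversed(text))


print(read_upside_down("[1999"))  # 6661]
```

Applying the function twice returns the original string, which is why a flipped crop can still produce a plausible, high-confidence token.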

[Attached image: gh_mark]

Environment

Collecting environment information...

DocTR version: 0.8.0a0
TensorFlow version: N/A
PyTorch version: 2.1.0a0+4136153 (torchvision 0.16.0a0)
OpenCV version: 4.9.0
OS: Ubuntu 22.04.2 LTS
Python version: 3.10.6
Is CUDA available (TensorFlow): N/A
Is CUDA available (PyTorch): Yes
CUDA runtime version: 12.1.105
GPU models and configuration: GPU 0: NVIDIA A30
Nvidia driver version: 525.147.05
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.2

Deep Learning backend

is_tf_available: False
is_torch_available: True

felixdittrich92 commented 5 months ago

Hi @decadance-dance :wave:

Yeah, this depends on the crop orientation classifier, which isn't 100% robust at the moment. We will retrain it after the next release; a training script has already been added :)
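In a pipeline with a crop orientation step, each word crop is classified and rotated back before recognition, so a misclassified crop reaches the recognizer upside down. A minimal NumPy sketch of such a correction step, assuming a hypothetical classifier output given as a counter-clockwise angle of 0, 90, 180 or 270 degrees; this is not doctr's implementation:

```python
import numpy as np


def correct_crop(crop: np.ndarray, predicted_angle: int) -> np.ndarray:
    """Undo a predicted rotation of a word crop (H x W or H x W x C array).

    `predicted_angle` is a hypothetical classifier output, assumed to be
    a counter-clockwise rotation that is a multiple of 90 degrees.
    """
    if predicted_angle % 90 != 0:
        raise ValueError("predicted_angle must be a multiple of 90")
    # np.rot90 rotates counter-clockwise by k * 90 degrees, so rotating by
    # -predicted_angle (taken modulo 4 quarter turns) undoes the rotation
    k = -(predicted_angle // 90) % 4
    return np.rot90(crop, k=k)
```

If the classifier wrongly predicts 180 for an upright crop, this step would itself flip the crop, which matches the symptom reported in this issue.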

CC @odulcy-mindee

decadance-dance commented 5 months ago

@felixdittrich92, got it, thanks. By the way, do you know an easy way to work around this in my case? I want to get quads (4 points) instead of rectangles (2 points) as the output of the detector, even if my page is straight. In a real scenario I will receive straight documents, so I don't really need to detect their orientation and rotate them, but I still need rectification to feed the crops to the recognizer.

felixdittrich92 commented 5 months ago

Mh, could you explain this in a bit more detail? If your images contain only straight text, the rectification should not be a problem!?

If we're talking about modifying the detector output in the middle of the pipeline, before it's passed to the recognition model, then https://github.com/mindee/doctr/pull/1449 could help. (Note: the input and output signatures need to be the same, so converting rectangles to quads within the same pipeline will not work.)
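The constraint in the note is that anything inserted between detection and recognition must return boxes in the same format it receives. A minimal sketch of such a signature-preserving post-processing function, assuming (hypothetically) that the detector output is a list with one NumPy array of relative quads of shape (N, 4, 2) per page; the function name and how it would be registered are assumptions, not the API from the linked PR:

```python
from typing import List

import numpy as np


def expand_quads(pages: List[np.ndarray], margin: float = 0.1) -> List[np.ndarray]:
    """Hypothetical hook: enlarge each quad around its centroid.

    Input and output are both lists of (N, 4, 2) arrays in relative
    coordinates, so the signature is preserved.
    """
    out = []
    for quads in pages:
        centers = quads.mean(axis=1, keepdims=True)  # (N, 1, 2) centroids
        grown = centers + (quads - centers) * (1.0 + margin)
        out.append(np.clip(grown, 0.0, 1.0))  # keep points inside the page
    return out
```

A function that turned (N, 2, 2) rectangles into (N, 4, 2) quads would break this contract, which is why the note rules out rectangle-to-quad conversion inside the same pipeline.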

decadance-dance commented 5 months ago

@felixdittrich92 All my documents are straight, so I could use assume_straight_pages=True, but then I would get rectangles (two points) as the detector output. I need quads (four points) from the detector, so I use assume_straight_pages=False. However, this option sometimes causes problems, such as the one described in this issue. So I'm looking for a way to get four points from the detector while avoiding upside-down crops.
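On pages known to be straight, one possible workaround is to keep the quads but skip any orientation decision, putting each quad's corners into a fixed top-left, top-right, bottom-right, bottom-left order based on image coordinates before rectifying the crop. A minimal NumPy sketch of that ordering, using the common sum/difference heuristic (which assumes each box is rotated well under 45 degrees); it is not part of doctr:

```python
import numpy as np


def order_quad(quad: np.ndarray) -> np.ndarray:
    """Return the corners as top-left, top-right, bottom-right, bottom-left.

    `quad` is a (4, 2) array of (x, y) points in image coordinates
    (y grows downwards), in any order. Assumes a small rotation angle.
    """
    s = quad.sum(axis=1)         # x + y
    d = quad[:, 1] - quad[:, 0]  # y - x
    return np.array([
        quad[np.argmin(s)],  # top-left: smallest x + y
        quad[np.argmin(d)],  # top-right: smallest y - x
        quad[np.argmax(s)],  # bottom-right: largest x + y
        quad[np.argmax(d)],  # bottom-left: largest y - x
    ])
```

The ordered corners could then be passed to a perspective warp (for example OpenCV's `cv2.getPerspectiveTransform` and `cv2.warpPerspective`), so that on straight pages the rectified crop always comes out upright.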

nikhilanj commented 3 days ago

@felixdittrich92 Hi, I'm facing similar issues with v0.8.1 when processing text that is rotated up to ±45 degrees. I see the issue mentions v0.9.0 and v0.10.0. Is there a way I can test the new model/checkpoint? PR #1608 has a new TF checkpoint, but I'm using PyTorch.