mindee / doctr

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
https://mindee.github.io/doctr/
Apache License 2.0

TypeError: Expected Ptr<cv::UMat> for argument 'array' when using read_pdf() #134

Closed by jonathanMindee 3 years ago

jonathanMindee commented 3 years ago

I can't execute the read_pdf() workflow; it fails with a TypeError.

See the pdf file sent over slack to reproduce.

python 3.6.9

VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  return array(a, dtype, copy=False, order=order)
Traceback (most recent call last):
  File "/home/jonathan/mindee/dev/client_tests//test_classifier/main.py", line 17, in <module>
    result = model([doc])
  File "/home/jonathan/mindee/dev/client_tests//test_classifier/doctr/doctr/models/core.py", line 51, in __call__
    boxes = self.det_predictor(pages, **kwargs)
  File "/home/jonathan/mindee/dev/client_tests//test_classifier/doctr/doctr/models/detection/core.py", line 140, in __call__
    out = [self.post_processor(batch) for batch in out]
  File "/home/jonathan/mindee/dev/client_tests//test_classifier/doctr/doctr/models/detection/core.py", line 140, in <listcomp>
    out = [self.post_processor(batch) for batch in out]
  File "/home/jonathan/mindee/dev/client_tests//test_classifier/doctr/doctr/models/detection/differentiable_binarization.py", line 173, in __call__
    boxes = self.bitmap_to_boxes(pred=p_, bitmap=bitmap_)
  File "/home/jonathan/mindee/dev/client_tests//test_classifier/doctr/doctr/models/detection/differentiable_binarization.py", line 140, in bitmap_to_boxes
    _box = self.polygon_to_box(points)
  File "/home/jonathan/mindee/dev/client_tests//test_classifier/doctr/doctr/models/detection/differentiable_binarization.py", line 106, in polygon_to_box
    x, y, w, h = cv2.boundingRect(expanded_points)  # compute a 4-points box from expanded polygon
TypeError: Expected Ptr<cv::UMat> for argument 'array'
fg-mindee commented 3 years ago

@jonathanMindee Thanks for reporting this!

Would you mind sharing a minimal version of the script that raised this error, please?

jonathanMindee commented 3 years ago

Sure, I used the sample code in the README.md

from doctr.documents import read_pdf, read_img
from doctr.models import ocr_db_crnn

model = ocr_db_crnn(pretrained=True)
# PDF
doc = read_pdf("path/to/your/doc.pdf")
result = model([doc])
fg-mindee commented 3 years ago

Strange, I'm not able to reproduce the error, even with the sample pdf. Could you run our diagnostic script and report back the result?

wget https://raw.githubusercontent.com/mindee/doctr/main/scripts/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
jonathanMindee commented 3 years ago

Collecting environment information...

DocTR version: 0.1.1a0
TensorFlow version: 2.4.1
OS: Ubuntu 18.04.5 LTS
Python version: 3.6
Is CUDA available: No
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: GeForce RTX 2080 Ti
Nvidia driver version: 455.45.01
cuDNN version: Probably one of the following:
  /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8.0.5
  /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.5
  /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.5
  /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.5
  /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.5
  /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.5
  /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.5

fg-mindee commented 3 years ago

My investigation led me to the following conclusion: in very rare cases (it occurs on one page of the document you used), the conversion of a segment (extracted from the text detection model's segmentation map) to a bounding box fails. The reason is that the expansion of the segment into a polygon generates two sets of points rather than one. Since the result is then cast to a contiguous ndarray, two sets with different numbers of points produce a ragged object array, which triggers the error.
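To illustrate the mechanism (a standalone NumPy sketch, not docTR code — the point coordinates are made up):

```python
import numpy as np

# Two point sets with the SAME number of points stack into a regular,
# contiguous numeric array: exactly what cv2.boundingRect expects.
equal = np.array([[[0, 0], [4, 0], [4, 3]],
                  [[5, 5], [9, 5], [9, 8]]])
print(equal.shape)  # (2, 3, 2): contiguous integer array

# Point sets with DIFFERENT lengths can only become a ragged object array
# (hence the VisibleDeprecationWarning in the report above), and OpenCV
# rejects it with "Expected Ptr<cv::UMat> for argument 'array'".
ragged = np.array([[(0, 0), (4, 0), (4, 3)],
                   [(5, 5), (9, 5)]], dtype=object)
print(ragged.dtype)  # object
```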

I suspect this happens because the generated polygon also overlaps a small part of another segment. If so, keeping the set with the highest number of points should solve the issue. I'll open up a PR to tackle this!
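If that diagnosis is correct, the workaround could look roughly like this (a minimal sketch with a hypothetical `largest_point_set` helper — names and shapes are illustrative, not docTR's actual API):

```python
import numpy as np

def largest_point_set(point_sets):
    """Hypothetical helper: when polygon expansion yields several disjoint
    point sets, keep only the one with the most points so the result casts
    to a contiguous array suitable for cv2.boundingRect."""
    return max(point_sets, key=len)

# One large set (the segment itself) and a small spill-over into a neighbour
sets = [np.array([[0, 0], [4, 0], [4, 3], [0, 3]]),
        np.array([[5, 5], [6, 5]])]
expanded_points = np.asarray(largest_point_set(sets), dtype=np.int32)
print(expanded_points.shape)  # (4, 2): contiguous, so the cast no longer fails
```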