ocropus-archive / DUP-ocropy

Python-based tools for document analysis and OCR
Apache License 2.0

ocropus-gpageseg is crashing with a numpy deprecation TypeError #299

Closed: freen closed this issue 5 years ago

freen commented 6 years ago

Expected Behavior

ocropus-gpageseg should split the attached document image into rows (text lines).

Current Behavior

Instead, ocropus-gpageseg is crashing with a numpy deprecation TypeError.

The TypeError is raised inside numpy (observed here with numpy v1.14.2). The numpy change that introduced it was committed in November 2016: https://github.com/numpy/numpy/commit/c9adc35e68b92b10ab0b20069465fd784388bc14

    root@0fcf71db448e:/app# ocropus-gpageseg -n -d --maxcolseps=0 --maxseps=0 /tmp/19320671_0.png
    INFO:
    INFO:  ########## /usr/local/bin/ocropus-gpageseg -n -d --maxcolseps=0 --maxse
    INFO:
    INFO:  /tmp/19320671_0.png
    INFO:  scale 26.944387
    INFO:  computing segmentation
    INFO:  computing column separators
    INFO:  considering at most 0 whitespace column separators
    INFO:  debug _1thresh.png
    Traceback (most recent call last):
      File "/usr/local/bin/ocropus-gpageseg", line 462, in safe_process1
        process1(job)
      File "/usr/local/bin/ocropus-gpageseg", line 417, in process1
        segmentation = compute_segmentation(binary,scale)
      File "/usr/local/bin/ocropus-gpageseg", line 348, in compute_segmentation
        colseps,binary = compute_colseps(binary,scale)
      File "/usr/local/bin/ocropus-gpageseg", line 251, in compute_colseps
        colseps = compute_colseps_conv(binary,scale)
      File "/usr/local/bin/ocropus-gpageseg", line 231, in compute_colseps_conv
        DSAVE("1thresh",thresh)
      File "/usr/local/bin/ocropus-gpageseg", line 165, in DSAVE
        imsave(fname,image)
      File "/usr/local/lib/python2.7/dist-packages/numpy/lib/utils.py", line 101, in newfunc
        return func(*args, **kwds)
      File "/usr/local/lib/python2.7/dist-packages/scipy/misc/pilutil.py", line 217, in imsave
        im = toimage(arr, channel_axis=2)
      File "/usr/local/lib/python2.7/dist-packages/numpy/lib/utils.py", line 101, in newfunc
        return func(*args, **kwds)
      File "/usr/local/lib/python2.7/dist-packages/scipy/misc/pilutil.py", line 336, in toimage
        cmin=cmin, cmax=cmax)
      File "/usr/local/lib/python2.7/dist-packages/numpy/lib/utils.py", line 101, in newfunc
        return func(*args, **kwds)
      File "/usr/local/lib/python2.7/dist-packages/scipy/misc/pilutil.py", line 97, in bytescale
        cscale = cmax - cmin
    TypeError: numpy boolean subtract, the `-` operator, is deprecated, use the bitwise_xor, the `^` operator, or the logical_xor function instead.
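The last frame of the traceback can be reproduced with numpy alone. This is a minimal sketch, assuming a numpy version in which this deprecation has become an error (such as the v1.14.2 reported above); the helper name bool_range and the small test array are hypothetical stand-ins for scipy's internals and the thresholded image. Calling max() and min() on a boolean array yields two numpy booleans, and subtracting them is what bytescale does at cscale = cmax - cmin. The exact wording of the error message differs between numpy versions.

```python
import numpy as np


def bool_range(arr):
    """Hypothetical helper mimicking scipy's old bytescale step `cscale = cmax - cmin`."""
    cmax, cmin = arr.max(), arr.min()  # np.bool_ scalars when arr is boolean
    return cmax - cmin


# Stand-in for the thresholded image passed to DSAVE/imsave
thresh = np.array([[True, False], [False, True]])

try:
    bool_range(thresh)
except TypeError as err:
    print("TypeError:", err)

# The alternative numpy's message suggests for booleans
print(np.logical_xor(thresh.max(), thresh.min()))  # True

# Casting to a numeric dtype first makes ordinary subtraction valid again
print(bool_range(thresh.astype(float)))  # 1.0
```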

Possible Solution

Nothing yet.

Steps to Reproduce (for bugs)

Run this command on the attached image (bottom of issue):

    ocropus-gpageseg -n -d --maxcolseps=0 --maxseps=0 /tmp/19320671_0.png

Your Environment

Image: 19320671_0.png

zuphilip commented 6 years ago

Okay, I can replicate that issue with updated numpy and scipy.

The traceback indicates that line 165 of ocropus-gpageseg is the problem: https://github.com/tmbdev/ocropy/blob/d3e5cc60b64d070b60d606a16baeda6b436cc23b/ocropus-gpageseg#L165

The array passed to imsave there contains True and False values rather than numbers, so scipy's bytescale ends up subtracting two numpy booleans (cscale = cmax - cmin), which numpy now rejects. One possible solution is to change that line to:

    imsave(fname,image.astype('float'))
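To see why the cast helps, here is a simplified sketch of the linear scaling that scipy's bytescale (part of scipy.misc, since removed from scipy) applied before saving. The name bytescale_sketch is hypothetical and the function approximates the old scipy code rather than copying it. With image.astype('float') the min and max are ordinary floats, so the subtraction succeeds and the 0/1 mask is stretched to 0/255.

```python
import numpy as np


def bytescale_sketch(data, low=0, high=255):
    """Hypothetical, simplified stand-in for scipy.misc.bytescale.

    Linearly maps data.min()..data.max() onto low..high and returns uint8.
    """
    cmin, cmax = data.min(), data.max()
    cscale = cmax - cmin  # raises TypeError if data is a boolean array
    if cscale == 0:
        cscale = 1  # avoid dividing by zero on a constant image
    scale = float(high - low) / cscale
    scaled = (data - cmin) * scale + low
    return (scaled.clip(low, high) + 0.5).astype(np.uint8)


thresh = np.array([[True, False], [False, True]])

# The patched call: convert the boolean mask before scaling
print(bytescale_sketch(thresh.astype('float')))  # 255 where True, 0 where False
```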

Can you confirm that this works?

I am not sure whether float or int would be better here.
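On the float-versus-int question: for a strictly binary mask both casts hold only 0 and 1, so the max-minus-min range is 1 either way, and a linear rescale like bytescale's maps them to the same bytes. A quick check with plain numpy, assuming the mask contains nothing but True and False (the mask below is a hypothetical example):

```python
import numpy as np

mask = np.array([[True, False], [False, True]])

as_float = mask.astype('float')  # values 0.0 / 1.0
as_int = mask.astype('int')      # values 0 / 1

# Both numeric dtypes support the max - min subtraction that failed on bool
print(as_float.max() - as_float.min())  # 1.0
print(as_int.max() - as_int.min())      # 1

# Scaling each to the 0..255 range gives identical uint8 pixels
print(np.array_equal((as_float * 255).astype(np.uint8),
                     (as_int * 255).astype(np.uint8)))  # True
```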

freen commented 6 years ago

@zuphilip Confirmed: we tested the patch in PR #301 and it fixed our issue.