ocropus-archive / DUP-ocropy

Python-based tools for document analysis and OCR
Apache License 2.0

--probabilities option of ocropus-rpred causes IndexError #324

Open nickjwhite opened 5 years ago

nickjwhite commented 5 years ago

I recently installed ocropus on a server on Amazon's EC2, based on their latest "Amazon Linux 2 AMI", which I believe is based on Fedora (it certainly uses yum). On this, running ocropus-rpred with the --probabilities argument causes a traceback error:

Traceback (most recent call last):
  File "/usr/bin/ocropus-rpred", line 276, in safe_process1
    return process1(arg)
  File "/usr/bin/ocropus-rpred", line 203, in process1
    result = lstm.translate_back(network.outputs,pos=2)
  File "/usr/lib/python2.7/site-packages/ocrolib/lstm.py", line 776, in translate_back
    if pos==2: return [(c, outputs[r,c]) for (r,c) in maxima] # include character probabilities
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

I suspect numpy is the cause of this issue. Version 1.7.1 is installed.

The reason is that the maxima array is full of floats (like 12.0), which aren't valid as array indices. It can be fixed by changing the offending line to:

    if pos==2: return [(c, outputs[int(r),int(c)]) for (r,c) in maxima] # include character probabilities

I've done that in my 'fixprobs' branch, commit ebd462b38aa42ee5527c6176c443b6d3610b0bf3 , and it seems to work fine. There could be more places where maxima using floats is an issue, but I haven't come across any.
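For illustration, here is a minimal sketch of the failure and the fix outside of ocropy. The small `outputs` array, the `maxima` list and the helper name `char_probs` are made-up stand-ins, not the real ocrolib data or API; the only assumption carried over from the report is that the positions in maxima arrive as floats.

```python
import numpy as np

# Hypothetical stand-in for network.outputs: rows are time steps, columns are classes.
outputs = np.array([[0.1, 0.9],
                    [0.8, 0.2]])

# Hypothetical stand-in for maxima: (row, class) positions stored as floats.
maxima = [(0.0, 1.0), (1.0, 0.0)]


def char_probs(outputs, maxima):
    """Mirror the patched line: cast float positions to int before indexing."""
    return [(c, outputs[int(r), int(c)]) for (r, c) in maxima]


# Indexing directly with floats is what the traceback complains about.
try:
    outputs[0.0, 1.0]
except IndexError as err:
    print("float index rejected:", err)

result = char_probs(outputs, maxima)
print(result)
```

As in the patched line, the class value c is returned unchanged (still a float) and only the indices are cast.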

kba commented 5 years ago

Thanks, that seems like a clear bug, and ebd462b38aa42ee5527c6176c443b6d3610b0bf3 is a simple fix. If you open a PR with https://github.com/tmbdev/ocropy/commit/ebd462b38aa42ee5527c6176c443b6d3610b0bf3 I'll merge it.