Open romanchetto opened 5 years ago
@noahmetzger, could you please test that with the latest code?
With the new algorithm it's definitely better, but still far from perfect.
Version 4.1
Version 5
Hello @noahmetzger, @stweil. Is version 5 the current "master" and 4.1 the July release? By the way, could you please also try version 4.0 to compare with 4.1 and 5? On version 4.0 I don't see this issue.
If we are talking about the commits 4.1: 5280bbcade4e2dec5eef439a6e189504c2eadcd9 and 4.0: c69859cacb040a518cd64206ab1a2d6e48d17854, then for me the bounding boxes are completely identical.
@noahmetzger, sorry, by 4.0 I meant Release 4.0.0 from 29 October 2018, commit 5131699.
@romanchetto you're right, 4.0 had better bounding boxes compared to 4.1.
Compared to 5.0 it's hard to say which one is better, but look for yourself. Here is the output of 4.0:
I tried to bisect this. It looks like the regression was introduced by commit ce88adbf326a40b08de32e35eafffd29ef43290e.
@stweil, @noahmetzger, thank you for your help. I will think about how to handle this on my side, or just wait for the 5.0 release.
We have the same issue, and the results of 4.1 are worse than those of 4.0. We plan to roll back to version 4.0 and hope it can be resolved in 5.0.
@shermanrxie: your comment without a test image is useless, and hoping that somebody will fix it in 5.0 without a test case is pointless.
Hello everyone.
After upgrading from Tesseract 4.0 to 4.1 I ran into the following issue: sometimes symbols in words are swapped. The text value and bounding rectangle returned by the word-level result iterator are OK. But when I collect the symbols of the affected word from the symbol-level iterator, the X-coordinate and width are sometimes incorrect:
In this screenshot you can see that the symbol "m" comes before "A" in the word "American", and its width is twice the average symbol width in the word.
The Tesseract 4.1 release notes say: "Fix for bounding box problem." Maybe this fix is somehow related to this issue.
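For anyone who wants to flag such words automatically, here is a minimal Python sketch of the width check described above. It assumes the symbol boxes have already been collected from the symbol-level iterator into `(symbol, left, width)` tuples; the function name `wide_symbols`, the `factor` threshold, and the coordinates are made up for illustration and are not real Tesseract output.

```python
from statistics import median

def wide_symbols(boxes, factor=1.8):
    """Return the symbols whose width exceeds factor * median width of the word.

    boxes: list of (symbol, left, width) tuples for one word.
    factor: hypothetical threshold; tune it for your fonts.
    """
    med = median(width for _, _, width in boxes)
    return [(sym, left, width) for sym, left, width in boxes if width > factor * med]

# Hypothetical boxes for the word "American" (illustrative values only):
# "m" starts left of "A" and is about twice as wide as the other symbols.
boxes = [("A", 112, 14), ("m", 100, 28), ("e", 140, 12), ("r", 153, 9),
         ("i", 163, 5), ("c", 169, 11), ("a", 181, 12), ("n", 194, 12)]

suspicious = wide_symbols(boxes)
print(suspicious)
```

The median is used instead of the mean so that one oversized box does not inflate the reference width it is compared against.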
Environment
Current Behavior:
Incorrect symbol bounding rectangle values; when ordered by X-coordinate, the symbols are swapped.
Expected Behavior:
All symbol bounding rectangle values are correct. When ordered by X-coordinate, the word's symbols appear in the same order as in the word's text value.
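The expected behavior can be written down as a small regression check. This is only a sketch in Python: it assumes left-to-right text and that each symbol has been reduced to a `(symbol, left)` pair taken from the symbol-level iterator. The helper name `symbols_in_x_order` and the sample coordinates are hypothetical.

```python
def symbols_in_x_order(boxes):
    """Join the symbols of one word after sorting them by their left edge.

    boxes: list of (symbol, left) pairs for one word (left-to-right script assumed).
    """
    return "".join(sym for sym, _ in sorted(boxes, key=lambda box: box[1]))

word_text = "American"

# Hypothetical values showing the reported behavior: "m" has a smaller left edge than "A".
swapped = [("A", 112), ("m", 100), ("e", 140), ("r", 153),
           ("i", 163), ("c", 169), ("a", 181), ("n", 194)]
# Hypothetical values showing the expected behavior.
correct = [("A", 100), ("m", 114), ("e", 140), ("r", 153),
           ("i", 163), ("c", 169), ("a", 181), ("n", 194)]

print(symbols_in_x_order(swapped) == word_text)   # swapped boxes fail the check
print(symbols_in_x_order(correct) == word_text)   # correct boxes pass the check
```

Running a check like this over the same test image with 4.0, 4.1, and master would give a simple pass/fail signal for comparing versions.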