Closed bertsky closed 2 years ago
Quality is amazing, performance is outstanding!
Here's a rerun of the example from the neural DFKI dewarper:
original | anybaseocr-dewarp | fix-perspective |
---|---|---|
The text block is much better aligned when pre-cropped. The ruler is perfectly aligned when present, and the text block is slightly tilted to the right (more at the top than at the bottom).
Oh right – anybaseocr was allowed to see the cropped image, so for a fair comparison fix-perspective should also:
cropped | anybaseocr-dewarp | fix-perspective |
---|---|---|
I've noticed a unicolor triangle on the left side in the fix-perspective color image, hex color #d2c5b4 — do you know where this is from?
The minimal structure on the right side (of the pages below) leads to a wrong estimate of the optimal skewing angle for the left side. Here: colsums, upper half: extra cropping by me, lower half: without extra cropping
Comment for me: max(mean) instead of max(standard deviation) probably sufficient after absdiff(col|rowsums -col|rowblur)
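The note above can be sketched roughly as follows. This is only an illustration of the idea, not fix-perspective's code: it assumes that each candidate skew angle yields a column-sum (or row-sum) profile, that the blur is a plain box blur, and that the candidate with the highest score wins. All function names (`boxBlur`, `residual`, `meanOf`, `stddevOf`, `bestCandidate`) are made up for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Box blur of a 1-D profile; the window is clipped at both ends.
// (Assumption: the real blur kernel is not stated in the thread.)
std::vector<double> boxBlur(const std::vector<double>& v, int radius) {
    assert(radius >= 0);
    const int n = static_cast<int>(v.size());
    std::vector<double> out(v.size());
    for (int i = 0; i < n; ++i) {
        const int lo = std::max(0, i - radius);
        const int hi = std::min(n - 1, i + radius);
        double sum = 0.0;
        for (int j = lo; j <= hi; ++j) sum += v[j];
        out[i] = sum / (hi - lo + 1);
    }
    return out;
}

// absdiff(sums - blur(sums)): keeps only the sharp structure of the profile.
std::vector<double> residual(const std::vector<double>& sums, int radius) {
    const std::vector<double> blurred = boxBlur(sums, radius);
    std::vector<double> d(sums.size());
    for (std::size_t i = 0; i < sums.size(); ++i)
        d[i] = std::fabs(sums[i] - blurred[i]);
    return d;
}

double meanOf(const std::vector<double>& v) {
    if (v.empty()) return 0.0;
    double s = 0.0;
    for (double x : v) s += x;
    return s / static_cast<double>(v.size());
}

double stddevOf(const std::vector<double>& v) {
    if (v.empty()) return 0.0;
    const double m = meanOf(v);
    double s = 0.0;
    for (double x : v) s += (x - m) * (x - m);
    return std::sqrt(s / static_cast<double>(v.size()));
}

// Index of the candidate profile with the highest score.
// Pass meanOf or stddevOf as `score` to compare both criteria.
template <typename Score>
std::size_t bestCandidate(const std::vector<std::vector<double>>& profiles,
                          int radius, Score score) {
    assert(!profiles.empty());
    std::size_t best = 0;
    double bestValue = score(residual(profiles[0], radius));
    for (std::size_t i = 1; i < profiles.size(); ++i) {
        const double value = score(residual(profiles[i], radius));
        if (value > bestValue) { bestValue = value; best = i; }
    }
    return best;
}
```

Calling `bestCandidate(profiles, radius, meanOf)` and `bestCandidate(profiles, radius, stddevOf)` on the same set of profiles shows whether the two criteria pick the same candidate.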
Could you tell me how many dpi the png here has? Do you have more examples? See https://github.com/jbarth-ubhd/fix-perspective/issues/3#issuecomment-1106707098
> I've noticed a unicolor triangle on the left side in the fix-perspective color image, hex color #d2c5b4 – do you know where this is from?
Yes, that's the median colour estimated by OCR-D as a filler outside the page mask (Border polygon) – for image consumers that cannot handle or ignore the alpha channel. It became overt because fix-perspective's GraphicsMagick library throws away the alpha channel of the input. (If you know a simple fix for this, it would be great if you could integrate it.)
> The minimal structure on the right side (of the pages below) leads to a wrong estimate of the optimal skewing angle for the left side. Here: colsums, upper half: extra cropping by me, lower half: without extra cropping
Yes, that's unfortunate. I can see how it would be difficult to remove that kind of artifact in Hough space. (But it's also difficult to decide the optimal set of boundary lines in the cropper.)
> Comment for me: max(mean) instead of max(standard deviation) probably sufficient after absdiff(col|rowsums -col|rowblur)
Is that related? (So, are you already considering a workaround for suboptimally cropped images with the shadow of the spine or the adjacent pages giving additional vertical lines?)
> It became overt because fix-perspective's GraphicsMagick library throws away the alpha channel of the input. (If you know a simple fix for this, it would be great if you could integrate it.)
I do not use GraphicsMagick in fix-perspective, do you mean blitzDrt?
> I do not use GraphicsMagick in fix-perspective, do you mean blitzDrt?
Sorry, I did indeed confuse them again. But the same holds for imread(..., IMREAD_REDUCED_GRAYSCALE_2) – it throws away the alpha channel and thus exposes the filler colour from core's image API.
(did use ImageMagick++ in blitzDrt)
fix-perspective uses …REDUCED_GRAYSCALE_2 for internal analysis only and does its own alpha handling for background blurring:
/* So I'm doing blur(grayscale ⊙ alpha) ⊘ blur(alpha) */
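To make the formula in that comment concrete, here is a minimal sketch of an alpha-weighted blur. It is not taken from fix-perspective: the `Plane` type and the functions `boxSum` and `alphaWeightedBlur` are hypothetical, and a plain box window stands in for whatever blur the tool really uses. Because the same window is applied to numerator and denominator, the window area cancels in the quotient, so plain window sums are enough.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major single-channel image (hypothetical helper type for this sketch).
struct Plane {
    int width, height;
    std::vector<double> px;
    Plane(int w, int h, double fill = 0.0)
        : width(w), height(h), px(static_cast<std::size_t>(w) * h, fill) {}
    double& at(int x, int y) { return px[static_cast<std::size_t>(y) * width + x]; }
    double at(int x, int y) const { return px[static_cast<std::size_t>(y) * width + x]; }
};

// Sum over a (2*radius+1)^2 window, clipped at the image border.
Plane boxSum(const Plane& in, int radius) {
    Plane out(in.width, in.height);
    for (int y = 0; y < in.height; ++y) {
        for (int x = 0; x < in.width; ++x) {
            double sum = 0.0;
            for (int yy = std::max(0, y - radius); yy <= std::min(in.height - 1, y + radius); ++yy)
                for (int xx = std::max(0, x - radius); xx <= std::min(in.width - 1, x + radius); ++xx)
                    sum += in.at(xx, yy);
            out.at(x, y) = sum;
        }
    }
    return out;
}

// blur(gray * alpha) / blur(alpha), element-wise.
// Pixels whose whole window is transparent get `fallback`.
Plane alphaWeightedBlur(const Plane& gray, const Plane& alpha, int radius, double fallback) {
    assert(gray.width == alpha.width && gray.height == alpha.height);
    Plane weighted(gray.width, gray.height);
    for (std::size_t i = 0; i < gray.px.size(); ++i)
        weighted.px[i] = gray.px[i] * alpha.px[i];
    const Plane num = boxSum(weighted, radius);
    const Plane den = boxSum(alpha, radius);
    Plane out(gray.width, gray.height, fallback);
    for (std::size_t i = 0; i < out.px.size(); ++i)
        if (den.px[i] > 0.0) out.px[i] = num.px[i] / den.px[i];
    return out;
}
```

With this weighting, filler pixels outside the page mask (alpha 0) contribute nothing to the blurred background, whatever colour they carry.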
I would like to add IMREAD_UNCHANGED to rgbIm, but IMREAD_UNCHANGED does not handle EXIF orientation.
Oh, I see. Frankly, I don't know what is the correct behaviour here. I guess you can either ignore alpha channel or use blending (perhaps even masking) to calculate the transformation – but the final transformation should run on the full image (including alpha).
Ignoring alpha is not a good idea, because the R, G and B channels still contain pixel values outside the page that can be very different from the nearby page edge. This would lead to bad background subtraction (→ median, blur), and the area that alpha marks as outside the page would become the most distinctive structure the scan is aligned to.
Grrr, OpenCV really has no option to get the alpha channel and respect the EXIF orientation at the same time. IMREAD_UNCHANGED ignores EXIF. I would have to modify OpenCV's loadsave.cpp.
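One conceivable alternative to patching OpenCV, which nobody in this thread has tried, would be to load the image unchanged (keeping alpha) and apply the EXIF orientation by hand afterwards. The sketch below shows only that second step on a plain interleaved buffer. The `Image` type and `applyExifOrientation` are hypothetical names, and reading the orientation tag (values 1 to 8) from the file is assumed to happen elsewhere, for example with a separate EXIF parser.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Interleaved 8-bit image, e.g. 4 channels for RGBA (hypothetical helper type).
struct Image {
    int width, height, channels;
    std::vector<std::uint8_t> data;
    Image(int w, int h, int c)
        : width(w), height(h), channels(c),
          data(static_cast<std::size_t>(w) * h * c) {}
    std::uint8_t* pixel(int x, int y) {
        return &data[(static_cast<std::size_t>(y) * width + x) * channels];
    }
    const std::uint8_t* pixel(int x, int y) const {
        return &data[(static_cast<std::size_t>(y) * width + x) * channels];
    }
};

// Returns a copy of `src` in display orientation. All channels, including
// alpha, are moved together. `orientation` is the EXIF tag value (1..8).
Image applyExifOrientation(const Image& src, int orientation) {
    assert(orientation >= 1 && orientation <= 8);
    const int W = src.width, H = src.height;
    const bool swapAxes = orientation >= 5;  // 5..8 exchange width and height
    Image dst(swapAxes ? H : W, swapAxes ? W : H, src.channels);
    for (int y = 0; y < H; ++y) {
        for (int x = 0; x < W; ++x) {
            int X = x, Y = y;  // orientation 1: unchanged
            switch (orientation) {
                case 2: X = W - 1 - x; break;                 // mirror horizontally
                case 3: X = W - 1 - x; Y = H - 1 - y; break;  // rotate 180 degrees
                case 4: Y = H - 1 - y; break;                 // mirror vertically
                case 5: X = y; Y = x; break;                  // transpose
                case 6: X = H - 1 - y; Y = x; break;          // rotate 90 degrees clockwise
                case 7: X = H - 1 - y; Y = W - 1 - x; break;  // transverse
                case 8: X = y; Y = W - 1 - x; break;          // rotate 90 degrees counter-clockwise
                default: break;
            }
            const std::uint8_t* from = src.pixel(x, y);
            std::uint8_t* to = dst.pixel(X, Y);
            for (int c = 0; c < src.channels; ++c) to[c] = from[c];
        }
    }
    return dst;
}
```

Since the whole pixel is copied, the alpha channel ends up in the same orientation as the colour channels, which is what the background handling above relies on.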
Note: renaming the source files with a different suffix is necessary to get GNU make's implicit rules for C++ to fire.
I'd also like to point out that installing OpenCV 4 from source is not trivial (because it does not install a pkg-config rule by default, and is not included in systems like Ubuntu 18 yet), but let's assume users find their way around it (and at least for Ubuntu 20 onwards, this deps-ubuntu rule is enough).