jbarth-ubhd / fix-perspective

MIT License

minimal makefile, additional targets, update readme #1

Closed bertsky closed 2 years ago

bertsky commented 2 years ago

Note: renaming the source files with a different suffix is necessary for GNU make's implicit rules for C++ to fire.

I'd also like to point out that installing OpenCV 4 from source is not trivial (because it does not install a pkg-config rule by default, and is not included in systems like Ubuntu 18 yet), but let's assume users find their way around it (and at least for Ubuntu 20 onwards, this deps-ubuntu rule is enough).

bertsky commented 2 years ago

Quality is amazing, performance is outstanding!

Here's a rerun of the example from the neural DFKI dewarper:

| original | anybaseocr-dewarp | fix-perspective |
| --- | --- | --- |
| [image: BIN_1586 IMG-BINARIZED] | [image: DEWARP-TEST_1586 IMG-DEW] | [image: BIN_1586 IMG-BINARIZED fix-perspective] |
| [image: MAX_1586] | [image: anybaseocr-raw] | [image: MAX_1586 fix-perspective] |

jbarth-ubhd commented 2 years ago

The text block is much better aligned when pre-cropped. The ruler is perfectly aligned when present, and the text block is slightly tilted to the right (more at the top than at the bottom).

bertsky commented 2 years ago

> The text block is much better aligned when pre-cropped. The ruler is perfectly aligned when present, and the text block is slightly tilted to the right (more at the top than at the bottom).

Oh right – anybaseocr was allowed to see the cropped image, so for a fair comparison fix-perspective should also:

| cropped | anybaseocr-dewarp | fix-perspective |
| --- | --- | --- |
| [image: BINCROP_1586 IMG-CROP] | [image: BINCROPDEW_1586 IMG-DEW] | [image: BINCROP_1586 fix-perspective] |
| [image: CROP-TEST_1586 IMG-CROP] | | [image: CROP-TEST_1586 fix-perspective] |

jbarth-ubhd commented 2 years ago

I've noticed a single-color triangle on the left side of the fix-perspective color image, hex color #d2c5b4. Do you know where it comes from?

jbarth-ubhd commented 2 years ago

The minimal structure on the right side (of the pages below) leads to a wrong estimate of the optimal skewing angle for the left side. Here are the colsums: upper half with extra cropping by me, lower half without extra cropping.

[image]

jbarth-ubhd commented 2 years ago

Note to self: max(mean) instead of max(standard deviation) is probably sufficient after absdiff(col|rowsums - col|rowblur)
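To make the note above concrete, here is a minimal sketch of the two candidate scores computed on one projection profile (column or row sums). It is not taken from fix-perspective: the box blur, the `radius` parameter and the names `boxBlur`, `ProfileScore` and `scoreProfile` are assumptions for illustration. The outer max(...) would then presumably be taken over such scores for different candidate angles.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Box blur of a 1-D profile; the window is clamped at both ends.
// (Assumption: the thread does not say which blur fix-perspective uses.)
std::vector<double> boxBlur(const std::vector<double>& profile, std::size_t radius) {
    std::vector<double> out(profile.size(), 0.0);
    for (std::size_t i = 0; i < profile.size(); ++i) {
        const std::size_t lo = i >= radius ? i - radius : 0;
        const std::size_t hi = std::min(profile.size() - 1, i + radius);
        double sum = 0.0;
        for (std::size_t j = lo; j <= hi; ++j) sum += profile[j];
        out[i] = sum / static_cast<double>(hi - lo + 1);
    }
    return out;
}

// Hypothetical container for the two scores discussed in the note.
struct ProfileScore {
    double mean;    // mean of |sums - blur(sums)|
    double stddev;  // standard deviation of |sums - blur(sums)|
};

// Computes absdiff(sums - blur(sums)) and summarises it by mean and stddev.
ProfileScore scoreProfile(const std::vector<double>& sums, std::size_t radius) {
    assert(!sums.empty());
    const std::vector<double> blurred = boxBlur(sums, radius);
    double total = 0.0;
    double totalSq = 0.0;
    for (std::size_t i = 0; i < sums.size(); ++i) {
        const double d = std::fabs(sums[i] - blurred[i]);
        total += d;
        totalSq += d * d;
    }
    const double n = static_cast<double>(sums.size());
    const double mean = total / n;
    // Guard against tiny negative values caused by rounding.
    const double variance = std::max(0.0, totalSq / n - mean * mean);
    return ProfileScore{mean, std::sqrt(variance)};
}
```

A sharply peaked profile (well-aligned text columns) yields a larger mean than a flat one, which is why the mean alone may already separate good from bad angles.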

jbarth-ubhd commented 2 years ago

Could you tell me what DPI the PNG here has? Do you have more examples? See https://github.com/jbarth-ubhd/fix-perspective/issues/3#issuecomment-1106707098

bertsky commented 2 years ago

> I've noticed a single-color triangle on the left side of the fix-perspective color image, hex color #d2c5b4. Do you know where it comes from?

Yes, that's the median colour estimated by OCR-D as a filler outside the page mask (Border polygon) – for image consumers that cannot handle or ignore the alpha channel. It became overt because fix-perspective's GraphicsMagick library throws away the alpha channel of the input. (If you know a simple fix for this, it would be great if you could integrate it.)

bertsky commented 2 years ago

> The minimal structure on the right side (of the pages below) leads to a wrong estimate of the optimal skewing angle for the left side. Here are the colsums: upper half with extra cropping by me, lower half without extra cropping.

Yes, that's unfortunate. I can see how it would be difficult to remove that kind of artifact in Hough space. (But it's also difficult to decide the optimal set of boundary lines in the cropper.)

> Note to self: max(mean) instead of max(standard deviation) is probably sufficient after absdiff(col|rowsums - col|rowblur)

Is that related? (So, are you already considering a workaround for suboptimally cropped images with the shadow of the spine or the adjacent pages giving additional vertical lines?)

bertsky commented 2 years ago

> Could you tell me what DPI the PNG here has? Do you have more examples?

Sorry, I had to convert to PNG and reduce size for GH to accept the image here. You can find the original here – it has 300 DPI.

More images (with difficult cropping) here

jbarth-ubhd commented 2 years ago

> It became overt because fix-perspective's GraphicsMagick library throws away the alpha channel of the input. (If you know a simple fix for this, it would be great if you could integrate it.)

I do not use GraphicsMagick in fix-perspective, do you mean blitzDrt?

bertsky commented 2 years ago

> I do not use GraphicsMagick in fix-perspective, do you mean blitzDrt?

Sorry, I did indeed confuse them again. But the same holds for imread(..., IMREAD_REDUCED_GRAYSCALE_2) – it throws away the alpha channel and thus exposes the filler colour from core's image API.

jbarth-ubhd commented 2 years ago

(I did use ImageMagick++ in blitzDrt.)

fix-perspective uses …REDUCED_GRAYSCALE_2 for internal analysis only and does its own alpha handling for background blurring:

/* So I'm doing blur(grayscale ⊙ alpha) ⊘ blur(alpha) */
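The formula in that comment is an alpha-weighted (normalised) blur: transparent pixels contribute nothing to the numerator, and dividing by the blurred alpha renormalises the result. A minimal 1-D sketch of the idea follows. It is not the fix-perspective code: the box blur, the `fallback` value for fully transparent neighbourhoods and the names `blurRow` and `alphaWeightedBlur` are assumptions, and a real implementation would work on 2-D images.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Box blur of one row; the window is clamped at both ends.
// (Assumption: any linear blur would do, the thread does not name the kernel.)
std::vector<double> blurRow(const std::vector<double>& row, std::size_t radius) {
    std::vector<double> out(row.size(), 0.0);
    for (std::size_t i = 0; i < row.size(); ++i) {
        const std::size_t lo = i >= radius ? i - radius : 0;
        const std::size_t hi = std::min(row.size() - 1, i + radius);
        double sum = 0.0;
        for (std::size_t j = lo; j <= hi; ++j) sum += row[j];
        out[i] = sum / static_cast<double>(hi - lo + 1);
    }
    return out;
}

// blur(gray ⊙ alpha) ⊘ blur(alpha), element-wise.
// alpha is expected in [0, 1]; where the blurred alpha is zero (no opaque
// pixel in the window) the hypothetical `fallback` value is returned.
std::vector<double> alphaWeightedBlur(const std::vector<double>& gray,
                                      const std::vector<double>& alpha,
                                      std::size_t radius,
                                      double fallback = 0.0) {
    assert(gray.size() == alpha.size());
    std::vector<double> weighted(gray.size());
    for (std::size_t i = 0; i < gray.size(); ++i) weighted[i] = gray[i] * alpha[i];

    const std::vector<double> numerator = blurRow(weighted, radius);
    const std::vector<double> denominator = blurRow(alpha, radius);

    std::vector<double> out(gray.size(), fallback);
    for (std::size_t i = 0; i < gray.size(); ++i) {
        if (denominator[i] > 0.0) out[i] = numerator[i] / denominator[i];
    }
    return out;
}
```

With this weighting, a filler value stored under transparent pixels does not bleed into the blurred background near the page edge.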

I would like to add IMREAD_UNCHANGED to rgbIm, but IMREAD_UNCHANGED does not handle EXIF orientation.

bertsky commented 2 years ago

Oh, I see. Frankly, I don't know what the correct behaviour is here. I guess you can either ignore the alpha channel or use blending (perhaps even masking) to calculate the transformation – but the final transformation should run on the full image (including alpha).

jbarth-ubhd commented 2 years ago

Ignoring alpha is not a good idea, because the R, G and B channels still contain pixel values that can be very different from the page edge nearby. This would lead to bad background (→ median, blur) subtraction, so that the alpha channel values outside the page would become the most distinctive structure the scan is aligned to.

jbarth-ubhd commented 2 years ago

Grrr, OpenCV really has no option to get the alpha channel and respect the EXIF orientation at the same time. IMREAD_UNCHANGED ignores EXIF. I would have to modify OpenCV's loadsave.cpp.
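One conceivable workaround that avoids patching loadsave.cpp, sketched here and not something proposed in the thread: load the image unchanged (keeping alpha) and apply the orientation afterwards. The sketch assumes the EXIF orientation value (1 to 8) has already been obtained separately, for example with an EXIF library. The `Image` struct and `applyExifOrientation` are hypothetical names, and the plain buffer stands in for whatever image type is actually used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal image type: row-major, interleaved channels
// (e.g. 4 channels for BGRA, so the alpha channel is rotated along).
struct Image {
    std::size_t width;
    std::size_t height;
    std::size_t channels;
    std::vector<std::uint8_t> data;
};

// Returns a copy of `src` transformed so that it displays upright,
// given the EXIF orientation tag value (1..8). Unknown values are
// treated like 1 (no change).
Image applyExifOrientation(const Image& src, int orientation) {
    assert(src.data.size() == src.width * src.height * src.channels);
    if (orientation < 1 || orientation > 8) orientation = 1;

    const std::size_t w = src.width;
    const std::size_t h = src.height;
    const bool swapAxes = orientation >= 5;  // 5..8 exchange width and height

    Image dst;
    dst.width = swapAxes ? h : w;
    dst.height = swapAxes ? w : h;
    dst.channels = src.channels;
    dst.data.resize(src.data.size());

    for (std::size_t y = 0; y < dst.height; ++y) {
        for (std::size_t x = 0; x < dst.width; ++x) {
            std::size_t sx = x;  // source column for destination pixel (x, y)
            std::size_t sy = y;  // source row
            switch (orientation) {
                case 2: sx = w - 1 - x; break;                  // mirror horizontally
                case 3: sx = w - 1 - x; sy = h - 1 - y; break;  // rotate 180 degrees
                case 4: sy = h - 1 - y; break;                  // mirror vertically
                case 5: sx = y; sy = x; break;                  // transpose
                case 6: sx = y; sy = h - 1 - x; break;          // rotate 90 degrees clockwise
                case 7: sx = w - 1 - y; sy = h - 1 - x; break;  // transverse
                case 8: sx = w - 1 - y; sy = x; break;          // rotate 90 degrees counter-clockwise
                default: break;                                 // 1: already upright
            }
            for (std::size_t c = 0; c < src.channels; ++c) {
                dst.data[(y * dst.width + x) * dst.channels + c] =
                    src.data[(sy * w + sx) * src.channels + c];
            }
        }
    }
    return dst;
}
```

With OpenCV matrices the same mapping can be expressed with cv::rotate, cv::flip and cv::transpose instead of the explicit loops.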