I interpret this differently. A polynomial can be written with coefficients p_i: y = p_n x^n + ... + p_1 x + p_0
Therefore, baseline 0 0; would stand for the line y = 0*x + 0 = 0, i.e. a horizontal line. And something like baseline 0.019 -22; would stand for y = 0.019 x - 22, which is a slightly skewed line shifted by 22.
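To make this reading concrete, here is a minimal sketch of how such a property could be parsed and evaluated. The function names parse_baseline and eval_baseline are made up for illustration and are not part of any hOCR tool; the sketch also assumes the coefficients are listed from the highest degree down to the constant term, as in the two examples above.

```python
def parse_baseline(prop: str) -> list:
    """Parse a property such as 'baseline 0.019 -22;' into a list of floats.

    Hypothetical helper: assumes the keyword 'baseline' followed by
    whitespace-separated coefficients, optionally ending in ';'.
    """
    tokens = prop.strip().rstrip(";").split()
    if not tokens or tokens[0] != "baseline":
        raise ValueError(f"not a baseline property: {prop!r}")
    return [float(t) for t in tokens[1:]]


def eval_baseline(coeffs: list, x: float) -> float:
    """Evaluate the polynomial at x using Horner's rule.

    Assumes coeffs run from the highest-degree coefficient to the constant.
    """
    y = 0.0
    for c in coeffs:
        y = y * x + c
    return y


flat = parse_baseline("baseline 0 0;")
skewed = parse_baseline("baseline 0.019 -22;")
y_at_1000 = eval_baseline(skewed, 1000)  # 0.019 * 1000 - 22
```

With more than two coefficients the same function evaluates a quadratic or higher-degree curve, which is one way to read a baseline with more than two values.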
Good explanation, thanks. But when would there be more than two values for baseline?
We could try to run Tesseract on skewed_image
Don't know about more than two values...
I can confirm my theory. With the perfectly aligned test picture, the first value (i.e. the slope of the line) is zero or close to it. But when rotating this picture by 2° (convert -rotate 2), the first value is around 0.035, and we have arctan(0.035) ~= 2°. See here for the hOCR file: test_picture_rotated.hocr.txt
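The arithmetic behind that check can be reproduced in a few lines. This is only a sketch of the slope-to-angle relation; the function name is made up, and the sign of the angle depends on the coordinate convention (image y-axis pointing down or up), which is not settled here.

```python
import math


def skew_angle_degrees(slope: float) -> float:
    """Angle (in degrees) between a line with the given slope and the x-axis."""
    return math.degrees(math.atan(slope))


# Slope reported for the picture rotated by 2 degrees.
angle = skew_angle_degrees(0.035)

# The reverse direction: the slope expected for a 2 degree rotation.
expected_slope = math.tan(math.radians(2.0))
```

Here angle comes out very close to 2 and expected_slope very close to 0.035, which matches the observation above.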
Once the text lines have been found, the baselines are fitted more precisely using a quadratic spline. This was another first for an OCR system, and enabled Tesseract to handle pages with curved baselines [5], which are a common artifact in scanning, and not just at book bindings.
http://static.googleusercontent.com/media/research.google.com/de//pubs/archive/33418.pdf
But I don't know if Tesseract is actually (still) working like this...
Does that mean, baseline is like the Bezier curves in image editing software? (I Am Not A Mathematician)
Baselines are slightly curved most of the time unless the book spine is removed before scanning, so it's sensible to represent them curved. I wonder if layout engines are able to do that. I could not find any mechanism in ALTO to represent curved baselines.
Well, IMO the baseline would look like this:
There is even a Wikipedia article for baseline in typography.
In an ideal world this is really a (horizontal) line. But for handwriting or skewed scans of text, baselines look different. We can try to estimate them by some polynomial (or B-spline, Bezier curve or whatever) or just give the best straight-line approximation.
In the ALTO case I read that they want to indicate a "list of points" (how should they be connected together in the end?) or maybe they mean a list of values (?). Here in hOCR one has to specify the coefficients of the polynomial, which as a function determines the corresponding y-coordinate for each x-coordinate.
* NOTE: The hOCR spec is unclear on how to specify baseline coefficients for
* rotated textlines. For this reason, on textlines that are not upright, this
* method currently only inserts a 'textangle' property to indicate the rotation
* direction and does not add any baseline information to the hocr string.
The hOCR generation code in tesseract is easy to follow and well-documented btw.
Thanks for the link, that's a really good description. We should add a FAQ section with such information or just extend the baseline section. Graphical information like the image by @StefRe helps a lot.
@StefRe provided his sample for the spec, the common case with a straight baseline is resolved for me.
Perhaps we can expand @zuphilip's sample with a curved baseline in a new issue, with hOCR data that fits the sample image above.
The Tesseract API page iterator has a method called 'Baseline()' that returns the baseline of a line or a word as two points (x1, y1, x2, y2).
Tesseract API again. If you want a 'list of points' (more than 2 points) for a text line, you can build it from the baseline points of the line's words.
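As a rough sketch of both ideas: the two end points of a baseline can be turned into the slope/intercept pair used by hOCR, and the word baselines of a line can be merged into a list of points. The code below does not call Tesseract; it works on plain (x1, y1, x2, y2) tuples, and the sample numbers and function names are invented for illustration.

```python
def line_from_two_points(x1, y1, x2, y2):
    """Return (slope, intercept) of the straight line through two end points."""
    if x2 == x1:
        raise ValueError("vertical baseline has no finite slope")
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


def points_from_word_baselines(word_baselines):
    """Collect the end points of every word baseline, left to right, without duplicates."""
    points = []
    for x1, y1, x2, y2 in word_baselines:
        for p in ((x1, y1), (x2, y2)):
            if p not in points:
                points.append(p)
    return sorted(points)


# Hypothetical word baselines of one slightly curved text line.
words = [(10, 100, 60, 101), (70, 101, 130, 103), (140, 103, 200, 106)]
line_points = points_from_word_baselines(words)

# Hypothetical line baseline given as two points.
slope, intercept = line_from_two_points(0, -22, 1000, -3)
```

The resulting point list could then be used as is (the ALTO-style "list of points") or fitted with a polynomial to obtain hOCR-style coefficients.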
https://github.com/kba/hocr-spec/blob/master/hocr-spec.md#baseline:
If I understand correctly, this will be a tuple x y for all rectangular areas (with bbox)?