knaw-huc / loghi

MIT License

Placement of baselines #30

Closed: icarl-ad closed this issue 3 weeks ago

icarl-ad commented 1 month ago

Hi,

I was wondering how the baselines need to be placed for training. Is it OK if the baseline sits quite high and almost slices through the main body of the text? Here is an example: [image]

Thanks in advance!

stefanklut commented 1 month ago

Hi there,

Thank you for your interest in Loghi.

Now for your question. The most important thing is to be consistent when placing the baselines. A baseline is meant to be the imaginary line on which the text is written, with descenders (for example the tail of a "g") extending below it. That said, as long as the placement is consistent, it can deviate slightly from this ideal.

Baseline placement matters in two places: training the layout model (Laypa) and cutting out the text lines used to train the HTR model.

For the layout model, the placement might clash slightly with the pretrained weights of the general model, which is based on "ideal" baselines. The difference should not be large, though, because the baseline is painted in with some margin for error. And even then, the model should be able to learn to paint baselines slightly higher.
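To illustrate why a painted margin makes a consistent offset cheap, here is a minimal pure-Python sketch. It assumes the training target is a binary mask in which every pixel within `thickness / 2` of the baseline polyline is set, and it compares an "ideal" baseline with the same baseline raised by a few pixels using intersection over union (IoU). The names `paint_baseline` and `overlap_after_shift` and the thickness values are hypothetical; this is not Laypa's actual drawing code or its defaults.

```python
import math


def point_segment_distance(px, py, x0, y0, x1, y1):
    """Euclidean distance from point (px, py) to the segment (x0, y0)-(x1, y1)."""
    dx, dy = x1 - x0, y1 - y0
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - x0, py - y0)
    # Project the point onto the segment and clamp to the segment's endpoints.
    t = max(0.0, min(1.0, ((px - x0) * dx + (py - y0) * dy) / length_sq))
    return math.hypot(px - (x0 + t * dx), py - (y0 + t * dy))


def paint_baseline(points, height, width, thickness):
    """Hypothetical: paint a polyline baseline into a binary mask of given thickness."""
    half = thickness / 2
    segments = list(zip(points, points[1:]))
    mask = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if any(point_segment_distance(x, y, *p0, *p1) <= half
                   for p0, p1 in segments):
                mask[y][x] = 1
    return mask


def iou(a, b):
    """Intersection over union of two equally sized binary masks."""
    pairs = [(va, vb) for ra, rb in zip(a, b) for va, vb in zip(ra, rb)]
    inter = sum(1 for va, vb in pairs if va and vb)
    union = sum(1 for va, vb in pairs if va or vb)
    return inter / union if union else 1.0


def overlap_after_shift(shift, thickness, height=40, width=100, y=30):
    """IoU between a straight baseline at row y and the same line raised by `shift` px."""
    ideal = paint_baseline([(0, y), (width - 1, y)], height, width, thickness)
    raised = paint_baseline([(0, y - shift), (width - 1, y - shift)],
                            height, width, thickness)
    return iou(ideal, raised)


if __name__ == "__main__":
    for thickness in (4, 10):
        print(thickness, round(overlap_after_shift(2, thickness), 3))
```

With these toy numbers, raising the baseline by 2 pixels keeps an IoU of about 0.69 for a 10-pixel-wide painted line, versus about 0.43 for a 4-pixel-wide one: the wider the margin, the less a small, consistent offset changes the target.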

For HTR, the baselines are used to cut out the text lines used for training. This is done with seam carving, which cuts around the letters. It takes the baseline as a starting point, but it also takes the pixels of the letters into account, so it cuts around letters, including descenders. A baseline that is slightly off might therefore have some effect, but @rvankoert seems to think the change is small enough not to matter much.
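The pure-Python sketch below shows the dynamic-programming core of seam carving: finding the cheapest left-to-right path through an energy map, where a pixel's energy is its ink intensity, so the path prefers to pass between letters rather than through them. How Loghi positions the search band relative to the baseline and which energy function it uses are not described in this thread; the `top`/`bottom` band parameters and the ink-as-energy choice are assumptions for illustration only.

```python
def find_horizontal_seam(energy, top, bottom):
    """Cheapest left-to-right path through `energy`, one row index per column.

    The path stays within rows [top, bottom] and moves at most one row up or
    down between neighbouring columns, as in classic seam carving.
    """
    rows = range(top, bottom + 1)
    # cost[y]: cheapest accumulated energy of a path ending at row y in the current column
    cost = {y: energy[y][0] for y in rows}
    back = []  # back[x - 1][y]: row used in column x - 1 by the best path through (x, y)
    for x in range(1, len(energy[0])):
        new_cost, choice = {}, {}
        for y in rows:
            best_cost, best_prev = min(
                (cost[p], p) for p in (y - 1, y, y + 1) if top <= p <= bottom
            )
            new_cost[y] = best_cost + energy[y][x]
            choice[y] = best_prev
        cost = new_cost
        back.append(choice)
    # Start from the cheapest end point in the last column and walk back.
    y = min(rows, key=lambda r: cost[r])
    seam = [y]
    for choice in reversed(back):
        y = choice[y]
        seam.append(y)
    return seam[::-1]


if __name__ == "__main__":
    # 9 marks a dark (ink) pixel, 0 is background.
    energy = [
        [9, 9, 9, 9, 9],
        [0, 0, 9, 0, 0],
        [9, 0, 0, 0, 9],
        [9, 9, 9, 9, 9],
        [9, 9, 9, 9, 9],
    ]
    print(find_horizontal_seam(energy, top=0, bottom=4))  # -> [1, 1, 2, 1, 1]
```

In the toy grid, the seam dips to row 2 in the middle column to step around the ink pixel and never crosses any ink. Because the path follows the pixels themselves, a baseline that is a few pixels too high mostly shifts the band being searched rather than forcing the cut through the letters.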

Let me know if you have any more questions.