stefanklut / laypa

Layout analysis to find layout elements in documents (similar to P2PaLA)
MIT License

Parameters for inference #29

Open · indicator0 opened this issue 6 months ago

indicator0 commented 6 months ago

Hi! I am using Laypa for baseline detection as part of Loghi. I've noticed that sometimes, even though the input image contains a single continuous line, the Laypa model splits it into two separate baselines during inference because the gap between two words is slightly larger than usual. This in turn causes errors in the subsequent transcription step.

Is it possible to change the config to improve the inference results while still using the original model weights? Thank you!

stefanklut commented 6 months ago

Thank you for your interest,

Unfortunately, I don't think there is much you can do to improve the results without fine-tuning. The one thing you could look at is the internal size the image is resized to (INPUT.MIN_SIZE_TEST and INPUT.MAX_SIZE_TEST). However, this might hurt performance in other ways, since the model was not trained at that size.

Do you have more information on the type of image where this problem occurs? We have seen this behavior on very small images, for example. It is also worth noting that we are trying to prevent the opposite of your problem as well: text lines that are close together but should remain separate (e.g. in newspapers).
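To make the effect of INPUT.MIN_SIZE_TEST and INPUT.MAX_SIZE_TEST more concrete, here is a minimal sketch of how those two values typically determine the size the model actually sees. It assumes Laypa follows detectron2's "resize shortest edge" convention for these keys (scale the short side to MIN_SIZE_TEST, then cap the long side at MAX_SIZE_TEST); that is an assumption, not something confirmed in this thread. The function names and the example image sizes and config values are hypothetical and for illustration only.

```python
from typing import Tuple


def resize_output_shape(h: int, w: int, min_size: int, max_size: int) -> Tuple[int, int]:
    """Hypothetical helper: (new_h, new_w) after a shortest-edge resize.

    Assumes detectron2-style semantics: the short side is scaled to
    min_size (INPUT.MIN_SIZE_TEST), then both sides are scaled down
    further if the long side would exceed max_size (INPUT.MAX_SIZE_TEST).
    """
    scale = min_size / min(h, w)
    if h < w:
        new_h, new_w = float(min_size), scale * w
    else:
        new_h, new_w = scale * h, float(min_size)
    longest = max(new_h, new_w)
    if longest > max_size:
        # Cap the long side while preserving the aspect ratio.
        rescale = max_size / longest
        new_h *= rescale
        new_w *= rescale
    return int(new_h + 0.5), int(new_w + 0.5)


def gap_after_resize(gap_px: float, h: int, w: int, min_size: int, max_size: int) -> float:
    """Hypothetical helper: width of a word gap (in original pixels) after resizing."""
    new_h, _ = resize_output_shape(h, w, min_size, max_size)
    return gap_px * new_h / h


if __name__ == "__main__":
    # Illustrative 3000x2000 page scan and made-up config values.
    print(resize_output_shape(3000, 2000, 1024, 2048))  # (1536, 1024)
    print(resize_output_shape(3000, 2000, 800, 2048))   # (1200, 800)
    # A very small 600x400 image gets upscaled to the same internal size.
    print(resize_output_shape(600, 400, 1024, 2048))    # (1536, 1024)
    # How wide a 40 px word gap looks to the model at each setting.
    print(gap_after_resize(40, 3000, 2000, 1024, 2048))
    print(gap_after_resize(40, 3000, 2000, 800, 2048))
```

Under these assumptions, lowering MIN_SIZE_TEST shrinks the word gaps the model sees, which may reduce unwanted line breaks. It also shrinks the spacing between neighbouring lines, which is why the maintainer warns that it can cause merges elsewhere.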

indicator0 commented 6 months ago

Thanks for your quick reply!

Here are two examples. In the first pair of images, the line "med de frisinnade" has an unwanted baseline break. In the second pair, the text after "§6" has several unwanted baseline breaks caused by the large spaces between words. Each of these should logically be one line, but the model splits them.

[Four screenshots, taken 2024-03-04, showing the unwanted baseline breaks described above]