stefanklut / laypa

Layout analysis to find layout elements in documents (similar to P2PaLA)
MIT License

Parameters to improve region training #41

Open fattynoparents opened 2 months ago

fattynoparents commented 2 months ago

Are there any parameters in the config.yaml file that I can change to improve the training of regions? I have currently reached a total_loss of 0.1351 after 20000 iterations on 307 training images and 96 validation images, and simply increasing the number of max iterations or lowering the learning rate doesn't influence the result much. Also, the total_loss score seems really good, yet when I try to recognize the regions inside the Loghi pipeline, the results are far from ideal.

stefanklut commented 2 months ago

Can you try to visualize the results using visualizer.py? That way we can better see what the model is doing, as well as what the GT looks like, and maybe I can give suggestions based on that. Also, when you say far from ideal, what sort of issues are you running into?

fattynoparents commented 2 months ago

Thank you, I will try the visualizer. Here's a simple illustration; below are a couple of examples of perfect region detection: [image] [image] If there is some information in the right margin of the page, the post in the middle should be violet (bibliographic post); if, however, there's nothing in the margins, or the information is in the left margin, the post in the middle should be orange.

Then, when we have many posts on one page, the model gets confused and we get plenty of various regions in the middle. Moreover, quite often the model fails to separate different posts and creates one single region for several posts: [image]

EDIT: I have tried to run the visualizer and got the following error:

INPUT.SCALING_TEST is not set, inferring from INPUT.SCALING_TRAIN and PREPROCESS.RESIZE.SCALING to be 0.5
Traceback (most recent call last):
  File "/src/laypa/tooling/visualization.py", line 298, in <module>
    main(args)
  File "/src/laypa/tooling/visualization.py", line 211, in main
    fig_manager.window.showMaximized()
AttributeError: 'FigureManagerBase' object has no attribute 'window'

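For context: this AttributeError is a matplotlib symptom. `FigureManagerBase` is the figure manager of a non-interactive backend (such as Agg, which is what you typically get inside a container without a display), and it has no `window` to maximize. A minimal standalone sketch of the failure mode and a save-to-file fallback (my own illustration, not Laypa's actual code; the output filename is made up):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, as inside a Docker container without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])

manager = fig.canvas.manager
# With a GUI backend (e.g. TkAgg), the manager has a `window` that can be maximized;
# with Agg it does not, which is exactly the AttributeError in the traceback above.
if hasattr(manager, "window"):
    manager.window.showMaximized()
else:
    fig.savefig("regions_debug.png")  # fall back to writing a file instead of showing a window
```
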
stefanklut commented 1 month ago

Thank you very much for the images. I'm not sure why the visualization tool doesn't work; it might be because the Docker container doesn't have a graphical interface. In that case, run with the --save flag (I think; check the -h output to be certain). That will save similar images to what you have sent.

For the problem of regions being both purple and red: is there anything that distinguishes these classes except the textual data? If the only way to tell them apart is through the text, Laypa alone will not be enough, since it doesn't actually read the text and works mainly based on layout. For example, could you tell these regions apart while squinting? If not, then you'll probably need to do something with the text itself; you could try combining these region classes and splitting them again in post-processing, for example. If they are visually distinct, we'll have to look deeper at the problem.

For the problem of the regions not being kept apart, there might be something we can do. First, have a look at the GT to see if a lot of whitespace is labeled as being part of a class. This has proven to be a major reason why whitespace is incorrectly assigned. But also know that this is simply a problem that can occur when pixels are labeled incorrectly. If you know the documents will all look like this, you should also have a look at instance segmentation. That is unfortunately not completely finished in Laypa, but for separated blocks of text it might work very well.
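One way to check this quantitatively, rather than by eye: compute how much of each GT region mask covers near-white pixels. A rough numpy sketch (the function and the threshold are my own illustration, not part of Laypa):

```python
import numpy as np

def whitespace_fraction(gray: np.ndarray, mask: np.ndarray, white_thresh: int = 200) -> float:
    """Fraction of the pixels inside `mask` that look like blank paper."""
    region = gray[mask]
    if region.size == 0:
        return 0.0
    return float((region > white_thresh).mean())

# Toy page: top half ink (0), bottom half blank paper (255),
# with a GT mask that covers the whole page.
gray = np.array([[0] * 4] * 2 + [[255] * 4] * 2, dtype=np.uint8)
mask = np.ones((4, 4), dtype=bool)
print(whitespace_fraction(gray, mask))  # 0.5 -> half of the labeled area is blank
```

A high fraction for a class would suggest its GT polygons include a lot of empty paper.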

fattynoparents commented 1 month ago

> For the problem of regions being both purple and red: is there anything that distinguishes these classes except the textual data?

Yes, the main thing here is not the textual data itself, but whether some textual data exists in the margins. The text in the middle should have a violet class if there is also some data in the right margin, like here: [image] or here: [image] If there's nothing in the right margin, it should be orange, like here: [image]

The model is actually sometimes quite good at predicting these cases and assigns the classes correctly, so I guess it can somehow detect whether there is something in the margins or not.

> First, have a look at the GT to see if a lot of whitespace is labeled as being part of a class. This has proven to be a major reason why whitespace is incorrectly assigned.

Thanks for the suggestion. How can I tell if a lot of whitespace is labeled as part of a class? Would this, for example, be such a case? [image]

stefanklut commented 1 month ago

Ok, if that is the context, then it seems plausible to me that you can indeed make these predictions without the text information. What you are doing seems correct, so I can't pinpoint something that will definitely improve the model. I have seen this mixing of regions before, but in that case the regions really differed only in the type of text they contained.

> Then when we have many posts on one page, the model gets confused and we get plenty of various regions in the middle. Moreover, quite often the model fails to recognize different posts and creates one single region for several posts:

Just so I'm clear what would the GT for this type of data look like?

> Would this f.ex. be such a case?

That is quite a lot of whitespace. But considering it is mostly horizontal, I'm not sure how much impact it will have. Also, the boundaries of the box are fairly well defined due to the black border. You could experiment with assigning less whitespace, but considering how much work this might be, maybe we should first try something else.
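If you do want to try assigning less whitespace without redrawing the GT by hand, one cheap option would be to erode the region masks by a few pixels before training. A numpy-only sketch (an assumption on my part; Laypa may expect the GT in a different form entirely, and a real pipeline would more likely use `scipy.ndimage.binary_erosion`):

```python
import numpy as np

def erode(mask: np.ndarray, iterations: int = 1) -> np.ndarray:
    """4-neighbourhood binary erosion; pixels outside the array are treated as inside."""
    m = mask.copy()
    for _ in range(iterations):
        shrunk = m.copy()
        shrunk[1:, :] &= m[:-1, :]   # require the pixel above
        shrunk[:-1, :] &= m[1:, :]   # require the pixel below
        shrunk[:, 1:] &= m[:, :-1]   # require the pixel to the left
        shrunk[:, :-1] &= m[:, 1:]   # require the pixel to the right
        m = shrunk
    return m

# A 3x3 region in the middle of a 5x5 page shrinks to its single centre pixel.
mask = np.zeros((5, 5), dtype=bool)
mask[1:4, 1:4] = True
print(int(erode(mask).sum()))  # 1
```
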

Another idea I had was to change the scale at which the prediction is done. I think you are currently using 1024 for the smallest side? At least, that was the default value. You can try to experiment with this value, or change the resize mode to scaling, as this will affect the context that the model can take into account.

fattynoparents commented 1 month ago

> Just so I'm clear what would the GT for this type of data look like?

Here's what the GT for that picture looks like: [image]

Here's what I get with Laypa: [image]

Just as a comparison, here's what I get with kraken (I was curious what it could give me, so I tried that project as well): [image]

> I think you are currently using 1024 for the smallest side? At least that was the default value. You can try to experiment with this value. Or change the resize mode to be scaling.

Thanks for the tips! To try this, do I need to change some parameters in the config file?

stefanklut commented 1 month ago

Nice to see that kraken performs so well :+1: Maybe it's more suited to your particular problem.

But I don't think the methods are that dissimilar, so I'm very interested in why it performs better. That's for me to figure out, though :smile:

> To try this, do I need to change some parameters in the config file?

Yes, this is done using the PREPROCESS.RESIZE and INPUT.RESIZE parameters and their subparameters.
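For reference, a hypothetical fragment of what that could look like in config.yaml. The top-level keys and the 0.5 value are the ones named earlier in this thread; the exact subkey names are guesses on my part, so check Laypa's example configs for the real schema:

```yaml
PREPROCESS:
  RESIZE:
    RESIZE_MODE: scaling   # guessed key name for switching from shortest-edge to scaling
    SCALING: 0.5
INPUT:
  SCALING_TRAIN: 0.5
  SCALING_TEST: 0.5        # if unset, Laypa infers it, as the earlier log line shows
```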

fattynoparents commented 1 month ago

Hi again, I have now experimented with the RESIZE parameters, which unfortunately didn't influence the result. I'll be grateful for any other ideas :)

Some further observations: kraken's segmentation performs better in terms of detecting regions, but is considerably worse than Laypa at drawing correct baselines.

stefanklut commented 1 month ago

Thank you for your observations.

I'm not sure what to do next to improve results on your side. Maybe it's possible to combine the results from Kraken and Laypa.

I'm going to see whether implementing the loss method used in Kraken (multiple BCE losses) might improve the region results of Laypa. However, I'm not sure when I'll have time for this or when it will be finished (vacation is also coming up :smile:).
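For context on what "multiple BCE loss" means here (my reading, not necessarily how Kraken or Laypa implement it): instead of one softmax cross-entropy that forces every pixel into exactly one class, each class channel gets its own sigmoid and binary cross-entropy, so region labels are predicted independently and may overlap. A numpy sketch:

```python
import numpy as np

def multilabel_bce(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean binary cross-entropy with an independent sigmoid per class channel."""
    p = 1.0 / (1.0 + np.exp(-logits))  # per-class probabilities, not a softmax
    eps = 1e-9
    return float(-np.mean(targets * np.log(p + eps) + (1 - targets) * np.log(1 - p + eps)))

# With all-zero logits every class probability is 0.5, so the loss is ln(2) ~ 0.6931.
logits = np.zeros((3, 4, 4))   # (classes, height, width)
targets = np.zeros((3, 4, 4))  # 0/1 masks; channels may overlap, unlike softmax targets
print(round(multilabel_bce(logits, targets), 4))  # 0.6931
```
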

fattynoparents commented 1 month ago

Vacation is important :) Thanks for all your help and have a good rest!