fattynoparents opened this issue 5 months ago
Can you try to visualize the results using visualizer.py? That way we can better see what the results are doing, as well as what the GT looks like, and maybe I can give suggestions based on that. Also, when you say "far from ideal", what sort of issues are you running into?
Thank you, I will try to use the visualizer. Here's a simple illustration; below are a couple of examples of perfect region detection: if there is some information in the right margin of the page, the post in the middle should be violet (a bibliographic post); if, however, there's nothing in the margins, or the information is in the left margin, the post in the middle should be orange.
Then, when we have many posts on one page, the model gets confused and we get plenty of various regions in the middle. Moreover, quite often the model fails to recognize different posts and creates one single region for several posts:
EDIT: I have tried to run the visualizer, got the following error:
```
INPUT.SCALING_TEST is not set, inferring from INPUT.SCALING_TRAIN and PREPROCESS.RESIZE.SCALING to be 0.5
Traceback (most recent call last):
  File "/src/laypa/tooling/visualization.py", line 298, in <module>
    main(args)
  File "/src/laypa/tooling/visualization.py", line 211, in main
    fig_manager.window.showMaximized()
    ^^^^^^^^^^^^^^^^^
AttributeError: 'FigureManagerBase' object has no attribute 'window'
```
Thank you very much for the images. I'm not sure why the visualization tool doesn't work; it might be because the Docker container doesn't have a graphical interface. In that case, run with the --save flag (I think; check the -h output to be certain). It will produce images similar to the ones you have sent.
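For context, that `AttributeError` is the typical symptom of matplotlib falling back to a non-GUI backend inside Docker: `FigureManagerBase` has no `window` to maximize. A minimal sketch of the headless-safe pattern (the plot data and output filename here are placeholders, not Laypa's actual code):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend: no window, so no fig_manager.window
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])
fig.savefig("out.png")  # write to disk instead of calling showMaximized()
```

This is effectively what a `--save`-style code path does: skip the window manager entirely and write the figure to a file.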
For the problem of regions being both purple and red: is there anything that distinguishes these classes except the textual data? If the only way to tell them apart is through the text, Laypa alone will not be enough, since it doesn't actually read the text and classifies mainly based on layout. For example, could you tell these regions apart while squinting? If not, you'll probably need to do something with the text itself. You could, for example, combine these region classes and then separate them in post-processing. If they are visually distinct, we'll have to look deeper at the problem.
For the problem of the regions not being kept apart, there might be something we can do. First, have a look at the GT to see if a lot of whitespace is labeled as being part of a class. This has proven to be a major reason why whitespace is incorrectly assigned. But also know that this is simply a problem that can occur when pixels are labeled incorrectly. If you know the pages will all look like this, you should also have a look at instance segmentation. That unfortunately is not completely finished in Laypa, but for well-separated blocks of text it might work very well.
> For the problem of regions being both purple and red. Is there anything that distinguishes these classes except the textual data.
Yes, the main signal here is not the textual data itself, but whether some textual data exists in the margins. The text in the middle should have the violet class if there is also some data in the right margin, like here: or here: When there's nothing in the right margin, it should be orange, like here:
The model is actually sometimes quite good at predicting these cases and assigns the classes correctly, so I guess it can somehow detect whether there is something in the margins or not.
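If the two post classes get merged during training (as suggested above), the margin rule described here could be applied afterwards in post-processing. A minimal sketch, where the region representation (dicts with pixel boxes), the class names `post`, `marginalia`, `bibliographic-post`, and the `margin_start` threshold are all hypothetical:

```python
def reassign_post_class(regions, page_width, margin_start=0.8):
    """Reassign merged 'post' regions based on the presence of right-margin data.

    regions: list of dicts like {"class": str, "x0": int, "x1": int, "y0": int, "y1": int}.
    A 'post' becomes 'bibliographic-post' if any 'marginalia' region overlaps it
    vertically and sits in the right margin (beyond margin_start * page_width).
    """
    out = []
    for r in regions:
        if r["class"] != "post":
            out.append(dict(r))
            continue
        has_right_margin = any(
            m["class"] == "marginalia"
            and m["x0"] >= margin_start * page_width
            and m["y0"] < r["y1"] and r["y0"] < m["y1"]  # vertical overlap
            for m in regions
        )
        r = dict(r)
        r["class"] = "bibliographic-post" if has_right_margin else "post"
        out.append(r)
    return out
```

The idea is that the network only has to learn one visually coherent "post" class, while the violet/orange distinction, which depends on context elsewhere on the page, is decided by an explicit rule.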
> First have a look at the GT if a lot of whitespace is labeled as being part of a class. This has proven to be a major reason for why whitespace is incorrectly assigned.
Thanks for the suggestion. How can I tell whether a lot of whitespace is labeled as part of a class? Would this, for example, be such a case?
Ok, if that is the context, then it seems plausible to me that you can indeed make these predictions without the text information. What you are doing seems correct, though, so I can't pinpoint something that will definitely improve the model. I have seen this mixing of regions before, but in that case the regions only really differed in the type of text they contained.
> Then when we have many posts on one page, the model gets confused and we get plenty of various regions in the middle. Moreover, quite often the model fails to recognize different posts and creates one single region for several posts:
Just so I'm clear, what would the GT for this type of data look like?
> Would this f.ex. be such a case?
That is quite a lot of whitespace. But considering it is mostly horizontal, I'm not sure how much impact it will have. Also, the boundaries of the box are fairly well defined due to the black border. You could experiment with assigning less whitespace, but considering how much work that might be, we can maybe first try something else.
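One way to put a number on "a lot of whitespace" is to binarize the page and measure what fraction of the labeled GT pixels carry no ink. A minimal NumPy sketch, where the mask encoding (0 = background, >0 = region class), the `ink_threshold`, and the function name are all assumptions:

```python
import numpy as np

def whitespace_fraction(gt_mask, page_gray, ink_threshold=128):
    """Estimate how much of the labeled GT area is blank paper.

    gt_mask:   2D int array, 0 = background, >0 = region class.
    page_gray: 2D grayscale page image of the same shape; dark pixels are ink.
    Returns the fraction of labeled pixels that carry no ink.
    """
    labeled = gt_mask > 0
    if not labeled.any():
        return 0.0
    ink = page_gray < ink_threshold  # crude binarization; assumes dark text on light paper
    blank_labeled = labeled & ~ink
    return float(blank_labeled.sum() / labeled.sum())

# Toy example: the labeled region covers 8 pixels, but only 2 contain ink.
gt = np.zeros((4, 4), dtype=int)
gt[1:3, :] = 1
page = np.full((4, 4), 255, dtype=np.uint8)
page[1, 0] = 0
page[2, 3] = 0
print(whitespace_fraction(gt, page))  # 0.75
```

A value close to 1.0 for a class would mean the GT mostly labels empty paper, which is the situation described above as a major cause of misassigned whitespace.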
Another idea I had was to change the scale at which the prediction is done. I think you are currently using 1024 for the smallest side? At least, that was the default value. You can try experimenting with this value, or change the resize mode to `scaling`, as this will affect the context that the model can take into account.
> Just so I'm clear what would the GT for this type of data look like?
Here's what the GT for that picture looks like:
Here's what I get with Laypa:
Just as a comparison, here's what I get with Kraken (I was curious what it could get me, so I tried that project as well):
> I think you are currently using 1024 for the smallest side? At least that was the default value. You can try to experiment with this value. Or change the resize mode to be `scaling`.
Thanks for the tips! To try this, do I need to change some parameters in the config file?
Nice to see that Kraken performs so well :+1: Maybe it's more suited to your particular problem.
I don't think the methods are that dissimilar, though, so why it performs better is something I'm very interested in. But that's for me to figure out :smile:
> To try this, do I need to change some parameters in the config file?
Yes, this is done using the `PREPROCESS.RESIZE` and `INPUT.RESIZE` parameters and their subparameters.
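For illustration, a hypothetical config fragment. The only key names confirmed in this thread are `PREPROCESS.RESIZE.SCALING`, `INPUT.SCALING_TRAIN`, and `INPUT.SCALING_TEST` (they appear in the log message above); the mode key and its values are assumptions, so check your own config.yaml for the exact spelling:

```yaml
PREPROCESS:
  RESIZE:
    RESIZE_MODE: scaling   # assumed key name; switches from a fixed shortest side to relative scaling
    SCALING: 0.5           # PREPROCESS.RESIZE.SCALING, seen in the log above
INPUT:
  SCALING_TRAIN: 0.5       # seen in the log above
  SCALING_TEST: 0.5        # set explicitly, or it is inferred as the warning says
```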
Hi again, I have now experimented with the RESIZE parameters, which unfortunately didn't influence the result. I'd be grateful for any other ideas :)
Some further observations: Kraken's segmentation performs better in terms of detecting regions, but is considerably worse than Laypa at drawing correct baselines.
Thank you for your observations.
I'm not sure what to do next to improve results on your side. Maybe it's possible to combine the results from Kraken and Laypa.
I am going to see whether implementing the loss method used in Kraken (multiple BCE losses) might improve the region results of Laypa. However, I'm not sure when I'll have time for this or when it will be finished (vacation is also coming up :smile:).
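Loosely, the idea behind a multiple-BCE loss is to treat each region class as an independent binary mask (per-class sigmoid) rather than forcing every pixel into exactly one class with a softmax. A minimal NumPy sketch of that formulation; Kraken's and Laypa's actual implementations may differ:

```python
import numpy as np

def multilabel_bce_loss(logits, targets):
    """Per-class sigmoid + binary cross-entropy, averaged over classes and pixels.

    logits:  (C, H, W) raw scores, one channel per region class.
    targets: (C, H, W) binary masks; unlike with a softmax over classes,
             a pixel may belong to several classes, or to none.
    """
    probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid per class
    eps = 1e-7
    probs = np.clip(probs, eps, 1 - eps)   # avoid log(0)
    bce = -(targets * np.log(probs) + (1 - targets) * np.log(1 - probs))
    return float(bce.mean())

# With zero logits, every class probability is 0.5, so the loss is ln(2):
C, H, W = 3, 4, 4
loss = multilabel_bce_loss(np.zeros((C, H, W)), np.zeros((C, H, W)))
print(round(loss, 4))  # 0.6931
```

One design consequence: because the classes no longer compete for probability mass, overlapping or ambiguous regions are penalized per class rather than forcing a single winner per pixel, which may be why it behaves differently on crowded pages.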
Vacation is important :) Thanks for all your help and have a good rest!
Are there any parameters in the config.yaml file that I can change to improve the training of regions? Right now I have reached a total_loss of 0.1351 after 20,000 iterations on 307 training images and 96 validation images, and if I just increase the number of max iterations or lower the learning rate, it doesn't influence the result much. Also, the total_loss seems really good, yet when I try to recognize the regions inside the Loghi pipeline, the result is far from ideal.