stefanklut / laypa

Layout analysis to find layout elements in documents (similar to P2PaLA)
MIT License

Warning "No region type defined for eSc_dummyblock_c" when training a region model #37

Closed: fattynoparents closed this issue 4 months ago

fattynoparents commented 4 months ago

When I try to train a region model, I get quite a few warnings like these:

WARNING [05/06 11:19:39 laypa.page_xml.xmlPAGE]: No region type defined for eSc_dummyblock_ at /home/user/training-laypa/region/2024.05.06/val_input/page/4212.xml
WARNING [05/06 11:19:39 laypa.page_xml.xmlPAGE]: Element type "None" undefined in class dict /home/user/training-laypa/region/2024.05.06/val_input/page/4212.xml
WARNING [05/06 11:19:39 laypa.page_xml.xmlPAGE]: No region type defined for eSc_dummyblock_ at /home/user/training-laypa/region/2024.05.06/val_input/page/4212.xml
WARNING [05/06 11:19:39 laypa.page_xml.xmlPAGE]: Element type "None" undefined in class dict /home/user/training-laypa/region/2024.05.06/val_input/page/4212.xml
WARNING [05/06 11:19:39 laypa.page_xml.xmlPAGE]: No region type defined for eSc_dummyblock_ at /home/user/training-laypa/region/2024.05.06/val_input/page/4212.xml
WARNING [05/06 11:19:39 laypa.page_xml.xmlPAGE]: Element type "None" undefined in class dict /home/user/training-laypa/region/2024.05.06/val_input/page/4212.xml

Why does this happen, and should I be concerned? If so, how can I fix it? Thanks in advance!

stefanklut commented 4 months ago

This means that in the GT there is a region that doesn't have a label. If a region has no label, it is ignored, so its pixels are simply classified as background in the GT.
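To illustrate what "ignored" means for the ground truth, here is a toy rasterizer. It is a simplified sketch, not laypa's code: it assumes axis-aligned boxes instead of polygons and a hypothetical `class_dict` mapping region types to class ids. Regions whose type is missing or not in the dict are skipped, so their pixels keep the background value 0.

```python
def rasterize_regions(height, width, regions, class_dict):
    """Toy semantic-GT rasterizer (boxes instead of polygons, for illustration only)."""
    mask = [[0] * width for _ in range(height)]  # 0 = background everywhere
    for region_type, (x0, y0, x1, y1) in regions:
        class_id = class_dict.get(region_type)
        if class_id is None:
            # Unlabeled or unknown region: skipped, so its pixels stay background
            continue
        # Clip the box to the image and paint the class id
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                mask[y][x] = class_id
    return mask


# A full-page unlabeled "dummy" region plus one labeled paragraph
mask = rasterize_regions(4, 4, [(None, (0, 0, 4, 4)), ("paragraph", (1, 1, 3, 3))], {"paragraph": 1})
for row in mask:
    print(row)
```

The full-page region without a type leaves no trace in the mask; only the labeled paragraph is painted.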

fattynoparents commented 4 months ago

This means that in the GT there is a region that doesn't have a label.

Hmm, that's weird, since all regions in my GT do have labels. Here is basically everything on the page from the example that produced the warnings above: [image]

stefanklut commented 4 months ago

Do you find anything in the PageXML when you search for eSc_dummyblock_? That is the element the code reports as missing a region type. It sounds like something from eScriptorium.
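To check a file, you can list the regions that carry no type. This is a hypothetical helper, not part of laypa. It assumes the type is stored either in a `structure {type:...}` entry of the `custom` attribute (a common PageXML convention) or in the standard `type` attribute; laypa may resolve region types differently, so treat the result as a hint.

```python
import re
import xml.etree.ElementTree as ET

# Assumed convention: custom="... structure {type:paragraph;} ..."
_STRUCTURE_TYPE = re.compile(r"structure\s*\{[^}]*type:([^;}]+)")


def region_type(elem):
    """Return the region type from @custom (preferred) or @type, else None."""
    match = _STRUCTURE_TYPE.search(elem.get("custom", ""))
    if match:
        return match.group(1).strip()
    return elem.get("type")


def find_untyped_regions(xml_text):
    """List ids of *Region elements (TextRegion, ImageRegion, ...) without a type."""
    root = ET.fromstring(xml_text)
    untyped = []
    for elem in root.iter():
        local_name = elem.tag.rsplit("}", 1)[-1]  # drop the PAGE namespace
        if local_name.endswith("Region") and region_type(elem) is None:
            untyped.append(elem.get("id"))
    return untyped
```

Running `find_untyped_regions(open(path, encoding="utf-8").read())` on a page such as `4212.xml` should surface the same ids the warnings mention.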

fattynoparents commented 4 months ago

You are right: it seems eScriptorium didn't remove the dummy green region covering the whole page (the default after running Loghi with only baseline detection), even though I removed it manually in their UI. The question is, why do I get three similar warnings about it when there is only one eSc_dummyblock_ element?
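If the dummy region keeps reappearing in the export, one workaround is to strip it from the PageXML before training. The sketch below is a hypothetical helper, assuming the region id starts with `eSc_dummyblock_` as in the warnings. It also drops reading-order references pointing at it. Note that any `TextLine` elements nested inside the removed region go with it, which is harmless for region-only GT but not for baseline GT.

```python
import xml.etree.ElementTree as ET


def strip_regions(xml_text: str, id_prefix: str = "eSc_dummyblock_") -> tuple[str, int]:
    """Remove *Region elements whose id starts with id_prefix, plus refs to them."""
    root = ET.fromstring(xml_text)
    # Keep the PAGE namespace as the default namespace when writing back
    if root.tag.startswith("{"):
        ET.register_namespace("", root.tag[1:].split("}", 1)[0])
    removed = 0
    for parent in list(root.iter()):  # snapshot, since we mutate the tree
        for child in list(parent):
            is_region = child.tag.endswith("Region") and child.get("id", "").startswith(id_prefix)
            is_ref = child.get("regionRef", "").startswith(id_prefix)
            if is_region or is_ref:
                parent.remove(child)  # nested TextLines are removed as well
                removed += 1
    return ET.tostring(root, encoding="unicode"), removed
```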

As far as I remember, before v2.0.0 Loghi drew dummy regions around baselines it considered to belong together. Now it produces only one big dummy region covering the whole page. Can this be changed back to the previous behavior?

stefanklut commented 4 months ago

The three warnings are probably because the preprocessing reads this region three times (this could probably be optimized 😄): once for semantic segmentation, once for instance segmentation, and once for panoptic segmentation.
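A small sketch of why one untyped region produces repeated warnings, and of one possible optimization. Everything here is hypothetical (logger name, class dict, function names); the messages only mimic the log lines above. Each lookup of the untyped region emits two warnings, so three passes give six lines; memoizing the lookup brings that down to one pair.

```python
import logging
from functools import lru_cache

logger = logging.getLogger("sketch.xmlPAGE")  # hypothetical logger name

# Hypothetical class dict mapping region types to class ids
CLASS_DICT = {"paragraph": 1, "marginalia": 2}


def lookup_class(region_id, region_type):
    """Map a region to a class id, warning like the log above when that fails."""
    if region_type is None:
        logger.warning("No region type defined for %s", region_id)
    class_id = CLASS_DICT.get(region_type)
    if class_id is None:
        logger.warning('Element type "%s" undefined in class dict', region_type)
    return class_id  # None means: leave these pixels as background


@lru_cache(maxsize=None)
def lookup_class_once(region_id, region_type):
    """Same lookup, memoized so each (id, type) pair is resolved and warned about once."""
    return lookup_class(region_id, region_type)


def preprocess(regions, lookup):
    """Resolve every region once per output mode, mirroring the three-pass reading."""
    return {
        mode: [lookup(rid, rtype) for rid, rtype in regions]
        for mode in ("semantic", "instance", "panoptic")
    }
```

Caching on `(region_id, region_type)` is the smallest change but would also hide the warning for the same id on another page; calling `lookup_class_once.cache_clear()` per file, or parsing each page once and sharing the result between the three modes, avoids that.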

The grouping of baselines into regions happens in the reading-order step, but I am not sure why this would have changed. Maybe @rvankoert or @TimKoornstra knows, as there were some changes to the inference script. That step is done somewhere in the Java or bash part.

edit:

after running Loghi with only baseline detection

Does that mean you also didn't run RECALCULATEREADINGORDER? That is the part that does the grouping, so maybe you didn't turn it off before 2.0.0.

edit 2: RECALCULATEREADINGORDER should be turned off if you want to use the predicted regions. But if you only run baseline detection and want to keep the grouped regions for making GT, it should be turned on.
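The rule in edit 2 boils down to a single decision. The sketch below encodes it, assuming (not confirmed in this thread) that the Loghi pipeline script reads RECALCULATEREADINGORDER as a 0/1 shell variable; the helper names are hypothetical.

```python
def recalculate_reading_order_flag(use_predicted_regions: bool) -> int:
    """Return 0/1 following the rule above."""
    if use_predicted_regions:
        # Keep the regions predicted by the region model: don't regroup baselines
        return 0
    # Baseline-only run: let the reading-order step group baselines into regions
    return 1


def render_setting(use_predicted_regions: bool) -> str:
    """Format the decision as a shell-style assignment (assumed format)."""
    return f"RECALCULATEREADINGORDER={recalculate_reading_order_flag(use_predicted_regions)}"


print(render_setting(use_predicted_regions=False))
```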

fattynoparents commented 4 months ago

Does that mean you also didn't run RECALCULATEREADINGORDER?

This might very well be the issue. Thanks for the tip, I will check!