qurator-spk / eynollah

Document Layout Analysis
Apache License 2.0

Issue in machine_based_reading_order_integration branch #125

Open LucPol98 opened 5 months ago

LucPol98 commented 5 months ago

Hi, I was looking at your branch that computes the region reading order with a neural network, and I think I noticed a possible issue that I want to report. Specifically, I tried the code on a page with the following layout: a large image spans the center from left to right, and two text columns are interrupted by this image. The page therefore contains four paragraphs and one image.

The new sorting does not read the paragraphs column by column. Instead, it first reads the paragraphs above the image and then the ones below it, each from left to right. The previous sorting, on the other hand, read them correctly. This seems to be an issue whenever an image is present. If you want, I can share the image in question, although the text is in Italian, so it may not be obvious from the text alone why the reading continues down the column.
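To make the two orderings concrete, here is a minimal sketch using hypothetical bounding boxes for the four paragraphs in the layout described above. The region names, coordinates, and the two sorting functions are assumptions chosen purely for illustration; they are not eynollah's heuristic and not the network's behavior, only a picture of the two resulting orders.

```python
# Hypothetical paragraph boxes as (x, y, width, height); the wide image is
# assumed to sit between y=350 and y=550 and is left out of the ordering.
paragraphs = {
    "top_left": (0, 0, 400, 300),
    "top_right": (500, 0, 400, 300),
    "bottom_left": (0, 600, 400, 300),
    "bottom_right": (500, 600, 400, 300),
}


def row_wise(boxes):
    """Order regions by top edge first, then left edge (the reported behavior)."""
    return sorted(boxes, key=lambda name: (boxes[name][1], boxes[name][0]))


def column_wise(boxes):
    """Order regions by left edge first, then top edge (the expected behavior)."""
    return sorted(boxes, key=lambda name: (boxes[name][0], boxes[name][1]))


print("reported:", row_wise(paragraphs))
print("expected:", column_wise(paragraphs))
```

With these assumed boxes, the first ordering reads both paragraphs above the image before either paragraph below it, while the second finishes the left column before moving to the right one.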

I assume that the new network does not take the presence of images into account. When it sees paragraphs that are far apart, it does not recognize that the gap is only a visual break and that the columns should still be read vertically. Instead, it treats the gap as a break in content and reads the upper cluster first, then the lower cluster.

I don't know whether I misunderstood something or whether this is intended behavior, but in case it is not, I wanted to flag it for correction.

Apart from that, the new ordering is fantastic, and when there are no pictures it is clearly superior to the previous one. Well done! 💯

vahidrezanezhad commented 5 months ago

Thank you for taking the time to test the new model. As you're aware, machine-based reading order detection, unlike heuristic methods, relies on ground truth (GT) for training. As you correctly pointed out, the machine-based approach doesn't perform well for certain document layouts because these layouts aren't represented in the training ground truth. Other issues may also arise. For instance, our ground truth dataset does not cover newspapers with multiple articles, where the articles can be read in any order even though the text regions within each article have a unique reading order. This tool is our initial attempt at a machine-based model for reading order, and we aim to improve both the ground truth data and the model structure.