ibm-aur-nlp / PubLayNet


Wrong labels #32

Open Sagar1094 opened 3 years ago

Sagar1094 commented 3 years ago

Hi,

I have been trying to extract the labels for tabular data only. First of all, there are too many None values in the image column. I filtered on category_id=4, i.e., Table. After filtering, I took the bbox, stored it in a separate column, and separated out only the filtered images. When I looked at the filtered images, many of them contained no table or table-like structure at all. I then drew the bbox on each image and got incorrect bounding boxes that cover only part of a table or, in many cases, no table at all. I am reading the bbox as (xmin, ymin, w, h), and I have tried other variations as well. Am I missing something here? Please help.
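One thing worth ruling out in the approach described above is the join between annotations and images. The following is a minimal sketch, assuming the annotation file follows the COCO layout: top-level `images` and `annotations` lists, each annotation's `image_id` referring to an image's `id`, and `bbox` stored as `[x, y, width, height]`. The helper name `table_boxes` and the toy records are invented for illustration and are not taken from the PubLayNet repository.

```python
from collections import defaultdict

TABLE_CATEGORY_ID = 4  # "Table", as stated in the comment above


def table_boxes(coco, category_id=TABLE_CATEGORY_ID):
    """Map file_name -> list of (xmin, ymin, xmax, ymax) for one category."""
    # Index images by id so each annotation is matched through image_id,
    # not through its position in the list.
    file_by_id = {img["id"]: img["file_name"] for img in coco["images"]}
    boxes = defaultdict(list)
    for ann in coco["annotations"]:
        if ann["category_id"] != category_id:
            continue
        x, y, w, h = ann["bbox"]  # assumed COCO convention: top-left corner plus size
        boxes[file_by_id[ann["image_id"]]].append((x, y, x + w, y + h))
    return dict(boxes)


# Toy data in the assumed layout (values invented for the example).
toy = {
    "images": [
        {"id": 10, "file_name": "page_a.jpg"},
        {"id": 11, "file_name": "page_b.jpg"},
    ],
    "annotations": [
        {"image_id": 11, "category_id": 4, "bbox": [10.0, 20.0, 100.0, 50.0]},
        {"image_id": 10, "category_id": 1, "bbox": [5.0, 5.0, 30.0, 10.0]},
    ],
}

print(table_boxes(toy))
```

With the real data, `coco` would be the dictionary returned by `json.load` on the annotation file. If a table built by aligning rows of `annotations` with rows of `images` shows None values in the image column, a positional or mismatched join is one possible cause; looking images up by `id` avoids it.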

ajjimeno commented 3 years ago

Hi Sagar1094, can you provide examples of the issue? Thank you in advance.


Sagar1094 commented 3 years ago

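Hi ajjimeno, please find attached the examples below; I can provide more if required.

Image name: PMC522826_00007.jpg, BBOX: [35.89, 130.41, 514.03, 602.48]

[image: table_example1] https://user-images.githubusercontent.com/54572031/102438068-a29a3880-4041-11eb-8735-afd2f90b9c97.png

Image name: PMC2248230_00005.jpg, BBOX: [36.0, 73.05, 522.01, 66.05]

[image: table_example2] https://user-images.githubusercontent.com/54572031/102438710-d45fcf00-4042-11eb-9e96-895d45dc93c4.png

Image name: PMC1410770_00002.jpg, BBOX: [123.5, 59.5, 390.0, 619.0]

[image: table_example3]

This is an example of a box that covers only part of a table. I have yet to come across an example that is perfectly annotated, which is enough to make me doubt my approach. Let me know what I am missing here.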
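As a quick check on how the three boxes above should be read, the sketch below tests each one under two interpretations: as corners `(xmin, ymin, xmax, ymax)` and as `(xmin, ymin, w, h)`. The function names are hypothetical. The check only looks at the numbers themselves; it says nothing about whether a box actually lands on a table, which still requires looking at the image.

```python
def corners_from_xywh(bbox):
    """Convert (xmin, ymin, w, h) to (xmin, ymin, xmax, ymax)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


def is_valid_corners(box):
    """A corner box is only meaningful if max exceeds min on both axes."""
    xmin, ymin, xmax, ymax = box
    return xmax > xmin and ymax > ymin


# Boxes exactly as reported in the comment above.
examples = {
    "PMC522826_00007.jpg": [35.89, 130.41, 514.03, 602.48],
    "PMC2248230_00005.jpg": [36.0, 73.05, 522.01, 66.05],
    "PMC1410770_00002.jpg": [123.5, 59.5, 390.0, 619.0],
}

for name, bbox in examples.items():
    valid_as_corners = is_valid_corners(tuple(bbox))
    corners = tuple(round(v, 2) for v in corners_from_xywh(bbox))
    print(f"{name}: valid as corners={valid_as_corners}, xywh -> {corners}")
```

The second box cannot be a corner box, because its fourth value (66.05) is smaller than its second (73.05). Reading all three as `(xmin, ymin, w, h)` is therefore the only interpretation that works for every example, which suggests the format is being read correctly and the mismatch lies elsewhere.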

Thanks a lot. :)

ajjimeno commented 3 years ago

Hi, we provide the following Jupyter notebook as an example of how to display the bounding boxes of the different objects from the images and annotations. Have you had a chance to look at it, and does it help?

https://dataplatform.cloud.ibm.com/analytics/notebooks/v2/43cb95d9-6c3e-479c-a189-8c9ff3524ec1/view?access_token=bb8ce645cf114b5f5512ae2eb9c7badcf0927f313e8f76b8138d0701289484e6&cm_sp=ibmdev-_-developer-exchanges-_-cloudreg


Sagar1094 commented 3 years ago

Hi, I went through the given notebook and used the same code on the examples above, but the results looked identical 🙄

ajjimeno commented 3 years ago

I have checked those examples and I see the same thing you do. The data set has some noise, and I believe the examples you are considering are part of that noise. Please let me know if you cannot find the examples you are after.


yongzhuo commented 2 years ago

Maybe a lot of noise. I tried filtering 5 images based on category_id=3, and the annotations were not correct at all.

For example: PMC4533237_00004.jpg, PMC5165033_00004.jpg, PMC6055613_00008.jpg, PMC3437617_00002.jpg
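When checking individual files like these, it can help to list every category annotated on the page, using the names from the annotation file's own `categories` table rather than a remembered id-to-name mapping. This is a sketch under the same assumption of a COCO-style layout; `categories_in_image`, the toy file name, and the toy category names are invented for illustration.

```python
def categories_in_image(coco, file_name):
    """Return the sorted category names annotated on one image."""
    # Take id -> name from the file itself so the meaning of each id is not guessed.
    name_by_id = {c["id"]: c["name"] for c in coco["categories"]}
    image_ids = {img["id"] for img in coco["images"] if img["file_name"] == file_name}
    return sorted(
        {
            name_by_id[ann["category_id"]]
            for ann in coco["annotations"]
            if ann["image_id"] in image_ids
        }
    )


# Toy data in the assumed layout (all values invented for the example).
toy = {
    "categories": [{"id": 3, "name": "list"}, {"id": 4, "name": "table"}],
    "images": [{"id": 1, "file_name": "page_a.jpg"}],
    "annotations": [
        {"image_id": 1, "category_id": 4, "bbox": [0.0, 0.0, 10.0, 10.0]},
        {"image_id": 1, "category_id": 3, "bbox": [0.0, 20.0, 10.0, 5.0]},
        {"image_id": 1, "category_id": 4, "bbox": [0.0, 40.0, 10.0, 10.0]},
    ],
}

print(categories_in_image(toy, "page_a.jpg"))
```

Running the same lookup on the file names listed above would show which category names the annotation file attaches to each page, which makes it easier to tell a misread category id apart from a genuinely wrong annotation.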