ibm-aur-nlp / PubLayNet

bounding box for each list item #6

Open kailigo opened 5 years ago

kailigo commented 5 years ago

Could you advise how to get a bounding box for each list item? Currently, a single bounding box covers all list items; I would like a separate bounding box for each item. Thanks.

zhxgj commented 5 years ago

Hi @kailigo, bounding boxes for individual list items cannot be obtained directly from the raw PubLayNet. We consider the whole list as a basic layout element. To get boxes for the list items, you will need to extract the text within the list box from the PDF and try to match it with the list items in the XML.
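A rough sketch of that matching step, in case it helps. This is not part of PubLayNet. It assumes you have already extracted words with their boxes from the PDF using a tool of your choice, that those boxes are in the same coordinate system as the list box, that the words come in reading order, and that you have the list item strings from the XML. The names `list_item_boxes`, `inside`, `words`, and `items` are hypothetical.

```python
import re
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)
Word = Tuple[str, Box]                   # (text, box), assumed in reading order


def _norm(text: str) -> str:
    """Lowercase and drop whitespace, punctuation and bullet symbols."""
    return re.sub(r"[\W_]+", "", text.lower())


def _union(boxes: List[Box]) -> Box:
    """Smallest box that encloses all the given boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def inside(box: Box, outer: Box) -> bool:
    """True if the centre of `box` lies within `outer`."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    return outer[0] <= cx <= outer[2] and outer[1] <= cy <= outer[3]


def list_item_boxes(words: List[Word], list_box: Box,
                    items: List[str]) -> List[Optional[Box]]:
    """Assign the words inside `list_box` to `items` in order.

    Words are consumed until their normalized length covers the
    normalized length of the current item; the item's box is the
    union of the consumed word boxes. Items with no words get None.
    """
    in_list = [w for w in words if inside(w[1], list_box)]
    result: List[Optional[Box]] = []
    pos = 0
    for item in items:
        target = len(_norm(item))
        consumed, boxes = 0, []
        while pos < len(in_list) and consumed < target:
            text, box = in_list[pos]
            consumed += len(_norm(text))
            boxes.append(box)
            pos += 1
        result.append(_union(boxes) if boxes else None)
    return result


# Made-up example data: two bulleted items plus one word outside the list.
words = [
    ("•", (10, 10, 12, 20)), ("First", (15, 10, 40, 20)),
    ("item", (42, 10, 60, 20)),
    ("•", (10, 25, 12, 35)), ("Second", (15, 25, 50, 35)),
    ("Title", (10, 100, 50, 110)),
]
list_box = (5, 5, 100, 40)
items = ["First item", "Second"]
print(list_item_boxes(words, list_box, items))
```

The length-based matching is deliberately naive: it only works if the PDF text and the XML text agree closely once punctuation and whitespace are ignored. If they diverge (hyphenation, ligatures, missing characters), a fuzzy alignment would likely be needed instead.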