wenwenyu / PICK-pytorch

Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020)
https://arxiv.org/abs/2004.07464
MIT License

Is it possible to train PICK-pytorch to detect a table and all its line items? #111

Status: Open. locomotiivo opened this issue 2 years ago.

locomotiivo commented 2 years ago

Hi wenwenyu, thank you for the amazing code. I have been experimenting with it and found that the training dataset can be adjusted to extract different kinds of information.

However, there's one thing I am stuck on: training the model to detect tables and their contents as well. I want the customized model not only to detect the header data but also to list all of the table's line items. From what I understand of the code, it seems possible to train it to detect table contents, but I don't know how I should set the training data's entities/labels, especially for tables with more than one line item.
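One possible labeling scheme, offered as a minimal sketch rather than anything the PICK authors prescribe: give every cell a column-level entity type (the names `item_name`, `item_qty` and `item_price` here are hypothetical and would also have to be added to the project's entity list), so each line item reuses the same set of labels. The row format assumes the per-box layout described in the repository README (index, eight clockwise box coordinates, transcript, entity type); how the repo's reader treats commas inside transcripts is not verified here, so the sketch simply rejects them.

```python
from typing import Tuple

# Eight clockwise corner coordinates: x1,y1,x2,y2,x3,y3,x4,y4
Box = Tuple[int, int, int, int, int, int, int, int]


def rect(x0: int, y0: int, x1: int, y1: int) -> Box:
    """Axis-aligned rectangle as a clockwise quad starting at the top-left corner."""
    return (x0, y0, x1, y0, x1, y1, x0, y1)


def tsv_line(index: int, box: Box, transcript: str, entity: str) -> str:
    """Format one annotation row: index, 8 box coordinates, transcript, entity type."""
    if "," in transcript:
        # Assumption: refuse commas instead of guessing the reader's escaping rules.
        raise ValueError("transcript contains a comma; check how the repo's reader handles it")
    return ",".join([str(index), *map(str, box), transcript, entity])


# Two line items; every cell in a column shares one hypothetical entity type.
cells = [
    (rect(10, 100, 200, 120), "Coffee", "item_name"),
    (rect(220, 100, 260, 120), "2", "item_qty"),
    (rect(280, 100, 340, 120), "7.00", "item_price"),
    (rect(10, 130, 200, 150), "Bagel", "item_name"),
    (rect(220, 130, 260, 150), "1", "item_qty"),
    (rect(280, 130, 340, 150), "3.50", "item_price"),
]
lines = [tsv_line(i + 1, box, text, ent) for i, (box, text, ent) in enumerate(cells)]
print("\n".join(lines))
```

With this scheme the model only learns what kind of cell each box is; which line item a cell belongs to is not encoded in the label and has to be recovered afterwards, for example by grouping boxes on their vertical position.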

Any help or tips would be greatly appreciated, thanks :)

ziodos commented 2 years ago

It would be better to use a model that detects the table's shape first, and then parse the content and arrange it. I think the PICK model is better suited to extracting unstructured data than tabular data.
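The "parse the content and arrange it" step could look roughly like the sketch below, assuming you already have one labeled box per cell (from a table detector plus OCR, or from a model's predictions). It groups cells into rows by vertical position using a fixed pixel tolerance, which is a simple heuristic that may break on skewed scans or cells spanning several text lines. The function name `group_into_rows`, the tolerance, and the column labels are all hypothetical.

```python
from typing import Dict, List, Tuple

# (y_center, x_center, entity_type, text) for each detected cell
Cell = Tuple[float, float, str, str]


def group_into_rows(cells: List[Cell], y_tol: float = 8.0) -> List[Dict[str, str]]:
    """Group cells whose vertical centre is within y_tol of the first cell in the
    current row, then read each row left to right as an entity -> text mapping."""
    rows: List[List[Cell]] = []
    for cell in sorted(cells):  # sorted by y_center, then x_center
        if rows and abs(cell[0] - rows[-1][0][0]) <= y_tol:
            rows[-1].append(cell)
        else:
            rows.append([cell])

    items: List[Dict[str, str]] = []
    for row in rows:
        row.sort(key=lambda c: c[1])  # left to right
        item: Dict[str, str] = {}
        for _, _, ent, text in row:
            # Join multiple boxes with the same label (e.g. a multi-word item name).
            item[ent] = f"{item[ent]} {text}" if ent in item else text
        items.append(item)
    return items


cells = [
    (111.0, 105.0, "item_name", "Coffee"),
    (110.0, 240.0, "item_qty", "2"),
    (110.0, 310.0, "item_price", "7.00"),
    (140.0, 105.0, "item_name", "Bagel"),
    (141.0, 240.0, "item_qty", "1"),
    (140.0, 310.0, "item_price", "3.50"),
]
result = group_into_rows(cells)
print(result)
```

Comparing each cell against the first cell of the current row, rather than the previous cell, keeps a slow vertical drift from chaining two separate rows together; the trade-off is that a fixed tolerance has to be tuned to the document's resolution and line spacing.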