wenwenyu / PICK-pytorch

Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020)
https://arxiv.org/abs/2004.07464
MIT License
553 stars, 191 forks

Not clear how to annotate my documents #72

Open wilfreddesert opened 3 years ago

wilfreddesert commented 3 years ago

Hi @wenwenyu

I cannot wait to try your model with my data. It is quite a large dataset of documents with various layouts, from which I would like to extract a set of key/value pairs.

I have a few questions though regarding the format of data for training:

Is this the only format possible? I use the Google Vision API to create text annotations, which produces word-level entities, so my initial idea was to label my data at the word level. Will this not work for PICK?

Another question relates to one of the sample files: https://github.com/wenwenyu/PICK-pytorch/blob/master/data/data_examples_root/boxes_and_transcripts/X00016469623.tsv

As far as I understand from the description, the first column is the id, so why are all the values in the first column equal to 1 in that file?

Thanks!

jianglong-he-Infrrd commented 3 years ago

I have a similar problem. Please let me know if you have found a way to make PICK work with word-level annotations.

nehasaraf1994 commented 3 years ago

Hi @wilfreddesert, were you able to get answers to your questions? I would really love to know how you dealt with word-level entities.
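
The thread does not record an answer to the word-level question. One workaround that could be tried is to merge word-level OCR boxes into line-level segments before labeling. The following is a minimal sketch of that idea, not a confirmed requirement of PICK. It assumes each row of a boxes_and_transcripts file has the form index, eight box coordinates (clockwise from the top-left corner), transcript, entity type. The `Word` class, both function names, the running index, and the `other` default label are hypothetical names chosen for illustration, and the word boxes are assumed to have already been reduced to axis-aligned rectangles.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """Hypothetical container for one word-level OCR result (axis-aligned box)."""
    text: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def group_words_into_lines(words: List[Word], y_tolerance: float = 0.5) -> List[List[Word]]:
    """Greedily group words whose vertical centers are close to each other."""
    lines: List[List[Word]] = []
    ordered = sorted(words, key=lambda w: ((w.y_min + w.y_max) / 2, w.x_min))
    for word in ordered:
        center = (word.y_min + word.y_max) / 2
        height = word.y_max - word.y_min
        for line in lines:
            ref = line[0]  # compare against the first word placed on the line
            ref_center = (ref.y_min + ref.y_max) / 2
            ref_height = ref.y_max - ref.y_min
            if abs(center - ref_center) <= y_tolerance * max(height, ref_height):
                line.append(word)
                break
        else:
            lines.append([word])  # no existing line matched, start a new one
    # Order the words of each line from left to right.
    return [sorted(line, key=lambda w: w.x_min) for line in lines]


def line_to_row(index: int, line: List[Word], entity_type: str = "other") -> str:
    """Build one row in the ASSUMED layout: index, 8 coordinates, transcript, type."""
    x1 = min(w.x_min for w in line)
    y1 = min(w.y_min for w in line)
    x2 = max(w.x_max for w in line)
    y2 = max(w.y_max for w in line)
    # Corners clockwise: top-left, top-right, bottom-right, bottom-left.
    coords = [x1, y1, x2, y1, x2, y2, x1, y2]
    transcript = " ".join(w.text for w in line)
    # Commas inside the transcript are not escaped in this sketch.
    fields = [str(index)] + [str(int(round(c))) for c in coords] + [transcript, entity_type]
    return ",".join(fields)


def words_to_rows(words: List[Word]) -> List[str]:
    """Convert word-level boxes into line-level rows with a running index."""
    lines = group_words_into_lines(words)
    return [line_to_row(i, line) for i, line in enumerate(lines, start=1)]
```

Calling `words_to_rows` on the words of one page returns one string per merged line, which can then be labeled and written to a file. The `y_tolerance` value is a guess and would need tuning for skewed scans or multi-column layouts.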