Not clear how to annotate my documents

wilfreddesert commented 3 years ago

Hi @wenwenyu

I cannot wait to try your model on my data. It is a large dataset of documents with various layouts, from which I would like to extract a set of key/value pairs.

I have a few questions, though, regarding the data format for training:

In your examples, annotations cover each entity as a whole: if the value of some_field consists of 4 words, all 4 words are given together under a single label.

Is this the only possible format? I use the Google Vision API to create text annotations, which produces word-level entities, so my initial idea was to label my data at the word level. Will this not work for PICK?
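
For illustration, here is a minimal sketch of one possible workaround: merging word-level boxes into entity-level segments before training. Whether PICK actually requires this is not confirmed anywhere in this thread. The `Word` type, the `same_line` heuristic, and the sample values are all hypothetical, and the words are assumed to arrive in reading order with axis-aligned boxes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    # Hypothetical word-level annotation: text, axis-aligned box, label.
    text: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int
    label: str


def same_line(a: Word, b: Word) -> bool:
    # Treat two boxes as being on the same line when their vertical
    # overlap is at least half the height of the shorter box.
    overlap = min(a.y_max, b.y_max) - max(a.y_min, b.y_min)
    shorter = min(a.y_max - a.y_min, b.y_max - b.y_min)
    return overlap >= 0.5 * shorter


def merge_words(words: List[Word]) -> List[Word]:
    # Assumes `words` is already in reading order. Consecutive words
    # with the same label on the same line are merged into one segment
    # whose box is the union of the word boxes.
    merged: List[Word] = []
    for w in words:
        last = merged[-1] if merged else None
        if last is not None and last.label == w.label and same_line(last, w):
            merged[-1] = Word(
                text=f"{last.text} {w.text}",
                x_min=min(last.x_min, w.x_min),
                y_min=min(last.y_min, w.y_min),
                x_max=max(last.x_max, w.x_max),
                y_max=max(last.y_max, w.y_max),
                label=last.label,
            )
        else:
            merged.append(w)
    return merged


# Made-up sample data, for illustration only.
words = [
    Word("TOTAL", 10, 10, 60, 30, "other"),
    Word("AMOUNT", 65, 11, 130, 31, "other"),
    Word("12.50", 200, 10, 250, 30, "total"),
]
segments = merge_words(words)
```

This sketch does not check the horizontal gap between words, so two same-label words far apart on one line would still be merged; a gap threshold may be needed for real layouts.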

Another question relates to one of the sample files: https://github.com/wenwenyu/PICK-pytorch/blob/master/data/data_examples_root/boxes_and_transcripts/X00016469623.tsv

As far as I understand from the description, the first column is the id, so why are all the values in the first column equal to 1 in that file?
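
To inspect the first column programmatically, a minimal sketch follows. It assumes each line is comma-separated as id, eight box coordinates, transcript, entity type; this layout is an assumption to check against the repository README, and the sample lines are made up rather than taken from the file above. The names `BoxLine` and `parse_line` are hypothetical.

```python
from typing import List, NamedTuple


class BoxLine(NamedTuple):
    # Hypothetical record for one line of a boxes_and_transcripts file.
    line_id: int
    box: List[float]
    transcript: str
    entity_type: str


def parse_line(line: str) -> BoxLine:
    # Split off the id and the eight coordinates from the left; the
    # remainder holds the transcript (which may itself contain commas)
    # followed by the entity type after the last comma.
    parts = line.rstrip("\n").split(",", 9)
    if len(parts) < 10:
        raise ValueError(f"expected at least 10 fields, got {len(parts)}")
    transcript, sep, entity_type = parts[9].rpartition(",")
    if not sep:
        raise ValueError("missing entity type column")
    return BoxLine(
        line_id=int(parts[0]),
        box=[float(v) for v in parts[1:9]],
        transcript=transcript,
        entity_type=entity_type,
    )


# Made-up sample lines, for illustration only.
sample = [
    "1,10,20,110,20,110,40,10,40,TOTAL, INCL. TAX,other",
    "1,120,20,180,20,180,40,120,40,12.50,total",
]
records = [parse_line(s) for s in sample]
distinct_ids = {r.line_id for r in records}
```

Collecting `distinct_ids` over a real file shows at a glance whether the first column varies per line or is constant.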

Thanks!

jianglong-he-Infrrd commented 3 years ago

I have a similar problem. Please let me know if you have found a way to make PICK work with word-level annotations.

nehasaraf1994 commented 3 years ago

Hi @wilfreddesert, were you able to get answers to your questions? I would really love to know how you dealt with word-level entities.
