wenwenyu / PICK-pytorch

Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020)
https://arxiv.org/abs/2004.07464
MIT License
559 stars 193 forks

bounding box for test set #34

Open dipesh-commits opened 4 years ago

dipesh-commits commented 4 years ago

Is it possible to get the bounding box coordinates for each predicted label in the test set?

kbrajwani commented 4 years ago

From what I have seen, the bounding box depends on your tsv file. If I am wrong, please correct me. The model just takes line data from the tsv file, for example

index,x1_1,y1_1,x2_1,y2_1,x3_1,y3_1,x4_1,y4_1,transcript
1,83,41,331,41,331,78,83,78,TAN WOON YANN

and tries to assign an entity label to it.

So let's say you get this JSON in the output folder:

{ "company": "TAN WOON YANN" }

Then, if you want, you can compare that company string with the tsv transcripts to get the bounding box coordinates.
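A minimal sketch of that lookup, assuming each tsv line has the form index,x1,y1,...,x4,y4,transcript with integer coordinates. The helper names parse_tsv_line and boxes_for_text are made up for illustration and are not part of PICK-pytorch, and the second sample row is invented to show a transcript that contains a comma.

```python
from typing import List, Tuple

Box = Tuple[int, ...]  # (x1, y1, x2, y2, x3, y3, x4, y4)


def parse_tsv_line(line: str) -> Tuple[int, Box, str]:
    """Parse one 'index,x1,y1,...,x4,y4,transcript' line (assumed layout)."""
    # Split on the first 9 commas only, so commas inside the transcript survive.
    parts = line.rstrip("\n").split(",", 9)
    index = int(parts[0])
    box = tuple(int(v) for v in parts[1:9])
    return index, box, parts[9]


def boxes_for_text(lines: List[str], predicted: str) -> List[Tuple[int, Box]]:
    """Return (index, box) for every row whose transcript equals `predicted`."""
    matches = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        index, box, transcript = parse_tsv_line(line)
        if transcript.strip() == predicted.strip():
            matches.append((index, box))
    return matches


lines = [
    "1,83,41,331,41,331,78,83,78,TAN WOON YANN",
    "2,83,90,331,90,331,120,83,120,TOTAL, 12.50",  # invented row
]
print(boxes_for_text(lines, "TAN WOON YANN"))
```

The function returns a list rather than a single box, so a transcript that appears several times in the file yields several candidate boxes; choosing between them needs extra information that the output string alone does not carry.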

n0ct4li commented 3 years ago

But if a text appears multiple times in the tsv file, how do I do it?

kbrajwani commented 3 years ago

Then you have to change the code in test.py to fit your needs. See line 50:

for step_idx, input_data_item in tqdm(enumerate(test_data_loader)):

I think input_data_item contains the box details, so you can get the boxes from it.

n0ct4li commented 3 years ago

Yes, I saw that, but in one of my predictions I got a text that is not in the tsv file at all (I followed your previous method). So I don't know if a prediction always corresponds to a box.

kbrajwani commented 3 years ago

I don't think it's possible to get text that is not in the tsv file. If you want to check where the text is coming from, you can trace its flow through the code.

In test.py, this loads the image and tsv file:

test_dataset = PICKDataset(boxes_and_transcripts_folder=args.bt, images_folder=args.impt, resized_image_size=(480, 960), ignore_error=False, training=False)

In pick_dataset.py, line 131:

document = documents.Document(boxes_and_transcripts_file, image_file, self.resized_image_size, self.iob_tagging_type, entities_file, training=self.training)

In documents.py, line 65:

boxes_and_transcripts_data = read_ocr_file_without_box_entity_type(boxes_and_transcripts_file.as_posix())

You can print boxes_and_transcripts_data to see the text and bounding boxes.

So all your text must be coming from the tsv file.

n0ct4li commented 3 years ago

I already checked it. It is weird: for one field I have the prediction 'xxxx', but that string is not in the tsv file. The tsv file does contain 'xxxxyy', though. Are you sure it is not possible to get a text that is not in the tsv file?
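For a case like this, one possible fallback is to look for tsv rows whose transcript contains the prediction instead of requiring an exact match. This is only a sketch: find_containing_rows is a hypothetical helper, the rows are invented to mirror the 'xxxx' vs 'xxxxyy' case, and the rows are assumed to be already parsed into (index, box, transcript) tuples.

```python
from typing import List, Tuple

Box = Tuple[int, ...]
Row = Tuple[int, Box, str]  # (index, box, transcript)


def find_containing_rows(rows: List[Row], predicted: str) -> List[Tuple[int, Box, int]]:
    """Return (index, box, char_offset) for rows whose transcript contains `predicted`."""
    hits = []
    for index, box, transcript in rows:
        offset = transcript.find(predicted)  # -1 when not found
        if offset != -1:
            hits.append((index, box, offset))
    return hits


# Invented rows mirroring the 'xxxx' vs 'xxxxyy' case.
rows = [
    (1, (10, 10, 90, 10, 90, 30, 10, 30), "xxxxyy"),
    (2, (10, 40, 90, 40, 90, 60, 10, 60), "zzzz"),
]
print(find_containing_rows(rows, "xxxx"))
```

The box returned this way covers the whole tsv line ('xxxxyy'), not just the predicted part, and a prediction that spans two lines would not be found by this check at all.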

n0ct4li commented 3 years ago

@kbrajwani, as the IOB tagging is per character, don't you think it is possible for the prediction to contain a text that is not in the tsv file?

kbrajwani commented 3 years ago

Yes, I have seen that the IOB tagging is per character, so it is possible to get a predicted text that is not in the tsv file. I had assumed the model takes the full transcript and assigns a label to it; I missed the character-level tagging. I think it is better for @wenwenyu or @tengerye to answer this.
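To illustrate why character-level tagging can produce text that matches no single tsv line, and how a box could still be recovered, here is a rough sketch. It assumes the transcripts are concatenated into one character sequence with no separators and that one IOB tag is predicted per character. flatten_segments and decode_entities are hypothetical helpers, not PICK-pytorch functions, and the tags are invented.

```python
from typing import List, Tuple


def flatten_segments(transcripts: List[str]) -> Tuple[List[str], List[int]]:
    """Concatenate transcripts into one character list, remembering each character's box."""
    chars, owners = [], []
    for box_idx, text in enumerate(transcripts):
        for ch in text:
            chars.append(ch)
            owners.append(box_idx)
    return chars, owners


def decode_entities(
    chars: List[str], owners: List[int], tags: List[str]
) -> List[Tuple[str, str, List[int]]]:
    """Turn per-character IOB tags into (label, text, box indices) triples."""
    entities = []
    label, start = None, None
    # A trailing "O" sentinel closes any entity still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            text = "".join(chars[start:i])
            boxes = sorted(set(owners[start:i]))
            entities.append((label, text, boxes))
            label, start = None, None
        # Open a new entity on "B-", or on a stray "I-" with nothing open.
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            label, start = tag[2:], i
    return entities


chars, owners = flatten_segments(["xxxxyy", "zz"])
tags = ["B-total", "I-total", "I-total", "I-total", "O", "O", "O", "O"]
print(decode_entities(chars, owners, tags))
```

With these invented tags the decoded entity is 'xxxx' from box 0, even though the tsv line itself reads 'xxxxyy'. Keeping the per-character box index alongside the text is what allows each predicted span to be traced back to one or more boxes, without any string matching against the tsv file.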

jorgerodriguezsj commented 3 years ago

Have you found any solution for this?

ndcuong91 commented 3 years ago

n0ct4li wrote: "Yes, I saw that, but in one of my predictions I got a text that is not in the tsv file at all (I followed your previous method). So I don't know if a prediction always corresponds to a box."

I have the same issue. Do you have any update, @wenwenyu @tengerye?

htdung167 commented 1 year ago

@jorgerodriguezsj @ndcuong91

Have you found any solution for this? I realized that it splits a word into parts.

Ex: QLO -> Q and LO