Dataset preparation for extracting LineItems(tabular form) in invoice

wenwenyu / PICK-pytorch

Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020)

https://arxiv.org/abs/2004.07464

MIT License

559 stars, 193 forks

Dataset preparation for extracting LineItems(tabular form) in invoice #59

Open Karthik1904 opened 3 years ago

Karthik1904 commented 3 years ago

Hi, could you please suggest how to prepare a dataset for extracting each line item from the table in an invoice?

ziodos commented 3 years ago

You need to run OCR on your image and then extract the bounding box of each field.
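As a rough illustration of that step, the sketch below turns OCR results into one comma-separated row per text segment: an index, eight box coordinates in clockwise order starting from the top-left corner, and the transcript. This row layout follows my reading of the repository README and should be checked against it. The `sample_words` values and the `(left, top, width, height, text)` input shape are hypothetical stand-ins for whatever your OCR engine returns.

```python
from typing import List, Tuple

# Hypothetical OCR output: (left, top, width, height, text) per text segment.
OcrWord = Tuple[int, int, int, int, str]


def to_clockwise_box(left: int, top: int, width: int, height: int) -> List[int]:
    """Convert an axis-aligned box to 8 values: TL, TR, BR, BL corners."""
    right, bottom = left + width, top + height
    return [left, top, right, top, right, bottom, left, bottom]


def to_rows(words: List[OcrWord]) -> List[str]:
    """Build 'index,x1,y1,x2,y2,x3,y3,x4,y4,transcript' rows (assumed layout)."""
    rows = []
    for index, (left, top, width, height, text) in enumerate(words, start=1):
        coords = to_clockwise_box(left, top, width, height)
        rows.append(",".join([str(index), *map(str, coords), text]))
    return rows


# Made-up sample: one header field and one table row with three cells.
sample_words = [
    (40, 30, 120, 20, "Invoice No: 1234"),
    (40, 200, 180, 18, "Widget A"),
    (300, 200, 30, 18, "2"),
    (400, 200, 60, 18, "19.98"),
]

rows = to_rows(sample_words)
for row in rows:
    print(row)
```

Each table cell gets its own row here, so a line item with three columns becomes three boxes.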

Karthik1904 commented 3 years ago

You mean in the same way we prepared the data for the date and invoice number, correct?

ziodos commented 3 years ago

I don't understand what you mean.

Karthik1904 commented 3 years ago

Can I label each line item in the table with a tag?

[Screenshot attached: 2020-11-09, 10:48:45 AM]

AtulKumar4 commented 3 years ago

You can play with the inference script. I am able to get approximately 90% accurate results for line items.

Karthik1904 commented 3 years ago

> You can play with the inference script. I am able to get approximately 90% accurate results for line items.

Hi @AtulKumar4, could you please give more details? It would be helpful; I have been struggling with this for a long time.

How should we label the image for line items, and do we have to make any changes to the code?

AtulKumar4 commented 3 years ago

There are many changes you could make to improve the results. You can label line items the same way you label the other fields. That's the only information I can share as of now. In the future, I will publish the code.
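To make "the same way you label the other fields" concrete, here is a minimal sketch that appends an entity tag to each box row, assuming rows of the form `index,x1,y1,...,y4,transcript` and a trailing entity-type column as described in the repository README (please verify against it). The tag names `item_description`, `item_quantity`, and `item_amount`, the `other` default, and the `annotations` mapping are all hypothetical; any new tags would presumably also have to be added wherever the training code defines its entity list.

```python
from typing import Dict, List

DEFAULT_TAG = "other"  # assumed tag for boxes that are not a field of interest


def label_rows(rows: List[str], tags_by_index: Dict[int, str]) -> List[str]:
    """Append an entity tag to each row, keyed by the row's leading index."""
    labeled = []
    for row in rows:
        index = int(row.split(",", 1)[0])  # first column is the box index
        labeled.append(f"{row},{tags_by_index.get(index, DEFAULT_TAG)}")
    return labeled


# Made-up rows: one header field, one line item (three cells), one footer.
ocr_rows = [
    "1,40,30,160,30,160,50,40,50,Invoice No: 1234",
    "2,40,200,220,200,220,218,40,218,Widget A",
    "3,300,200,330,200,330,218,300,218,2",
    "4,400,200,460,200,460,218,400,218,19.98",
    "5,40,400,200,400,200,418,40,418,Thank you",
]

# Hypothetical manual annotations: box index -> entity tag.
annotations = {
    1: "invoice_no",
    2: "item_description",
    3: "item_quantity",
    4: "item_amount",
}

labeled_rows = label_rows(ocr_rows, annotations)
for row in labeled_rows:
    print(row)
```

Line-item cells are tagged per box, exactly like the invoice number; grouping cells back into table rows would be a separate post-processing step on the predictions.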

Karthik1904 commented 3 years ago

Thank you for the information, @AtulKumar4.