Open IzzyHibbert opened 4 years ago
I'm not completely sure which fields you're referring to. Do you have a sample image you can display in this ticket?
In general however, it should be possible to train a model for any field in a document as long as it's not a field which can have multiple occurrences. You can train a model even for fields that have multiple occurrences, but you will only be able to use one of the occurrences as the true label and the final extraction of such a model would also only be able to extract a single occurrence for this field.
Thank you. You answered me already. I meant multiple occurrences such as the purchased articles described in the invoice, or "line items". You normally find more than one, so you have Item1, Item2, Item3, and so on.
They typically are represented with a similar vertical and horizontal alignment.
Any chance that this is going to be included in the future, or any idea how to start developing in this direction?
Thanks
Hi @IzzyHibbert , you can try this API dedicated to invoices https://scandocflow.com
It seems InvoiceNet can't handle tables, for example.
How can we extract the items from a table, given that a custom field only takes a single key-value pair?
@mirfan899, @IzzyHibbert have you found a solution?
Nope. Use something else, like YOLO. I solved the issue using YOLOv3.
@mirfan899 That's great. Can you provide a link to the repository?
@mirfan899 Thank you. I'm not sure how YOLO will extract invoice data though. Did you write your own custom network?
I labeled the dataset and then trained a YOLOv3 model. Here are the results.
Can't you at least use the OPTIONAL data type for small lists?
@mirfan899 Thanks for sharing. Could you use YOLO to extract the line item details from the table? For example, if you want to extract the payment history lines from your last photo, i.e. something like:
[{"Month": "Dec 2021", "HM3": 0.622, "Current Bill": 433.36, ...},
{"Month": "Nov 2021", "HM3": 0.387, ...}]
Is that possible? As far as I understand, neither InvoiceNet nor YOLO can do that.
Why not? YOLO can solve the table issue. Just label the table and, after detection, use OCR to extract the text.
I guess he's aiming to extract formatted line items with labels, not just text. Extracting text from the table using OCR will just give you some text.
Thank you both for your quick replies. Yes, I wanted them to be formatted so that I know which text corresponds to which column. I will need to store this extracted data and process it depending on the column.
I have done something similar. You need to label the columns with YOLO, then detect and OCR. You need more data to get better accuracy: around 50 samples of a single template.
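For anyone trying this, here is a rough sketch of the step after detection and OCR: turning column boxes plus OCR word boxes into one dictionary per line item. Everything in it is an assumption made for illustration, not code from InvoiceNet or from any YOLO or OCR library: the `Box` and `Word` classes, the `words_to_line_items` helper, and the `row_tol` pixel tolerance are all hypothetical names, and it presumes the detector returns one labeled box per column and the OCR returns one box per word.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Detected column region in pixel coordinates (hypothetical structure)."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass
class Word:
    """One OCR word with its bounding box (hypothetical structure)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def cx(self) -> float:
        return (self.x0 + self.x1) / 2

    @property
    def cy(self) -> float:
        return (self.y0 + self.y1) / 2


def words_to_line_items(columns, words, row_tol=10.0):
    """Key each OCR word by the column it falls in, then group words into rows."""
    # Keep only words whose center lies inside one of the column boxes.
    tagged = []
    for w in words:
        for col in columns:
            if col.x0 <= w.cx <= col.x1 and col.y0 <= w.cy <= col.y1:
                tagged.append((w, col.label))
                break

    # Sort top to bottom; start a new row when the vertical center moves
    # more than row_tol pixels away from the first word of the current row.
    tagged.sort(key=lambda pair: pair[0].cy)
    rows, row_y = [], None
    for w, label in tagged:
        if row_y is None or abs(w.cy - row_y) > row_tol:
            rows.append([])
            row_y = w.cy
        rows[-1].append((w, label))

    # Inside each row, join the words of every column from left to right.
    items = []
    for row in rows:
        item = {}
        for w, label in sorted(row, key=lambda pair: pair[0].x0):
            item[label] = (item.get(label, "") + " " + w.text).strip()
        items.append(item)
    return items
```

The values come back as strings, so converting something like "0.622" to a number would be a separate step. The row grouping relies on line items being roughly horizontally aligned, and `row_tol` would need tuning per template and image resolution.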
Can you show a sample of how you labeled the columns with yolo to detect single line items? I'm also interested in this.
Like this.
Can you provide a repository with sample code?
Hi guys
I was wondering if fields represented in a table (like the line item fields) are supported. If yes, how do I set them up? If not, that would really be nice to have.