naiveHobo / InvoiceNet

Deep neural network to extract intelligent information from invoice documents.
MIT License
2.48k stars 392 forks

Annotation tool? Suggestions... #23

Open desaetiis opened 4 years ago

desaetiis commented 4 years ago

Any plans to integrate with or build an annotation tool to help create the JSON training data? Any suggestions on what to use? Something like doccano comes to mind. What have you used to label your train/test invoices?

desaetiis commented 4 years ago

Just wanted to add that this is a great project and thanks so much for putting it out there.

naiveHobo commented 4 years ago

Thank you so much!

I had a look at doccano and it looks great! Future plans do include a better annotation tool, and doccano seems like a good option. However, I tried to write the main Extractor tool in such a way that it could also make the annotation process a little less tedious. You can open multiple documents (even every document in a directory) in the extractor tool and cycle through them, with each one shown in the main viewer. The Save Information button lets you save the extracted information in the same format as the JSON labels that the trainer tool expects. The idea here was that you could train a model on a small amount of data first and then use this premature model to help you in annotating new documents.
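For anyone who wants to script that bootstrapping loop outside the GUI, here is a minimal sketch of the idea, not InvoiceNet's actual API. It assumes each document gets a label file with the same name and a `.json` extension, holding a flat field-to-value mapping; `pre_annotate` and the `predict` callable are hypothetical names standing in for whatever early model you have trained.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def pre_annotate(
    documents: Iterable[Path],
    predict: Callable[[Path], Dict[str, str]],
    overwrite: bool = False,
) -> List[Path]:
    """Write a draft JSON label file next to each document.

    `predict` is a hypothetical stand-in for a model trained on a small
    labelled set; it is not an InvoiceNet function.
    """
    written = []
    for doc in documents:
        # Assumed layout: invoice1.pdf is labelled by invoice1.json.
        label_path = doc.with_suffix(".json")
        # Skip documents that already have labels unless told otherwise,
        # so hand-corrected files are not clobbered.
        if label_path.exists() and not overwrite:
            continue
        draft = predict(doc)
        label_path.write_text(json.dumps(draft, indent=2), encoding="utf-8")
        written.append(label_path)
    return written
```

The drafts still need a human pass: correct the wrong fields, then feed the corrected files back in as training data for the next round.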

desaetiis commented 4 years ago

"The idea here was that you could train a model on a small amount of data first and then use this premature model to help you in annotating new documents."

Intuitively, this is exactly what I ended up doing... Kudos... Well thought out.

Speaking of doccano: its architecture is very portable and scalable, since it is a browser-based web app and can be dockerized. I'm not sure how beneficial that would be here, other than easier installation and easier sharing if it is hosted in the cloud.

Is this just a side project for you, or is it used (or will it be used) in production somewhere?