naiveHobo / InvoiceNet

Deep neural network to extract intelligent information from invoice documents.
MIT License

Want to prepare a dataset #96

Open test2a opened 2 years ago

test2a commented 2 years ago


I am very interested in this project and would like to contribute. I can prepare a dataset because I deal with all sorts of handwritten and printed invoices. I have a few questions for the maintainer or anyone who has actually used this project:

  1. Can I get a sample of an invoice that you put in the dataset?

  2. Do I need to do any additional work besides preparing scanned copies?

  3. Is there support for multi-page invoices?

  4. Are handwritten invoices supported?

  5. What about PDF copies of invoices?

  6. Do I need to find many different invoice formats with 1-2 samples each, or do you need more samples of each format, say 10-20?

  7. What is a suitable dataset size: 100 invoices, or 1,000?

  8. What if I print or export the same invoice to PDF in multiple formats? Will that work?

frank60229 commented 2 years ago
  1. Handwritten invoices do not seem to be supported.
  2. Each dataset entry is supposed to be a PDF file with an accompanying JSON file.
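
For anyone preparing such a dataset, here is a rough sketch of a sanity check for the PDF-plus-JSON layout described above. It is not part of InvoiceNet. It assumes that each invoice PDF and its JSON label file sit in the same folder and share a base name (for example `invoice_001.pdf` and `invoice_001.json`); that naming convention and the helper names `find_unpaired` and `load_labels` are assumptions made for illustration, so check the project README for the actual requirements.

```python
import json
from pathlib import Path


def find_unpaired(dataset_dir):
    """Return (pdfs_without_json, jsons_without_pdf) as sorted lists of base names.

    Assumption: a PDF and its label file share the same base name.
    """
    root = Path(dataset_dir)
    pdf_stems = {p.stem for p in root.glob("*.pdf")}
    json_stems = {p.stem for p in root.glob("*.json")}
    return sorted(pdf_stems - json_stems), sorted(json_stems - pdf_stems)


def load_labels(dataset_dir):
    """Load the JSON labels of every invoice that has both a PDF and a JSON file."""
    root = Path(dataset_dir)
    labels = {}
    for json_path in sorted(root.glob("*.json")):
        # Skip label files that have no matching PDF next to them.
        if json_path.with_suffix(".pdf").exists():
            with json_path.open(encoding="utf-8") as fh:
                labels[json_path.stem] = json.load(fh)
    return labels
```

Running `find_unpaired("path/to/dataset")` before training should return two empty lists if every invoice has its label file. The keys inside each JSON file depend on the fields you want to extract and are not checked here.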