invoice-x / invoice2data

Extract structured data from PDF invoices
MIT License

No format check in new syntax style templates (e.g. type date) #485

Open bosd opened 1 year ago

bosd commented 1 year ago

Currently there is no type check. I have to admit this was my own mistake (although it might be a common one), and it led to a crash of my upstream application. I'm opening this ticket to start a discussion about preventing these situations.

I was converting a template from the old syntax to the new one.

(random example)

fields:
  date:
    - Rechnungsdatum\s+(\w+ \d+, \d{4})

to

fields:
  date:
    parser: regex
    regex: Rechnungsdatum\s+(\w+ \d+, \d{4})

Notice: I forgot to include the type definition (type: date).

This made invoice2data send out a string instead of a datetime object. The upstream application didn't like this :angry:

I think this is a very common mistake to make, especially if you're quickly developing a template. Currently there is no way to notice this error until it reaches the production environment.
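To illustrate how such a slip could be caught before production, here is a minimal sketch of a template lint check in Python. It is not part of invoice2data: the helper `find_untyped_fields` and the `EXPECTED_TYPES` mapping are hypothetical, and the mapping assumes the naming convention in which fields starting with "date" are expected to be dates and fields starting with "amount" are expected to be floats. The template is represented as a plain dict, as it would look after loading the YAML.

```python
# Hypothetical lint helper for illustration; not an invoice2data API.
# Assumption: field-name prefixes imply a type, as in the old syntax.
EXPECTED_TYPES = {"date": "date", "amount": "float"}


def find_untyped_fields(template: dict) -> list[str]:
    """Return one warning per new-syntax field whose type is missing or unexpected."""
    warnings = []
    for name, spec in template.get("fields", {}).items():
        if not isinstance(spec, dict):
            continue  # old-syntax entry (string or list of regexes); nothing to check
        # Look up the type implied by the field name, if any.
        expected = next(
            (t for prefix, t in EXPECTED_TYPES.items() if name.startswith(prefix)),
            None,
        )
        if expected and spec.get("type") != expected:
            warnings.append(
                f"field '{name}': expected type '{expected}', got {spec.get('type')!r}"
            )
    return warnings


# The converted template from above, with the type definition forgotten.
template = {
    "fields": {
        "date": {
            "parser": "regex",
            "regex": r"Rechnungsdatum\s+(\w+ \d+, \d{4})",
        },
    }
}

for warning in find_untyped_fields(template):
    print(warning)
```

Running a check like this when templates are loaded, or in a test suite over the template folder, would surface the missing type during development rather than in the upstream application.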

Proposed Solution: