invoice-x / invoice2data

Extract structured data from PDF invoices
MIT License

RegEx for Date #549

Closed: stony007de closed this issue 9 months ago.

stony007de commented 9 months ago

Hi regex experts, can anyone give me an idea for my problem? My Amazon invoices come with two different invoice date labels.

Here is the recognized OCR text:

Rechnungsdatum 31. Januar 2024

and

Rechnungsdatum /Lieferdatum 24 Januar 2024

So the label is either Rechnungsdatum (invoice date) or Rechnungsdatum /Lieferdatum (invoice date / delivery date). My template is currently:

date: s+(?:Rechnungsdatum|Rechnungsdatum\s*/Lieferdatum).\s+(\d{2}.\s+(?:Januar|Februar|März|April|Mai|Juni|Juli|August|September|Oktober|November|Dezember).\d{4})

The regex tester says everything is fine, and it finds the date. invoice2data, however, fails:


> CRITICAL:root: Invoice2data failed to process /opt/sz2xd/scan/AmazonBusinessRechnungsnr1Q6J-1G99-F39J.pdf. 
> Error message: Unable to parse required field(s): {'date'}
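For reference, one way to cover both label variants with a single expression is to make the "/Lieferdatum" part an optional non-capturing group and to allow the day with or without a trailing dot, since the two OCR lines quoted above differ in both respects. The sketch below checks this with Python's re module, assuming the OCR text contains single spaces exactly as shown; DATE_RE is an illustrative name, and whether invoice2data's template matching behaves identically on the real extracted text is an assumption.

```python
import re

# One pattern for both label variants:
#   "Rechnungsdatum 31. Januar 2024"
#   "Rechnungsdatum /Lieferdatum 24 Januar 2024"
# (?:\s*/\s*Lieferdatum)? is an optional, non-capturing group, and \.? lets
# the day appear with or without a trailing dot. Only the date is captured.
DATE_RE = re.compile(
    r"Rechnungsdatum(?:\s*/\s*Lieferdatum)?\s+"
    r"(\d{1,2}\.?\s+"
    r"(?:Januar|Februar|März|April|Mai|Juni|Juli|August|September|Oktober|November|Dezember)"
    r"\s+\d{4})"
)

samples = [
    "Rechnungsdatum 31. Januar 2024",
    "Rechnungsdatum /Lieferdatum 24 Januar 2024",
]

for line in samples:
    match = DATE_RE.search(line)
    print(match.group(1) if match else None)
# prints:
# 31. Januar 2024
# 24 Januar 2024
```

Joined into one string, the same pattern could be tried in the template's date field; treat it as a starting point, since real OCR output may contain extra whitespace or line breaks.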

stony007de commented 9 months ago

OK, found it:

date: Rechnungsdatum\s*Lieferdatum|Rechnungsdatum.\s*(\d{2}.\s+(?:Januar|Februar|März|April|Mai|Juni|Juli|August|September|Oktober|November|Dezember).\d{4})
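A general regex point that applies to patterns of this shape: a | that is not enclosed in parentheses splits the entire expression, so A|B(...) means "A" or "B(...)", and the capture group exists only in the second alternative. If the first alternative is the one that matches, group 1 is None. The Python sketch below contrasts that with a version where only the labels are wrapped in a non-capturing group; the simplified DATE capture and the input line without a slash are hypothetical, chosen so that the first alternative can match.

```python
import re

# Simplified date capture, for illustration only.
DATE = r"(\d{1,2}\.?\s+\w+\s+\d{4})"

# Top-level "|": the pattern is EITHER the bare label pair OR label + date.
# The capture group belongs to the second alternative only.
unscoped = re.compile(r"Rechnungsdatum\s*Lieferdatum|Rechnungsdatum\s+" + DATE)

# Non-capturing group: the alternation covers only the label,
# and the date capture follows whichever label matched.
scoped = re.compile(r"(?:Rechnungsdatum\s*Lieferdatum|Rechnungsdatum)\s+" + DATE)

# Hypothetical line in which the first alternative can match.
line = "Rechnungsdatum Lieferdatum 24 Januar 2024"

print(unscoped.search(line).group(1))  # None: first alternative matched, no date captured
print(scoped.search(line).group(1))    # 24 Januar 2024
```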