invoice-x / invoice2data

Extract structured data from PDF invoices
MIT License

Issues with other templates on [Windows] #566

Open abhigyanasatpathy opened 2 weeks ago

abhigyanasatpathy commented 2 weeks ago

Steps to add new template

To add a new template, we recommend this workflow:

1. Copy existing template to new file

Find a template that is roughly similar to what you need and copy it to a new file. It's good practice to use reverse domain notation, e.g. country.company.division.language.yml or fr.mobile.enterprise.french.yml. The language part is not always needed. Template folders are searched recursively for files ending in .yml.
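As a rough illustration of the recursive search described above, the following sketch lists every file ending in .yml under a folder. It uses only Python's pathlib. The function name find_templates and the folder name "tpl" are hypothetical, and this is not claimed to be the loader that invoice2data itself uses.

```python
from pathlib import Path


def find_templates(folder):
    """Return all .yml files under `folder`, searched recursively, sorted by path."""
    return sorted(p for p in Path(folder).rglob("*.yml") if p.is_file())


# Usage (hypothetical folder name):
#     for path in find_templates("tpl"):
#         print(path)
```

If your new file does not show up in such a listing, it is probably in a different folder than the one you are pointing the tool at, or it does not end in .yml.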

2. Change invoice issuer

The issuer is only used in the output. It is best to use the company name.

3. Set keyword

Look at the invoice and find the best identifying string. Tax number + company name are good options. Remember, all keywords need to be found for the template to be used.

Keywords are compared before processing the extracted text.
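The "all keywords need to be found" rule can be sketched as follows. This is a minimal illustration that assumes each keyword is treated as a regular expression searched anywhere in the extracted text. The function name keywords_match and the sample text are hypothetical.

```python
import re


def keywords_match(keywords, text):
    """Return True only if every keyword pattern is found somewhere in `text`."""
    return all(re.search(keyword, text) for keyword in keywords)


# Hypothetical invoice text and keywords, for illustration only.
sample_text = "Demo Company Pvt Ltd\nTax No: 12ABCDE3456F7Z8\nInvoice No: 42"
print(keywords_match(["Demo Company", "12ABCDE3456F7Z8"], sample_text))  # True
print(keywords_match(["Demo Company", "Other Corp"], sample_text))       # False
```

A single keyword that does not appear in the extracted text is enough for the template to be skipped, so it pays to copy keywords straight from the debug output.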

4. First test run

Now we're ready to see how far we are off. Run invoice2data with the following debug command to see if your keywords match and how much work is needed for dates, etc.

invoice2data --template-folder tpl --debug invoice-XXX.pdf

This test run shows you how the program will "see" the text in the invoice. Parsing PDFs is sometimes a bit unpredictable. Also make sure your template is used. You should already receive some data from static fields or currencies.

5. Add regular expressions

Now you can use the debugging text to add regex fields for the information you need. It's a good idea to copy parts of the text directly from the debug output and then replace the dynamic parts with regex. Keep in mind that some characters need escaping. To test, re-run the above command.
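To illustrate copying text from the debug output and replacing the dynamic part with a regex, here is a small sketch using Python's re module. The debug line and the field it extracts are made up for the example. re.escape takes care of characters such as "(" and ")" that would otherwise need manual escaping.

```python
import re

# Hypothetical line copied from the debug output.
debug_line = "Total (INR): 1,234.50"

# Escape the fixed label so "(" and ")" are matched literally,
# then replace the dynamic amount with a capture group.
pattern = re.escape("Total (INR):") + r"\s+([\d,]+\.\d{2})"

match = re.search(pattern, debug_line)
print(match.group(1) if match else None)  # 1,234.50
```

Testing a pattern this way against a line pasted from the debug output is quicker than re-running the whole extraction after every change.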

6. Done

Now you're ready to commit and push your template, so others get a chance to use and improve it.

My question: I have added a new template in YAML with regexes for my invoice, but when I parse that invoice PDF it fails with an error.

Error message:

(invoice2data-env) D:\invoice2data-master\src\invoice2data>invoice2data --output-format csv --output-name output/invoices.csv input/demoinvoice.pdf
INFO:invoice2data.extract.loader: Loaded 189 templates from D:\invoice2data-master\invoice2data-env\Lib\site-packages\invoice2data\extract\templates
INFO:pikepdf._core: pikepdf C++ to Python logger bridge initialized
Scanning contents ---------------------------------------- 100% 1/1 0:00:00
WARNING:ocrmypdf._pipeline: This PDF is marked as a Tagged PDF. This often indicates that the PDF was generated from an office document and does not need OCR. PDF pages processed by OCRmyPDF may not be tagged correctly.
OCR ---------------------------------------- 0% 0/1 -:--:--
WARNING:ocrmypdf._pipeline: Weighted average image DPI is 152.1, max DPI is 247.7. The discrepancy may indicate a high detail region on this page, but could also indicate a problem with the input PDF file. Page image will be rendered at 400.0 DPI.
OCR ---------------------------------------- 100% 1/1 0:00:00
Linearizing ---------------------------------------- 100% 100/100 0:00:00
INFO:invoice2data.input.ocrmypdf: Text extraction made with ocrmypdf
ERROR:root: No template for input/demoinvoice.pdf

bosd commented 2 weeks ago

Hi, your steps for adding a template are correct.

Did you verify that your installation of invoice2data is running properly by testing it on one of the example files?

abhigyanasatpathy commented 2 weeks ago

Yes, it is running properly. Thank you for your help. By the way, can you please explain the process again? I have created templates/myinvoice and, inside it, in.myinvoice.yml with regexes matching my PDF. Is that enough to convert my PDF to CSV output, or is there any other step or code I need to add? Please explain it simply. I have already run one of your existing templates and it works fine.

bosd commented 2 weeks ago

Your invoked command seems ok.

Some debugging steps:

[x] Verify your installation and the parsing of a sample file.
[ ] Run with the --debug flag to check the output for the invoice-xx.pdf file. This is likely the problem, as invoice2data tries to fall back on ocrmypdf, which is likely because it cannot detect characters with pdftotext. Is your PDF file a text-based file, or does it need OCR?
[ ] Try your PDF with a different input parser: --input-reader= followed by pdftotext or ocrmypdf.
[ ] Check your template for syntax errors.

abhigyanasatpathy commented 2 weeks ago

My PDF file is a text-based file. I have only created one file, in.invoicedemo.yml (path: D:\invoice2data-master\src\invoice2data\extract\templates\in\in.invoicedemo.yml), as step 1. Should I proceed with only step 1, or are there other steps I should follow? Are there any other steps where I need to write code or do anything else?

In the in.invoicedemo.yml file I have worked out the regular expressions and keywords according to my PDF.

bosd commented 2 weeks ago

When you run invoice2data on the pdf file with the --debug flag, do you see the contents of the file in your logger/terminal?

abhigyanasatpathy commented 2 weeks ago

No, I cannot see the contents of the file. I can only see the PDF-to-text data in the logger (using the --debug flag), but I cannot see any data in the CSV file. I am getting this error in the logger:

DEBUG:root: END pdftotext result =============================
DEBUG:invoice2data.extract.invoice_template: Template: au.com.opal.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: au.com.telstra.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.accor.invest.ibis.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.accor.invest.novotel.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.boucherie.pochet.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.cebeo.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.eg_retail.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.lampiris.facture-dacompte.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.lampiris.factuur.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.lampiris.regularisation.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.melchior-vins.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.proximus.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.scarlet.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.securex.social.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: ch.pcengines.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invo . . .
DEBUG:invoice2data.extract.invoice_template: Template: pl.bmw-fs.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: pl.insert.subiekt-gt.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: pl.insert.subiekt-nexo.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: pl.orlen.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: pl.p4.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: pl.paypro.yml | Failed to match all keywords.
INFO:pikepdf._core: pikepdf C++ to Python logger bridge initialized
DEBUG:root: Text extraction failed, falling back to ocrmypdf
DEBUG:root: Text extraction failed, falling back to ocrmypdf
DEBUG:invoice2data.input.ocrmypdf: input_reader_config received from main are, {}
DEBUG:invoice2data.input.ocrmypdf: ocrmypdf config settings are: {'redo_ocr': True, 'optimize': 0, 'output_type': 'pdf', 'fast_web_view': 0}
WARNING:ocrmypdf._pipeline: This PDF is marked as a Tagged PDF. This often indicates that the PDF was generated from an office document and does not need OCR. PDF pages processed by OCRmyPDF may not be tagged correctly.
OCR ---------------------------------------- 0% 0/1 -:--:--
WARNING:ocrmypdf._pipeline: Weighted average image DPI is 152.1, max DPI is 247.7. The discrepancy may indicate a high detail region on this page, but could also indicate a problem with the input PDF file. Page image will be rendered at 400.0 DPI.
OCR ---------------------------------------- 100% 1/1 0:00:00
Linearizing ---------------------------------------- 100% 100/100 0:00:00
INFO:invoice2data.input.ocrmypdf: Text extraction made with ocrmypdf
DEBUG:

bosd commented 2 weeks ago

The result from pdftotext is empty.

So you're likely running into dependency issues with pdftotext / poppler-utils on Windows. Currently Windows is not well supported or tested.

There is an open PR to enhance support, but its tests are failing: https://github.com/invoice-x/invoice2data/pull/565

I'm a Linux user, so I cannot give you a lot of support on Windows.
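One way to check the pdftotext dependency mentioned above, independently of invoice2data, is to look for the executable on PATH and run it directly. The sketch below uses only the Python standard library. The helper names are hypothetical, and whether invoice2data invokes pdftotext with exactly these arguments is an assumption.

```python
import shutil
import subprocess


def pdftotext_path():
    """Return the full path of the pdftotext executable on PATH, or None if it is missing."""
    return shutil.which("pdftotext")


def extract_text(pdf_path):
    """Run pdftotext on `pdf_path` and return its output, or None if the tool is missing."""
    exe = pdftotext_path()
    if exe is None:
        return None
    # "-layout" keeps the physical layout; "-" sends the text to stdout.
    result = subprocess.run([exe, "-layout", pdf_path, "-"],
                            capture_output=True, check=False)
    return result.stdout.decode("utf-8", errors="replace")


print("pdftotext found at:", pdftotext_path())
```

If pdftotext_path() returns None, or extract_text returns an empty string for a PDF that clearly contains text, the problem lies with the poppler installation rather than with the template.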

abhigyanasatpathy commented 2 weeks ago

But the existing templates are working fine. I am just not able to extract the data from my PDF.

There is one file at D:\invoice2data-master\invoice2data-env\Lib\site-packages\invoice2data-0.4.5.dist-info\RECORD. Do I need to do anything with this file for new templates, or do I just need to create the templates?

bosd commented 2 weeks ago

Just creating the templates should be fine.

Let's check if the template you have created has been loaded.

Do you see your template in the list of loaded templates?

abhigyanasatpathy commented 2 weeks ago

What does "loaded templates" mean? I can see this one: D:\invoice2data-master\src\invoice2data\extract\templates\in\in.demovoice.yml

But I am not able to see it here:

DEBUG:root: END pdftotext result =============================
DEBUG:invoice2data.extract.invoice_template: Template: au.com.opal.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: au.com.telstra.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.accor.invest.ibis.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.accor.invest.novotel.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.boucherie.pochet.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.cebeo.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.eg_retail.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.lampiris.facture-dacompte.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.lampiris.factuur.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.lampiris.regularisation.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.melchior-vins.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.proximus.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.scarlet.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.securex.social.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: ch.pcengines.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.AzureInterior.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.amazon.aws.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.apple.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.apps4rent.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.binarylife.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.bloomberg.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.cloudns.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.datadoghq.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.digitalocean.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.envato.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.expressvpn.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.expressvpn_prio6.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.ftserussell.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.github.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.globalsign.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.google.adwords.hk.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.hobohost.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.jamiepro.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.linode.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.microsoftonline.hk-v2017.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.microsoftonline.hk.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.mongodb.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.namecheap.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.namesilo.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.newrelic.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.nl.lenovo.digitalriver.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.nmmn.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.nodisto.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.nyse.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.oyo.invoice.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.packtpub.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.pixartprinting.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.sammymaystone.yml | Keywords matched. No exclude keywords found.
DEBUG:invoice2data.extract.invoice_template: Template: com.scaleway.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.textmaster.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.tmx.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.travis-ci.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.twitter.de.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.twitter.uk.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.twitter.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.upwork.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.usersnap.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: de.amazon.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: de.bettina-kast.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: de.digikey.com.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: de.hosteurope.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: de.notebooksbilligerBillPay.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: de.ovh.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: de.qualityhosting.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: de.united-domains.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.pepephone.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: es.supplies24.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: co.mooncard.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.adobe.ie.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.akretion.fr.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.amazon.aws.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.ateliercopieservice.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.chauffeur-prive.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.coriolis.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.easyjet.fr.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.eaudugrandlyon.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.godaddy.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.google.ie.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.hootsuite.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.jeanbesson.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.ldlc.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.linkedin.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.mention.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.microsoft.ie.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.myflyingbox.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.officetimeline.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.orange-business.mobile.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.ovh.fr.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.rs-online.fr.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.saur.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.soyoustart.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: com.vinci-autoroutes.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: dolibarr.generique.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: eu.trainline.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.actn.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.airfrance.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.also.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.amazon.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.assurance-epargne-pension.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.bouyguestelecom.adsl-fiber.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.bouyguestelecom.mobile.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.butagaz.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.chronopost.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.dirafi.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.domaine-achat.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.easytrip.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.edf.entreprises.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.edf.pme.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.finagaz.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.fountain.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.free.adsl-fiber.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.free.mobile.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.free.mobile2.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.futur.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.ge-iroise.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.greffe-tc-lyon.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.hiscox.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.internetsatellite.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.jpg.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.kubii.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.laposte.boutique.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.laposte.coliposte.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.lecab.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.leroymerlin.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.maaf.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.mediapart.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.moneo-resto.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.mouser.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.mycelium-roulement.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.napsis.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.nexity.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.orange.fibre.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.orange.fixedline.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.prestaclic.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.publicationannoncelegale.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.sfr.adsl-fiber.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.sfr.mobile.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.sosh.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.teledec.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: fr.topoffice.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: net.online.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: net.scaleway.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.action.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.albron.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.anwb.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.be.coolblue.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.begra.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.blokker.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.bouwmans.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.bp.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.bunq.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.cpe.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.esso_eg_services.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.esso_eg_services_v2.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.farnell.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.ferbox.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.gamma.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.goos.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.gulf.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.ipparking.paleiskwartier.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.karwei.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.kav.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.koffiehenk.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.momentsenmore.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.ns.invoice.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.ok.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.parkmobile.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.praxis.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.reclameland.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.saeco.philips.eluscious.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.shell_nederland.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.shell_schellenkens.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.simpel.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.total_express.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.total_ototol.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.transip.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.tuynder.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.vistaprint.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.vodafone.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.wasco.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.weid.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.yezzer.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: nl.zinkunie.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: pl.bmw-fs.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: pl.insert.subiekt-gt.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: pl.insert.subiekt-nexo.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: pl.orlen.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: pl.p4.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: pl.paypro.yml | Failed to match all keywords.

Why? As I said, I only created the .yml file with my regexes inside the templates folder.

So is there anything else I need to follow up on?

bosd commented 2 weeks ago

Why?

Because you need to check if the template you have created is properly loaded.

Check if you're pointing to the correct folder. (You can disable the built-in templates with the following flag to reduce the noise: --exclude-built-in-templates)

You should see your template in that list. If your template is correct, it should say that the keywords have matched, followed by a "using template <your template file>" message.
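To find your template in a long debug listing, a small script can pull out which templates matched and which did not. This is a sketch based only on the "Template: ... | ..." line format quoted in this thread. The names TEMPLATE_LINE and summarize are hypothetical, and the log format may differ between versions.

```python
import re

# Pattern based on the debug lines quoted in this thread.
TEMPLATE_LINE = re.compile(
    r"Template: (\S+\.yml) \| (Keywords matched|Failed to match all keywords)"
)


def summarize(log_text):
    """Return (matched, failed) lists of template file names found in the log."""
    matched, failed = [], []
    for name, status in TEMPLATE_LINE.findall(log_text):
        (matched if status == "Keywords matched" else failed).append(name)
    return matched, failed


sample = (
    "DEBUG:invoice2data.extract.invoice_template: Template: com.pixartprinting.yml"
    " | Failed to match all keywords.\n"
    "DEBUG:invoice2data.extract.invoice_template: Template: com.sammymaystone.yml"
    " | Keywords matched. No exclude keywords found.\n"
)
matched, failed = summarize(sample)
print(matched)  # ['com.sammymaystone.yml']
print(failed)   # ['com.pixartprinting.yml']
```

If your own template file appears in neither list, it was never loaded, which points to a folder problem rather than a keyword or regex problem.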

abhigyanasatpathy commented 2 weeks ago

Even after I deleted my templates, it is still parsing the existing PDF. How is that possible? I deactivated and activated it again, though.

bosd commented 2 weeks ago

You have to verify if your template is being loaded.

  1. Are you pointing to the correct folder?
  2. Is your custom template loaded? Or does the debugger show that there is an error in your template?
  3. Is your template selected? Do the keywords match?

abhigyanasatpathy commented 2 weeks ago

Are you pointing to the correct folder? -- Yes.
Is your custom template loaded? Or does the debugger show that there is an error in your template? -- Yes, an error is showing.
Is your template selected? Do the keywords match? -- Yes, checking.

But I do not understand: when I deleted the existing templates for testing, it still works. How is that possible? Where is it matching the keywords from? It should report that the .yml file is not available, but it still shows matches after I deleted the files.

bosd commented 2 weeks ago

> But not able to understand when i deleted existing templates for my test purpose, still its working , so i have doubt how is it possible?

That sounds like a folder issue.

Maybe it is installed in different versions or locations.

What is the path which shows when you do 'pip show invoice2data'?

Is that the same location as where you were deleting the files?
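As a complement to pip show, Python itself can report which directory a package would be imported from. The sketch below uses importlib from the standard library. The helper name package_dir is hypothetical.

```python
import importlib.util
from pathlib import Path


def package_dir(name):
    """Return the directory a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent


# Prints None if invoice2data is not installed in the active environment.
print(package_dir("invoice2data"))
```

Going by the "Loaded 189 templates from ..." line in the first log, the built-in templates live in extract\templates below that directory, so that is the folder to compare against the one where files were edited or deleted.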

abhigyanasatpathy commented 2 weeks ago

Screenshot 2024-08-30 003538

abhigyanasatpathy commented 2 weeks ago

My template location path is: D:\invoice2data-master\src\invoice2data\extract\templates. Is that okay?

bosd commented 2 weeks ago

No, because your standard templates are loaded from the directory in the screenshot.

For easy testing, go to that location and delete the standard templates there, or add your own custom ones there.