invoice-x / invoice2data

Extract structured data from PDF invoices
MIT License

A way to see which template was selected for a parsed document #545

Closed (CopperClover closed 9 months ago)

CopperClover commented 10 months ago

I have a script that uses over 120 templates for different invoices and credit notes. When debugging, it's difficult to see which template is being used for each parsed document. Is there a function or variable I can print to see this?

bosd commented 10 months ago

Hi, I'm not sure I understand your question, but in terms of debugging: when you invoke invoice2data with the --debug flag, it mentions the used template in the log output.

Example: invoice2data AzureInterior.pdf --debug

[image: screenshot of the debug log output]

The used template is mentioned in blue. Is this what you are looking for?
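
When invoice2data is called from a Python script rather than the CLI, similar log output can probably be obtained by turning on debug logging yourself. This is a minimal sketch, not a confirmed recipe: it assumes the library logs through Python's standard logging module under a logger namespace named "invoice2data", and the helper name enable_debug_logging is made up for illustration.

```python
import logging


def enable_debug_logging(namespace: str = "invoice2data") -> logging.Logger:
    """Turn on DEBUG output for one logger namespace.

    The default namespace "invoice2data" is an assumption about how the
    library names its loggers; adjust it if no messages appear.
    """
    # Install a root handler if none exists yet, so records get printed.
    logging.basicConfig(format="%(levelname)s %(name)s: %(message)s")
    logger = logging.getLogger(namespace)
    # Child loggers such as "invoice2data.extract" inherit this level.
    logger.setLevel(logging.DEBUG)
    return logger


logger = enable_debug_logging()
```

Call this once before extract_data. Scoping the level to one namespace keeps debug output from other libraries out of the log.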

CopperClover commented 10 months ago

Hi @bosd!

Thank you for the reply.

I am not using the CLI. I have a script that runs the process from a Python program.

For example I use the following snippet in the script:

from invoice2data import extract_data
from invoice2data.extract.loader import read_templates

templates = read_templates("path/to/templates")
result = extract_data("path/to/invoice", templates=templates)

This is where I was wondering whether the script could report the template it is using. When I print the result, it prints the extracted data or an error, but it doesn't mention which template was used. Am I only able to see which template is being used when debugging and not by calling a variable from the library or something similar?

bosd commented 10 months ago

> Am I only able to see which template is being used when debugging and not by calling a variable from the library or something similar?

For now, the answer is yes. However, the returned result contains the issuer tag, and in many cases you can identify the used template from it. As a workaround for this particular use case, you could add a static value to your template that contains the template name.
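
On the script side, the workaround could be read back roughly as follows. This is only a sketch under assumptions: the key "template_name" is a hypothetical name for the static field you would add to each template, the issuer tag is assumed to appear under the key "issuer", and the sample dictionary is made up rather than real extract_data output.

```python
from typing import Any, Optional


def template_label(result: Any, key: str = "template_name") -> Optional[str]:
    """Return a label for the template that produced `result`, if any.

    `key` is the (hypothetical) static field added to each template.
    Falls back to the "issuer" entry when that field is missing.
    """
    # Anything that is not a dict (e.g. a failed extraction) has no label.
    if not isinstance(result, dict):
        return None
    label = result.get(key) or result.get("issuer")
    return str(label) if label else None


# Stand-in for a parsed result; the values are invented for illustration.
sample = {"issuer": "Azure Interior", "template_name": "azure_interior.yml"}
print(template_label(sample))
```

With 120+ templates, logging this label next to each parsed file name makes it easier to spot a document matched by the wrong template.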