invoice-x / invoice2data

Extract structured data from PDF invoices
MIT License
1.84k stars 482 forks

Standard Field names? #311

Open Seriousness opened 3 years ago

Seriousness commented 3 years ago

I am using invoice2data to streamline my invoice management and automate my bookkeeping.

I'd like other people to benefit as well, so I plan to add templates for several invoice issuers and push them to this repo. Before I do that, I would like to ask whether there is any description of the fields you'd like to see besides the required ones.

m3nu commented 3 years ago

Thanks for contributing some templates! You can check out the tutorial or look at existing templates in this repo.

We don't really have style rules for now, but if you're interested, this would be a valuable contribution as well.

Also note the recent commit by @rmilecki, which adds a more expressive YAML syntax and the option to add new parsers.

Seriousness commented 3 years ago

OK, I will try to get an example working next week. Should I write it in Markdown as a simple README, and does it need an extra branch? I'd derive the field names from the existing templates and expand them with explanations, or with fields that might be necessary from a German bookkeeping perspective, etc.

m3nu commented 3 years ago

Personally I would make a new branch in my own fork while working on any new pull request. After it's done, we'll merge this feature branch into the master branch of the main repo.

Some of the field names are relatively "standard" and I'd keep them. You'll find them in other templates. For new fields, just choose any name that makes sense.

rmilecki commented 3 years ago

I'm all for documenting some standard field names! I tried starting a discussion on the field name for VAT summary lines in #309, but it was somehow missed.

I noticed that, e.g., a lot of existing templates use the vat field for the VAT identification number. Personally I use vatin and find it more accurate, but I guess we should prefer what's already used in this case.

Seriousness commented 3 years ago

I have done some research in the meantime and wondered whether it would be reasonable to align the names with existing electronic invoice standards such as Zugferd?

rmilecki commented 3 years ago

Following some existing standard would be great. Is there anything more international than Zugferd?

Let me post a merge request with what I already have queued from working on #309.

m3nu commented 3 years ago

Zugferd is mainly about how the invoice metadata is embedded. The actual data structure is specified in CII or UBL. For field names either one could be used.

See here: https://invoice-x.github.io/standards/

rmilecki commented 3 years ago

Thanks @m3nu!

It seems there is EU directive 2014/55/EU, also called the "eInvoicing Directive". Resources on eInvoicing: https://ec.europa.eu/cefdigital/wiki/display/CEFDIGITAL/eInvoicing . It indeed specifies CII and UBL as supported syntaxes: https://ec.europa.eu/cefdigital/wiki/display/CEFDIGITAL/Required+syntaxes

I think the standards are described in a document called EN 16931, but I'm failing to find the actual documentation.

I found a Polish example of a UBL-formatted invoice, and it looks as follows:

<?xml version="1.0" encoding="UTF-8"?>
<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:ccts="urn:un:unece:uncefact:documentation:2" xmlns:qdt="urn:oasis:names:specification:ubl:schema:xsd:QualifiedDatatypes-2" xmlns:udt="urn:un:unece:uncefact:data:specification:UnqualifiedDataTypesSchemaModule:2">
    <cbc:CustomizationID>urn:cen.eu:en16931:2017#compliant#urn:fdc:peppol.eu:2017:poacc:billing:3.0</cbc:CustomizationID>
    <cbc:ProfileID>urn:fdc:peppol.eu:2017:poacc:billing:01:1.0</cbc:ProfileID>
    <cbc:ID>NVOICE_PeF_1.0</cbc:ID>
    <cbc:IssueDate>2018-04-10</cbc:IssueDate>
    <cbc:DueDate>2018-04-23</cbc:DueDate>
    <cbc:InvoiceTypeCode>380</cbc:InvoiceTypeCode>
    <cbc:Note>Kontrakt podpisany poprzez stronę internetową</cbc:Note>
    <cbc:DocumentCurrencyCode>PLN</cbc:DocumentCurrencyCode>
    <cac:InvoicePeriod>
        <cbc:StartDate>2018-01-01</cbc:StartDate>
        <cbc:EndDate>2018-03-31</cbc:EndDate>
        <cbc:DescriptionCode>35</cbc:DescriptionCode>
    </cac:InvoicePeriod>
    <cac:OrderReference>
        <cbc:ID>Z123</cbc:ID>
    </cac:OrderReference>
    <cac:ContractDocumentReference>
        <cbc:ID>K571/2018</cbc:ID>
        <cbc:DocumentType>Kontract</cbc:DocumentType>
    </cac:ContractDocumentReference>
    <cac:AccountingSupplierParty>
        <cac:Party>
            <cbc:EndpointID schemeID="0088">5790989675432</cbc:EndpointID>
            <cac:PartyName>
                <cbc:Name>Empik</cbc:Name>
            </cac:PartyName>
            <cac:PostalAddress>
                <cbc:StreetName>Drużbickiego 2</cbc:StreetName>
                <cbc:CityName>Poznań</cbc:CityName>
                <cbc:PostalZone>61-693</cbc:PostalZone>
                <cac:Country>
                    <cbc:IdentificationCode>PL</cbc:IdentificationCode>
                </cac:Country>
            </cac:PostalAddress>
            <cac:PartyTaxScheme>
                <cbc:CompanyID>PL5260207427</cbc:CompanyID>
                <cac:TaxScheme>
                    <cbc:ID>VAT</cbc:ID>
                </cac:TaxScheme>
            </cac:PartyTaxScheme>
            <cac:PartyLegalEntity>
                <cbc:RegistrationName>Nazwa firmy</cbc:RegistrationName>
                <cbc:CompanyID>011518197</cbc:CompanyID>
            </cac:PartyLegalEntity>
            <cac:Contact>
                <cbc:ElectronicMail>office@empik.pl</cbc:ElectronicMail>
            </cac:Contact>
        </cac:Party>
    </cac:AccountingSupplierParty>
    <cac:AccountingCustomerParty>
        <cac:Party>
            <cbc:EndpointID schemeID="0088">5790000435975</cbc:EndpointID>
            <cac:PartyName>
                <cbc:Name>Instytut Logistyki i Magazynowania</cbc:Name>
            </cac:PartyName>
            <cac:PostalAddress>
                <cbc:StreetName>Estkowskiego 6</cbc:StreetName>
                <cbc:CityName>Poznań</cbc:CityName>
                <cbc:PostalZone>61-755</cbc:PostalZone>
                <cac:Country>
                    <cbc:IdentificationCode>PL</cbc:IdentificationCode>
                </cac:Country>
            </cac:PostalAddress>
            <cac:PartyLegalEntity>
                <cbc:RegistrationName>Nazwa firmy</cbc:RegistrationName>
                <cbc:CompanyID>540269750</cbc:CompanyID>
            </cac:PartyLegalEntity>
        </cac:Party>
    </cac:AccountingCustomerParty>
    <cac:PaymentMeans>
        <cbc:PaymentMeansCode name="Tekst opisowy">42</cbc:PaymentMeansCode>
        <cbc:PaymentID>Płatność1</cbc:PaymentID>
        <cac:PayeeFinancialAccount>
            <cbc:ID schemeID="LOCAL">39109013620000000036017908</cbc:ID>
        </cac:PayeeFinancialAccount>
    </cac:PaymentMeans>
    <cac:AllowanceCharge>
        <cbc:ChargeIndicator>true</cbc:ChargeIndicator>
        <cbc:AllowanceChargeReasonCode>ABL</cbc:AllowanceChargeReasonCode>
        <cbc:AllowanceChargeReason>Koszty pakowania</cbc:AllowanceChargeReason>
        <cbc:Amount currencyID="PLN">100.00</cbc:Amount>
        <cac:TaxCategory>
            <cbc:ID>S</cbc:ID>
            <cbc:Percent>8</cbc:Percent>
            <cac:TaxScheme>
                <cbc:ID>VAT</cbc:ID>
            </cac:TaxScheme>
        </cac:TaxCategory>
    </cac:AllowanceCharge>
    <cac:TaxTotal>
        <cbc:TaxAmount currencyID="PLN">40.00</cbc:TaxAmount>
        <cac:TaxSubtotal>
            <cbc:TaxableAmount currencyID="PLN">800.00</cbc:TaxableAmount>
            <cbc:TaxAmount currencyID="PLN">40.00</cbc:TaxAmount>
            <cac:TaxCategory>
                <cbc:ID>S</cbc:ID>
                <cbc:Percent>5</cbc:Percent>
                <cac:TaxScheme>
                    <cbc:ID>VAT</cbc:ID>
                </cac:TaxScheme>
            </cac:TaxCategory>
        </cac:TaxSubtotal>
    </cac:TaxTotal>
    <cac:LegalMonetaryTotal>
        <cbc:LineExtensionAmount currencyID="PLN">800.00</cbc:LineExtensionAmount>
        <cbc:TaxExclusiveAmount currencyID="PLN">900.00</cbc:TaxExclusiveAmount>
        <cbc:TaxInclusiveAmount currencyID="PLN">940.00</cbc:TaxInclusiveAmount>
        <cbc:ChargeTotalAmount currencyID="PLN">100.00</cbc:ChargeTotalAmount>
        <cbc:PayableAmount currencyID="PLN">940.00</cbc:PayableAmount>
    </cac:LegalMonetaryTotal>
    <cac:InvoiceLine>
        <cbc:ID>1</cbc:ID>
        <cbc:InvoicedQuantity unitCode="C62">1</cbc:InvoicedQuantity>
        <cbc:LineExtensionAmount currencyID="PLN">800.00</cbc:LineExtensionAmount>
        <cac:Item>
            <cbc:Description>Subskrypcja prasy w 1 kwartale 2018 r.</cbc:Description>
            <cbc:Name>Subskrypcja prasy</cbc:Name>
            <cac:ClassifiedTaxCategory>
                <cbc:ID>S</cbc:ID>
                <cbc:Percent>5</cbc:Percent>
                <cac:TaxScheme>
                    <cbc:ID>VAT</cbc:ID>
                </cac:TaxScheme>
            </cac:ClassifiedTaxCategory>
        </cac:Item>
        <cac:Price>
            <cbc:PriceAmount currencyID="PLN">800.00</cbc:PriceAmount>
        </cac:Price>
    </cac:InvoiceLine>
</Invoice>
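
To make the naming question concrete, here is a minimal sketch of how the flat names used in existing templates (date, currency, vat) could line up with elements of the example above. It uses only Python's standard xml.etree.ElementTree; the FLAT_TO_UBL table and the read_flat_fields helper are hypothetical illustrations, not part of invoice2data, and the sample is a trimmed copy of the invoice above.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the example invoice above.
NS = {
    "cac": "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2",
    "cbc": "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2",
}

# Trimmed copy of the example, keeping only the elements read below.
UBL_SAMPLE = """
<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
         xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
         xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">
    <cbc:IssueDate>2018-04-10</cbc:IssueDate>
    <cbc:DocumentCurrencyCode>PLN</cbc:DocumentCurrencyCode>
    <cac:AccountingSupplierParty>
        <cac:Party>
            <cac:PartyTaxScheme>
                <cbc:CompanyID>PL5260207427</cbc:CompanyID>
            </cac:PartyTaxScheme>
        </cac:Party>
    </cac:AccountingSupplierParty>
</Invoice>
"""

# Hypothetical mapping: flat template field name -> element path in the UBL tree.
FLAT_TO_UBL = {
    "date": "cbc:IssueDate",
    "currency": "cbc:DocumentCurrencyCode",
    "vat": "cac:AccountingSupplierParty/cac:Party/cac:PartyTaxScheme/cbc:CompanyID",
}


def read_flat_fields(xml_text):
    """Return a dict of flat field names filled from a UBL invoice string."""
    root = ET.fromstring(xml_text)
    return {
        flat: root.findtext(path, namespaces=NS)
        for flat, path in FLAT_TO_UBL.items()
    }


print(read_flat_fields(UBL_SAMPLE))
```

Note how the supplier's VAT number only becomes unambiguous through its full path; the bare element name CompanyID appears more than once in the full example.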

After a quick look at that example I'm not so sure if we want to follow such a standard strictly at all.

What would that look like?

  1. Should we replace date in templates with IssueDate (or issue_date)?
  2. Should we replace currency in templates with DocumentCurrencyCode (or document_currency_code)?
  3. Should we start using nested fields to match it? I don't think I like it at all. Consider a messy standardized syntax like:
    fields:
      AccountingSupplierParty:
        parser: nested
        Party:
          parser: nested
          Contact:
            parser: nested
            ElectronicMail:
              parser: regex
              regex: ([^ ]@[^ ])

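
For options 1 and 2, the snake_case variants can be derived mechanically from the UBL element names. The following is a minimal sketch in Python; ubl_to_snake is a hypothetical helper written for this discussion, not an existing invoice2data function.

```python
import re


def ubl_to_snake(name):
    """Convert a UBL element name such as 'IssueDate' to 'issue_date'."""
    # Underscore between a lowercase letter or digit and the next capital.
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Underscore between an acronym and a following capitalized word.
    name = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", "_", name)
    return name.lower()


for element in ("IssueDate", "DocumentCurrencyCode", "CompanyID"):
    print(element, "->", ubl_to_snake(element))
```

This only settles the spelling; it does not answer whether a name like company_id is meaningful without its surrounding context.
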
Seriousness commented 3 years ago

@m3nu Thank you for the clarification on the embedding / document standards.

@rmilecki I wasn't thinking so much about creating fully valid documents. My idea was rather to adopt the field naming and to look into the optional fields, which could streamline further processing once e-invoicing becomes more common. Looking at the posted example, I actually prefer issue_date or document_currency_code: the names make for longer code, but they are specific!

But I'll leave that up to you. Should I stick with the existing variable names and summarize them, or try to find UBL equivalents? I could also do this as a transition: define a 1.0 now, with a roadmap to reach EN 16931 in 2.0. I hope I'm not making this too complicated; I'm just thinking out loud.

rmilecki commented 3 years ago

@Seriousness: OK, I like the idea of borrowing field names from UBL where it makes sense, without being too strict about it. I'm afraid some names may not apply well. Using CompanyID for a VAT identification number (without extra context) sounds quite misleading.

@Seriousness: can you take a look at #313 and comment on it?

Seriousness commented 3 years ago

I've seen it and answered there.

Up to now, this is the best overview with descriptions I could find on UBL: https://docs.peppol.eu/poacc/billing/3.0/syntax/ubl-invoice/tree/ - as discussed in the other thread, I am not sure whether the UBL names really fit the everyday workflow or just bloat things.

bosd commented 2 years ago

I'm all for standard field names. However, we need to consider compatibility so that we don't break existing implementations, such as the one in https://github.com/OCA/edi/tree/14.0/account_invoice_import_invoice2data
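
One way to introduce new names without breaking downstream consumers is to emit both the old and the new key during a transition period. The sketch below is purely illustrative: the ALIASES table and with_standard_names are hypothetical, and the old-to-new pairs are just the names mentioned in this thread, not an agreed standard.

```python
# Hypothetical alias table: existing template name -> proposed standard name.
ALIASES = {
    "vat": "vatin",
    "date": "issue_date",
    "currency": "document_currency_code",
}


def with_standard_names(fields, keep_old=True):
    """Return a copy of `fields` with standard names added.

    With keep_old=True the old keys stay in place, so existing
    consumers keep working. With keep_old=False they are dropped.
    """
    result = dict(fields)
    for old, new in ALIASES.items():
        if old in fields and new not in result:
            result[new] = fields[old]
            if not keep_old:
                del result[old]
    return result


extracted = {"date": "2018-04-10", "vat": "PL5260207427"}
print(with_standard_names(extracted))
print(with_standard_names(extracted, keep_old=False))
```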