horstoeko / zugferd

ZUGFeRD/XRechnung/Factur-X Library
MIT License
141 stars, 25 forks

[FEATURE] new useful methods in ZugferdDocumentPdfReader for extraction of the XML #120

Status: Closed (oschildt closed 3 weeks ago)

oschildt commented 4 weeks ago

Describe the feature

I have implemented two new methods in ZugferdDocumentPdfReader for extracting the attached XML.

This is useful for checking whether the software that issued the invoice has put all the information shown in the PDF into the XML as well. We have run into cases where software places some data in the PDF but not in the XML of the ZUGFeRD invoice.

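class ZugferdDocumentPdfReader
{
    // Proposed additions to the existing class. Its imports (PdfParser, the
    // Zugferd exceptions) and its ATTACHMENT_FILENAMES constant are assumed
    // to be in scope.

    /**
     * Returns the invoice XML embedded in the given PDF file, or null if none is found.
     */
    public static function extractXMLFromFile(string $pdfFilename): ?string
    {
        if (!file_exists($pdfFilename)) {
            throw new ZugferdFileNotFoundException($pdfFilename);
        }

        $pdfContent = file_get_contents($pdfFilename);

        if ($pdfContent === false) {
            throw new ZugferdFileNotReadableException($pdfFilename);
        }

        return static::extractXMLFromContent($pdfContent);
    }

    /**
     * Returns the invoice XML embedded in the given PDF content, or null if none is found.
     */
    public static function extractXMLFromContent(string $pdfContent): ?string
    {
        $pdfParser = new PdfParser();
        $pdfParsed = $pdfParser->parseContent($pdfContent);
        $filespecs = $pdfParsed->getObjectsByType('Filespec');

        $attachmentFound = false;
        $attachmentIndex = 0;
        $embeddedFileIndex = 0;
        $returnValue = null;

        try {
            // Find the position of the first Filespec whose filename is a known invoice attachment.
            foreach ($filespecs as $filespec) {
                $filespecDetails = $filespec->getDetails();
                if (isset($filespecDetails['F'])
                    && in_array($filespecDetails['F'], static::ATTACHMENT_FILENAMES, true)
                ) {
                    $attachmentFound = true;
                    break;
                }
                $attachmentIndex++;
            }

            if ($attachmentFound) {
                // Assumes that EmbeddedFile objects are listed in the same order as the Filespec objects.
                /**
                 * @var array<\Smalot\PdfParser\PDFObject> $embeddedFiles
                 */
                $embeddedFiles = $pdfParsed->getObjectsByType('EmbeddedFile');
                foreach ($embeddedFiles as $embeddedFile) {
                    if ($attachmentIndex === $embeddedFileIndex) {
                        $returnValue = $embeddedFile->getContent();
                        break;
                    }
                    $embeddedFileIndex++;
                }
            }
        } catch (\Exception $e) {
            // Any error while inspecting the PDF objects is treated as "no XML found".
            $returnValue = null;
        }

        return $returnValue;
    }
}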
horstoeko commented 4 weeks ago

Hi @oschildt,

Many thanks for the issue. I'll take a look at it. Please check again whether your code suggestion fits into the current implementation. I have already adjusted something for you in this regard.

Best regards

oschildt commented 4 weeks ago

Hi,

We have a document archiving system. It gathers documents from many sources: FTP, LAN, email, direct upload, etc.

We have a document viewer and an E-Rechnung viewer. For ZUGFeRD invoices, we also allow viewing the embedded XML.

This is useful for checking whether the software that issued the ZUGFeRD invoice has put all the information shown in the PDF into the XML as well. We have run into cases where software places some data in the PDF but not in the XML of the ZUGFeRD invoice.

Best regards

Oleg

horstoeko commented 4 weeks ago

Hi @oschildt,

Would it perhaps be possible for you to provide a pull request based on the current implementation?

oschildt commented 4 weeks ago

Hi,

I can do it, but later.

horstoeko commented 4 weeks ago

Hi @oschildt,

Many thanks for that. My time is extremely limited at the moment. Please keep in mind that I maintain this project entirely in my private time. I also have a regular job, which unfortunately keeps me very busy at the moment.

Please don't forget to write tests for your implementation if necessary. Unfortunately, this is very often forgotten... :-)

Best regards

horstoeko commented 3 weeks ago

Hi @oschildt,

I implemented some additional methods. Please have a look at this Pull Request and give me feedback.

Kind regards

oschildt commented 3 weeks ago

Hi,

I see you have implemented the XML extraction and also unit tests. I have tested it, and it works perfectly.

Regards,

------ Original Message ------
From: "horstoeko"
To: "horstoeko/zugferd"
Cc: "Oleg Schildt"; "Mention"
Date: 24.09.2024 05:23:32
Subject: Re: [horstoeko/zugferd] [FEATURE] new useful methods in ZugferdDocumentPdfReader for extraction of the XML (Issue #120)

Hi @oschildt https://github.com/oschildt,

I implemented some additional methods. Please have a look at this commit https://github.com/horstoeko/zugferd/commit/10fc7d33e01954d24d601c9d5e782ea93dd97dd3 and give me feedback.

Kind regards


horstoeko commented 3 weeks ago

Hi @oschildt,

Nice to hear from you, and thank you for your response. I will make a release in the next few days.

Kind regards