smalot / pdfparser

PdfParser, a standalone PHP library, provides various tools to extract data from a PDF file.
GNU Lesser General Public License v3.0

Call to undefined method Smalot\PdfParser\Header::__toString() #391

Closed (dhildreth closed 3 years ago)

dhildreth commented 3 years ago

My application throws the error "Call to undefined method Smalot\PdfParser\Header::__toString()" in .../Smalot/PdfParser/Font.php, line 107, with version 0.18.1 (Laravel 7).

    public function translateChar($char, $use_default = true)
    {
        $dec = hexdec(bin2hex($char));

        if (\array_key_exists($dec, $this->table)) {
            return $this->table[$dec];
        }

        // fallback for decoding single-byte ANSI characters that are not in the lookup table
        $fallbackDecoded = $char;
        if (
            \strlen($char) < 2
            && $this->has('Encoding')
            && WinAnsiEncoding::class === $this->get('Encoding')->__toString()
        ) {
            $fallbackDecoded = self::uchr($dec);
        }

        return $use_default ? self::MISSING : $fallbackDecoded;
    }

I also tried dev-master and got the same error:

  Call to undefined method Smalot\PdfParser\Header::__toString()

  at vendor/smalot/pdfparser/src/Smalot/PdfParser/Font.php:109
    105|         $fallbackDecoded = $char;
    106|         if (
    107|             \strlen($char) < 2
    108|             && $this->has('Encoding')
  > 109|             && WinAnsiEncoding::class === $this->get('Encoding')->__toString()
    110|         ) {
    111|             $fallbackDecoded = self::uchr($dec);
    112|         }
    113| 

      +5 vendor frames 
  6   app/Console/Commands/Indexing/IndexAssetsCommand.php:109
      Smalot\PdfParser\Document::getText()

      +10 vendor frames 
  17  app/Console/Commands/Indexing/IndexAllCommand.php:67
      Illuminate\Console\Command::call()

Adding && $this->get('Encoding') instanceof Encoding fixes the issue:

    105|         $fallbackDecoded = $char;
    106|         if (
    107|             \strlen($char) < 2
    108|             && $this->has('Encoding')
  + 109|             && $this->get('Encoding') instanceof Encoding
    110|             && WinAnsiEncoding::class === $this->get('Encoding')->__toString()
    111|         ) {
    112|             $fallbackDecoded = self::uchr($dec);
    113|         }
    114| 
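
The guard above can be illustrated in isolation. The following is a minimal sketch, not the library's code: the `Header` and `Encoding` classes here are hypothetical stand-ins, `encodingMatches()` is a made-up helper name, and the plain string 'WinAnsiEncoding' stands in for the real class constant. It only shows why the `instanceof` check prevents the call to a missing `__toString()`.

```php
<?php

// Hypothetical stand-in: an object that has no __toString(),
// like the Header object named in the error message.
class Header
{
}

// Hypothetical stand-in: an object that can be converted to a string.
class Encoding
{
    private $name;

    public function __construct(string $name)
    {
        $this->name = $name;
    }

    public function __toString(): string
    {
        return $this->name;
    }
}

/**
 * Returns true only when $encoding is an Encoding whose string form
 * equals $expected. The instanceof check short-circuits the && chain,
 * so __toString() is never called on an object that lacks it.
 */
function encodingMatches($encoding, string $expected): bool
{
    return $encoding instanceof Encoding
        && $expected === $encoding->__toString();
}

var_dump(encodingMatches(new Encoding('WinAnsiEncoding'), 'WinAnsiEncoding'));
var_dump(encodingMatches(new Header(), 'WinAnsiEncoding'));
```

With the guard in place, the `Header` case simply evaluates to false instead of triggering a fatal error.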

I had seen this fix somewhere else on this issue tracker, but I can't find it now. I feel the worst part is that it doesn't throw an exception, or at least I can't catch it using try { } catch (\Exception $e) { }. Either way, it should be resolved.
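
A likely reason the catch block never fires: in PHP 7 and later, calling an undefined method raises an `\Error`, which is not a subclass of `\Exception`. Both implement `\Throwable`. The sketch below uses a hypothetical `Header` stand-in and a made-up `describeFailure()` helper rather than the library's classes; it only demonstrates which catch clause is reached.

```php
<?php

// Hypothetical stand-in for an object that lacks __toString().
class Header
{
}

function describeFailure(): string
{
    try {
        // Calling an undefined method raises \Error in PHP 7+.
        (new Header())->__toString();
    } catch (\Exception $e) {
        // Not reached: \Error does not extend \Exception.
        return 'caught Exception';
    } catch (\Throwable $e) {
        // Reached: \Throwable covers both \Error and \Exception.
        return 'caught ' . get_class($e) . ': ' . $e->getMessage();
    }

    return 'no failure';
}

echo describeFailure(), PHP_EOL;
```

In application code, wrapping the parser call in `catch (\Throwable $e)` (or `catch (\Error $e)`) should allow this kind of failure to be handled, assuming PHP 7 or newer.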

I think it's choking on this PDF: https://cdn.embeddedarm.com/resource-attachments/Okaya_800nit_7inch_RS800480T-7X0WHP-A.pdf

k00ni commented 3 years ago

Thank you for reporting this. Can we use the PDF you posted as part of our tests? Is it free of charge and free of obligations?

k00ni commented 3 years ago

Can you test with #384 and tell us if it fixes your problem?

dhildreth commented 3 years ago

Yes, you may download and use the PDF for your tests. There aren't any restrictions on it. 😉

Yes, #384 fixes it. Thank you! Are there any instructions on how to get the fix through my composer.json? Will there be a new tag, or do I need to use dev-master?

k00ni commented 3 years ago

Thank you for your feedback. I created another PR (#393), which contains the fix from #384 and a basic test (using your PDF). Once it has been reviewed, I will merge it and prepare a new release.

k00ni commented 3 years ago

@dhildreth new release v0.18.2 is out with this fix.

dhildreth commented 3 years ago

Excellent! Thank you.

dhildreth commented 3 years ago

Confirmed fixed. :-)