pauln / tcpdi_parser

Parser for use with TCPDI, based on TCPDF_PARSER
GNU Lesser General Public License v3.0

Fix Illegal string offset warnings #23

Open tommynovember7 opened 6 years ago

tommynovember7 commented 6 years ago

If a PDF contains an object like the one below, tcpdf_parser fails to continue parsing the data. getRawObject() is expected to return an array containing an object and its offset, but when the PDF contains % comments it currently returns the object without its offset. This causes "Illegal string offset" warnings.

PDF Object Sample

2 0 obj
<< /Type /Page % 1
   /Parent 1 0 R
   /MediaBox [ 0 0 839.314286 1186.971429 ]
   /Contents 4 0 R
   /Group <<
      /Type /Group
      /S /Transparency
      /I true
      /CS /DeviceRGB
   >>
   /Resources 3 0 R
>>
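
The "% 1" after /Page in the sample above is a PDF comment: it starts at "%" and runs to the end of the line. As a minimal sketch of one way a parser could step over such comments before reading the next token, consider the helper below. The function name skipWhitespaceAndComments() is hypothetical and not part of tcpdi_parser, and the sketch assumes it is only called between tokens, never inside a string or stream.

```php
<?php
/**
 * Hypothetical helper (not part of tcpdi_parser): advances past PDF
 * whitespace and "%" comments and returns the offset of the next token.
 * Assumes $offset points between tokens, not inside a string or stream.
 */
function skipWhitespaceAndComments(string $data, int $offset): int
{
    $len = strlen($data);
    while ($offset < $len) {
        // PDF whitespace characters: NUL, tab, LF, FF, CR, space.
        $offset += strspn($data, "\x00\t\n\x0C\r ", $offset);
        if ($offset < $len && $data[$offset] === '%') {
            // A comment runs up to (not including) the next CR or LF.
            $offset += strcspn($data, "\r\n", $offset);
        } else {
            break;
        }
    }
    return $offset;
}

// Usage: start right after "/Page" and find where the next token begins.
$sample = "<< /Type /Page % 1\n   /Parent 1 0 R";
$next = skipWhitespaceAndComments($sample, strpos($sample, '/Page') + 5);
echo substr($sample, $next, 7), PHP_EOL;
```

With a step like this in place, the comment never reaches the code that builds the object array, so the caller would still receive the object together with its offset.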

Warning Example

Warning: Illegal string offset 'Parqaj' in /Users/lancelot/Sandbox/PHP/tcpdi_parser/tcpdi_parser/tcpdi_parser.php on line 712
PHP Warning:  Illegal string offset 'Parqak' in /Users/lancelot/Sandbox/PHP/tcpdi_parser/tcpdi_parser/tcpdi_parser.php on line 712
...

jerry-cbn commented 1 year ago

Hi!

That fix is not enough.

The function getDictValue() also has a bug, and it triggers the same warning at the same line.

For example, consider this PDF data sample:

<<
/Type /XObject
/Width 93
/Filter [/FlateDecode]
/Height 93
/Length 1930
/Subtype /Image
/ColorSpace [/Indexed /DeviceRGB 255 (   \b\b\b\n\n\n\r\r\r\t\t\tÔ ³½ µ» ·¹°¿ ´¼>>>???888!!!&&&\(\(\(###%%%===555$$$///999333\)\)\)""";;;666111''':::222444\)1ÿ3;ÿ9?ÿ?Wø9@ÿ>Fÿ;]ô>Eÿ?Fÿ?Vø:Aÿ5fï2iìDDDEEELLLNNNTTT[[[ZZZHHHMMMQQQGGGCCC___VVVYYYFFFBBBWWWIIISSSFMÿ@GÿENþGLÿX_ÿEMÿTZÿDLÿaaacccqqqooommmfffkkkjjjggg}}}hhhddduuuwwwsssnnniii~~~rrrpppbbbyyy{{{````fÿ`eÿgmÿ›››œœœ˜˜˜ƒƒƒšššŒŒŒ’’’‚‚‚†††ŽŽŽ•••———–––‡‡‡“““‰‰‰ŸŸŸ”””‹‹‹†ÿƒˆÿ›žÿž¢ÿ   ¿¿¿½½½³³³···¦¦¦ªªª¾¾¾©©©¡¡¡°°°¥¥¥¨¨¨¶¶¶µµµ¤¤¤´´´±±±²²²¼¼¼®®®¯¯¯¸¸¸­°ÿª­ÿ¶¸ÿ¸ºÿ¼¿ÿ©¬ÿ¸»ÿª®ÿÇÇÇÆÆÆ×××ÉÉÉØØØÙÙÙÝÝÝÛÛÛÓÓÓÁÁÁÖÖÖÊÊÊÜÜÜÒÒÒÌÌÌÔÔÔÚÚÚËËËÀÀÀÂÂÂÍÍÍÈÈÈÅÅÅÄÄÄÑÑÑÛÜÿùùùõõõøøøôôôòòòóóóüüüýýýìììúúú÷÷÷éééêêêûûûöööçççëëëþþþíííññÿýýÿäääåçÿüýÿåæÿîîîàààñññèèèÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ)]
/BitsPerComponent 8
>>

The function does not work properly here because the string ">>" appears inside the square brackets of the /ColorSpace array (in the palette data), so it is mistaken for the end of the dictionary.
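
To illustrate the problem in isolation, here is a minimal sketch of a scanner that finds the end of a dictionary while ignoring any "<<" or ">>" that occurs inside a literal string "( ... )" or a hex string "< ... >". It is a different approach from tracking square brackets, and the function name findDictEnd() is hypothetical, not part of tcpdi_parser. It assumes the dictionary contains no % comments and no stream data.

```php
<?php
/**
 * Hypothetical helper (not part of tcpdi_parser): returns the offset just
 * past the ">>" that closes the dictionary starting at $offset.
 */
function findDictEnd(string $data, int $offset): int
{
    $len = strlen($data);
    if (substr($data, $offset, 2) !== '<<') {
        throw new InvalidArgumentException('No dictionary at offset ' . $offset);
    }
    $depth = 0;
    $i = $offset;
    while ($i < $len) {
        $c = $data[$i];
        $next = ($i + 1 < $len) ? $data[$i + 1] : '';
        if ($c === '(') {
            // Literal string: skip to the matching ")", honouring
            // backslash escapes and nested balanced parentheses.
            $parens = 1;
            $i++;
            while ($i < $len && $parens > 0) {
                if ($data[$i] === '\\') {
                    $i++; // skip the escaped character as well
                } elseif ($data[$i] === '(') {
                    $parens++;
                } elseif ($data[$i] === ')') {
                    $parens--;
                }
                $i++;
            }
        } elseif ($c === '<' && $next === '<') {
            $depth++; // nested dictionary opens
            $i += 2;
        } elseif ($c === '>' && $next === '>') {
            $depth--; // dictionary closes
            $i += 2;
            if ($depth === 0) {
                return $i;
            }
        } elseif ($c === '<') {
            // Hex string: skip to its closing ">".
            $end = strpos($data, '>', $i);
            if ($end === false) {
                break;
            }
            $i = $end + 1;
        } else {
            $i++;
        }
    }
    throw new RuntimeException('Unterminated dictionary');
}

// Usage: the ">>>" inside the palette string must not close the dictionary.
$sample = '<< /ColorSpace [/Indexed /DeviceRGB 1 (>>>\\)<<) ] /Bits 8 >> endobj';
echo substr($sample, 0, findDictEnd($sample, 0)), PHP_EOL;
```

Skipping whole strings also covers palette data that happens to contain "[" or "]", which a bracket flag alone would not.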

Suggested modification:

    private function getDictValue($offset, &$data) {
        $objval = array();

        // Extract dict from data
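        $i = 1; // nesting depth of "<<" ... ">>" pairs
        $dict = "";
        $offset+= 2; // skip the opening "<<"
        $is_bracket = false; // true while inside a "[" ... "]" array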
        do {
            if ($data[$offset] == "[") {
                $is_bracket = true;
                $dict.= $data[$offset];
            } else if ($data[$offset] == "]") {
                $is_bracket = false;
                $dict.= $data[$offset];
            // Only count "<<" and ">>" when outside of an array
            } else if (!$is_bracket && ($data[$offset] == "<") && ($data[$offset + 1] == "<")) {
                $i++;
                $dict.= "<<";
                $offset++;
            } else if (!$is_bracket && ($data[$offset] == ">") && ($data[$offset + 1] == ">")) {
                $i--;
                $dict.= ">>";
                $offset++;
            } else {
                $dict.= $data[$offset];
            }
            $offset++;
        } while ($i > 0);