podofo / podofo

A C++17 PDF manipulation library
https://podofo.github.io/podofo/documentation

Uncatchable stack overflow exception thrown by PdfParser #183

Closed zqk-7k closed 2 months ago

zqk-7k commented 3 months ago

Hello, thank you for your outstanding work, which has provided me with great convenience.

I have a PDF file that causes a stack overflow while loading. The file is quite old: it was created by PDF XChange 3.3 in 2007. The file is attached: Cp.pdf

When PoDoFo parses the file, PoDoFo::PdfParser::ReadXRefContents(PoDoFo::InputStreamDevice & device, unsigned __int64 offset, bool positionAtEnd) and PoDoFo::PdfParser::readNextTrailer(PoDoFo::InputStreamDevice & device) call each other repeatedly. The call stack is shown in the two attached screenshots (WeCom screenshot_17224175482095, WeCom screenshot_17224178093142).

Can PoDoFo support reading this type of file? Or could it at least avoid crashing when it encounters such files?

zqk-7k commented 3 months ago

I tested the file using the following code:

    PdfMemDocument doc;
    try
    {
        // The path must be passed as a string literal
        doc.Load("Cp.pdf");
    }
    catch (const PdfError& e)
    {
        std::cout << "catch PdfError" << std::endl;
    }
    catch (const std::exception& e)
    {
        std::cout << "catch std::exception" << std::endl;
        return;
    }
    catch (...)
    {
        std::cout << "catch ..." << std::endl;
    }

Stepping through in the debugger, I can see that the exception is raised where a utls::RecursionGuard guard is created, inside the method utls::RecursionGuard::Enter():

void utls::RecursionGuard::Enter()
{
    s_recursionDepth++;
    if (s_recursionDepth > s_MaxRecursionDepth)
    {
        // avoid stack overflow on documents that have circular cross references, loops
        // or very deeply nested structures, can happen with
        // /Prev entries in trailer and XRef streams (possible via a chain of entries with a loop)
        // /Kids entries that loop back to self or parent
        // deeply nested Dictionary or Array objects (possible with lots of [[[[[[[[]]]]]]]] brackets)
        // mutually recursive loops involving several objects are possible
        PODOFO_RAISE_ERROR_INFO(PdfErrorCode::InvalidXRef, "Stack overflow");
    }
}

Then, after control leaves PoDoFo::PdfParser::ReadXRefContents(PoDoFo::InputStreamDevice & device, unsigned __int64 offset, bool positionAtEnd) and PoDoFo::PdfParser::readNextTrailer(PoDoFo::InputStreamDevice & device), the program reaches the 'catch (...)' handler in my main function. When I step to the next line of code, the program crashes.

I don't understand why the program still crashes even though the exception thrown by PdfParser has already been caught by the try/catch in the main function.

In addition, when I changed constexpr unsigned MaxRecursionDepthDefault = 128 in PdfDeclarationsPrivate.cpp to MaxRecursionDepthDefault = 16, the thrown exception was caught correctly.

ceztko commented 3 months ago

Hello, I noticed that with MaxRecursionDepthDefault=412 that file parses correctly, which means our hardcoded limit may be too low (but it really depends on the maximum stack size). The crash may be related to the custom call stack gathering mechanism in PdfError, as noticed by someone on the mailing list[1]. My idea to solve that is to just remove the custom call stack handling in PoDoFo, as in modern C++ one would just recover the stack trace with system facilities. I want to have a look at this before 1.0, but don't expect me to fix it quickly.

[1] https://www.mail-archive.com/podofo-users@lists.sourceforge.net/msg04971.html

ceztko commented 2 months ago

Fixed in https://github.com/podofo/podofo/commit/aeac3229b06c599641b624c2c5bd344cac7e859f

ceztko commented 2 months ago

I forgot: this is also fixed in https://github.com/podofo/podofo/commit/31e0139113f5860cc35d59bbe5c2844a1a37ec6c

ceztko commented 1 month ago

Merged in 0.10.x