radkovo / Pdf2Dom

Pdf2Dom is a PDF parser that converts the documents to a HTML DOM representation. The obtained DOM tree may be then serialized to a HTML file or further processed. A command-line utility for converting the PDF documents to HTML is included in the distribution package. Pdf2Dom may be also used as an independent Java library with a standard DOM interface for your DOM-based applications or as an alternative parser for the CSSBox rendering engine in order to add the PDF processing capability to CSSBox. Pdf2Dom is based on the Apache PDFBox™ library.
http://cssbox.sourceforge.net/pdf2dom/
GNU Lesser General Public License v3.0
175 stars 71 forks source link

NullPointerException in PDFBoxTree.java on line 391 - and fix #55

Open perlausten opened 2 years ago

perlausten commented 2 years ago

I have a PDF that results in a NullPointerException in PDFBoxTree.java on line 391 when trying to convert it. The reason is that the font variable is null.

The fix is to include a check for null before checking the font type.

for (COSName key : resources.getFontNames())
{
    PDFont font = resources.getFont(key);
    if (null != font) {
        if (font instanceof PDTrueTypeFont)
        {
            table.addEntry( font);
            log.debug("Font: " + font.getName() + " TTF");
        }
        else if (font instanceof PDType0Font)
        {
            PDCIDFont descendantFont = ((PDType0Font) font).getDescendantFont();
            if (descendantFont instanceof PDCIDFontType2)
                table.addEntry(font);
            else
                log.warn(fontNotSupportedMessage, font.getName(), font.getClass().getSimpleName());
        }
        else if (font instanceof PDType1CFont)
            table.addEntry(font);
        else
            log.warn(fontNotSupportedMessage, font.getName(), font.getClass().getSimpleName());
    }
}