Pdf2Dom is a PDF parser that converts documents to an HTML DOM representation. The resulting DOM tree can then be serialized to an HTML file or processed further. A command-line utility for converting PDF documents to HTML is included in the distribution package. Pdf2Dom can also be used as an independent Java library with a standard DOM interface for your DOM-based applications, or as an alternative parser for the CSSBox rendering engine to add PDF processing capability to CSSBox. Pdf2Dom is based on the Apache PDFBox™ library.
I have a PDF that results in a NullPointerException in PDFBoxTree.java on line 391 when trying to convert it. The reason is that the font variable is null.
The fix is to include a check for null before checking the font type.
for (COSName key : resources.getFontNames())
{
    PDFont font = resources.getFont(key);
    // getFont() returned null for the problematic PDF, so skip such entries
    if (null != font) {
        if (font instanceof PDTrueTypeFont)
        {
            table.addEntry(font);
            log.debug("Font: " + font.getName() + " TTF");
        }
        else if (font instanceof PDType0Font)
        {
            PDCIDFont descendantFont = ((PDType0Font) font).getDescendantFont();
            if (descendantFont instanceof PDCIDFontType2)
                table.addEntry(font);
            else
                log.warn(fontNotSupportedMessage, font.getName(), font.getClass().getSimpleName());
        }
        else if (font instanceof PDType1CFont)
            table.addEntry(font);
        else
            log.warn(fontNotSupportedMessage, font.getName(), font.getClass().getSimpleName());
    }
}
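For anyone who wants to try the guard without the PDF in question, here is a minimal, self-contained sketch of the same pattern. It does not use PDFBox: `Font`, `TrueTypeFont`, `Type3Font` and `collectSupported` are hypothetical stand-ins made up for illustration. In Java, `instanceof` is simply false for `null`, so without the guard the exception would presumably only surface in the final `else` branch, where `font.getName()` is called. Unlike the patch above, this variant records a warning for the skipped entry instead of ignoring it silently; that is an assumption about what might be useful, not part of the proposed fix.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FontGuardSketch {

    // Hypothetical stand-ins for the PDFBox font classes (not the real API).
    static class Font {
        private final String name;
        Font(String name) { this.name = name; }
        String getName() { return name; }
    }
    static class TrueTypeFont extends Font {
        TrueTypeFont(String name) { super(name); }
    }
    static class Type3Font extends Font {
        Type3Font(String name) { super(name); }
    }

    // Returns the names of the supported fonts; skipped or unsupported
    // entries are reported through the warnings list.
    static List<String> collectSupported(Map<String, Font> resources, List<String> warnings) {
        List<String> table = new ArrayList<>();
        for (Map.Entry<String, Font> entry : resources.entrySet()) {
            Font font = entry.getValue();
            if (font == null) {
                // Guard: a resource key may map to a font that could not be loaded.
                warnings.add("Font " + entry.getKey() + " could not be loaded, skipping");
                continue;
            }
            if (font instanceof TrueTypeFont) {
                table.add(font.getName());
            } else {
                // Safe only because font is known to be non-null here.
                warnings.add("Unsupported: " + font.getName() + " " + font.getClass().getSimpleName());
            }
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, Font> resources = new LinkedHashMap<>();
        resources.put("F1", new TrueTypeFont("Arial"));
        resources.put("F2", null); // simulates the font that came back as null
        resources.put("F3", new Type3Font("Glyphs"));

        List<String> warnings = new ArrayList<>();
        System.out.println(collectSupported(resources, warnings));
        System.out.println(warnings);
    }
}
```

Using `continue` keeps the rest of the loop body at its original nesting level, which may make the change easier to review than wrapping everything in another `if` block.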