Pdf2Dom is a PDF parser that converts PDF documents to an HTML DOM representation. The resulting DOM tree can then be serialized to an HTML file or processed further. The distribution package includes a command-line utility for converting PDF documents to HTML. Pdf2Dom can also be used as an independent Java library with a standard DOM interface for your DOM-based applications, or as an alternative parser for the CSSBox rendering engine to add PDF processing capability to CSSBox. Pdf2Dom is based on the Apache PDFBox™ library.
Hi,
Thanks for your wonderful code! I am embedding your work in my application, which shows the contents of PDFs page by page in a GUI.
I ran into a problem: none of the images or fonts in the generated HTML are displayed by Firefox or JavaFX WebView, although they do work in Chrome.
I am sure something is wrong with the resource paths, because I tried the following paths: the first two do not work, while the last two work in Firefox and in my app:
<img src="D:\tmp/JEEFC.book.png"/>
<img src="D:\\tmp\\JEEFC.book.png"/>
<img src="./JEEFC.book.png"/>
<img src="JEEFC.book.png" />
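I found this Stack Overflow question, https://stackoverflow.com/questions/11812111/font-face-url-pointing-to-local-file?r=SearchResults, and the following line from it reminded me that this is a domain issue:
"Both IE 9 and Firefox require font files to be served from the same domain as the page they are loaded into."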
So I tried the following, and it works in Firefox and JavaFX WebView:
<img src="file:///D:\tmp/JEEFC.book.png"/>
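Instead of prepending "file:///" to a path by hand, the JDK can build the URI from the path itself. Plain concatenation only works for absolute paths: a relative path such as "resources/JEEFC.book.png" would become "file:///resources/JEEFC.book.png", which points at the filesystem root. The following is a minimal sketch; "FileUris" and "toFileUri" are hypothetical names, not part of Pdf2Dom. "Path.toUri()" resolves relative paths against the working directory, uses forward slashes, and percent-encodes characters such as spaces, so on Windows "D:\tmp\JEEFC.book.png" should come out as "file:///D:/tmp/JEEFC.book.png".

```java
import java.nio.file.Paths;

public final class FileUris {
    private FileUris() {
    }

    /**
     * Hypothetical helper: converts a filesystem path (absolute, or relative to
     * the working directory) into a file: URI string for <img src> or url().
     */
    public static String toFileUri(String path) {
        // Path.toUri() always returns an absolute URI with '/' separators
        // and percent-encoded special characters (e.g. ' ' becomes %20).
        return Paths.get(path).toUri().toString();
    }

    public static void main(String[] args) {
        System.out.println(toFileUri("resources/JEEFC.book.png"));
    }
}
```

The string returned by a resource handler could be passed through such a helper before it is written into the HTML.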
To view the HTML generated by Pdf2Dom properly, I extended your "SaveResourceToDirHandler.java".
If its fields were "protected" instead of "private", my class could be as simple as this:
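import java.io.IOException;
import org.fit.pdfdom.resource.HtmlResource;
import org.fit.pdfdom.resource.SaveResourceToDirHandler;

public class PDFResourceToDirHandler extends SaveResourceToDirHandler {
    @Override
    public String handleResource(HtmlResource resource) throws IOException {
        // Reuse the superclass's file writing and only prefix the file: scheme.
        return "file:///" + super.handleResource(resource);
    }
}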
As it is, I have to copy the whole of your "SaveResourceToDirHandler.java", like this:
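import java.io.File;
import java.io.IOException;
import java.util.LinkedList;
import java.util.List;

import org.apache.commons.io.FileUtils;
import org.fit.pdfdom.resource.HtmlResource;
import org.fit.pdfdom.resource.SaveResourceToDirHandler;

public class PDFResourceToDirHandler extends SaveResourceToDirHandler {
    // Copies of the superclass's private fields, which a subclass cannot access.
    private final File directory;
    private final List<String> writtenFileNames = new LinkedList<>();

    public PDFResourceToDirHandler(File directory) {
        this.directory = directory;
    }

    @Override
    public String handleResource(HtmlResource resource) throws IOException {
        // Fall back to the inherited DEFAULT_RESOURCE_DIR when no directory was given.
        String dir = DEFAULT_RESOURCE_DIR;
        if (directory != null) {
            dir = directory.getPath() + "/";
        }
        String fileName = findNextUnusedFileName(resource.getName());
        String resourcePath = dir + fileName + "." + resource.getFileEnding();
        File file = new File(resourcePath);
        FileUtils.writeByteArrayToFile(file, resource.getData());
        writtenFileNames.add(fileName);
        // The only change from the original class: prefix the path with the file: scheme.
        return "file:///" + resourcePath;
    }

    // Appends 1, 2, ... to the name until it no longer clashes with a written file.
    private String findNextUnusedFileName(String fileName) {
        int i = 1;
        String usedName = fileName;
        while (writtenFileNames.contains(usedName)) {
            usedName = fileName + i;
            i++;
        }
        return usedName;
    }
}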
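If the only goal is to change the reference that the handler returns, wrapping an existing handler (the decorator pattern) may avoid copying its body at all. The sketch below is self-contained, so "ResourceHandler", "DirHandler" and "FileUriHandler" are hypothetical stand-ins, not Pdf2Dom's API: "DirHandler" mimics the directory-writing and name-deduplicating behaviour of the class above, and "FileUriHandler" delegates to it and converts the returned path with "Path.toUri()". Applying the same idea to Pdf2Dom would assume its handler interface exposes a single "handleResource" method, as the subclass above suggests.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

public class FileUriDecoratorDemo {

    /** Hypothetical stand-in for a resource handler: stores data, returns a reference. */
    interface ResourceHandler {
        String handle(String name, String fileEnding, byte[] data) throws IOException;
    }

    /** Hypothetical stand-in for a directory handler: writes files, returns plain paths. */
    static class DirHandler implements ResourceHandler {
        private final Path directory;
        private final Set<String> writtenNames = new HashSet<>();

        DirHandler(Path directory) {
            this.directory = directory;
        }

        @Override
        public String handle(String name, String fileEnding, byte[] data) throws IOException {
            // Append 1, 2, ... until the base name has not been written yet.
            String used = name;
            for (int i = 1; writtenNames.contains(used); i++) {
                used = name + i;
            }
            Files.createDirectories(directory);
            Path file = directory.resolve(used + "." + fileEnding);
            Files.write(file, data);
            writtenNames.add(used);
            return file.toString();
        }
    }

    /** Decorator: lets the wrapped handler store the data, then returns a file: URI. */
    static class FileUriHandler implements ResourceHandler {
        private final ResourceHandler delegate;

        FileUriHandler(ResourceHandler delegate) {
            this.delegate = delegate;
        }

        @Override
        public String handle(String name, String fileEnding, byte[] data) throws IOException {
            String path = delegate.handle(name, fileEnding, data);
            return Paths.get(path).toUri().toString();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("pdf2dom-res");
        ResourceHandler handler = new FileUriHandler(new DirHandler(dir));
        System.out.println(handler.handle("JEEFC.book", "png", new byte[] {1, 2, 3}));
        System.out.println(handler.handle("JEEFC.book", "png", new byte[] {4, 5, 6}));
    }
}
```

Running "main" should print two file: URIs in a temporary directory, ending in "JEEFC.book.png" and "JEEFC.book1.png". The wrapper never touches the wrapped handler's fields, so their visibility does not matter.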