radkovo / Pdf2Dom

Pdf2Dom is a PDF parser that converts the documents to a HTML DOM representation. The obtained DOM tree may be then serialized to a HTML file or further processed. A command-line utility for converting the PDF documents to HTML is included in the distribution package. Pdf2Dom may be also used as an independent Java library with a standard DOM interface for your DOM-based applications or as an alternative parser for the CSSBox rendering engine in order to add the PDF processing capability to CSSBox. Pdf2Dom is based on the Apache PDFBox™ library.
http://cssbox.sourceforge.net/pdf2dom/
GNU Lesser General Public License v3.0

Local resources are unavailable for Firefox and WebView #38

Open Mararsh opened 5 years ago

Mararsh commented 5 years ago

Hi, thanks for your wonderful code! I am embedding your work in my application, which shows the contents of PDFs page by page in a GUI. I ran into a problem: none of the images and fonts in the generated HTML are displayed by Firefox or JavaFX WebView, although they work in Chrome. I am sure the resource paths are the cause, because I tried the following paths. The first two do not work, while the last two work in Firefox and in my app:

<img src="D:\tmp/JEEFC.book.png"/>
<img src="D:\\tmp\\JEEFC.book.png"/>
<img src="./JEEFC.book.png"/>
<img src="JEEFC.book.png" />

I found the following link: https://stackoverflow.com/questions/11812111/font-face-url-pointing-to-local-file?r=SearchResults The following lines from it suggest that this is a domain issue:

Both IE 9 and Firefox require font files to be served from the same domain as the page they are loaded into

So I tried the following, and it works in Firefox and JavaFX WebView:

<img src="file:///D:\tmp/JEEFC.book.png"/>
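Instead of prepending "file:///" by hand, the same kind of URL can be derived from a path with the standard java.nio.file API. The following is only a minimal sketch of that idea, not part of Pdf2Dom: the class name FileUriExample and the method toFileUri are hypothetical, and the file name is just the one used above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; not part of Pdf2Dom.
class FileUriExample {

    // Converts a local path into an absolute file: URI string
    // that can be written into an src or url() reference.
    static String toFileUri(Path path) {
        // toAbsolutePath() resolves a relative path against the working directory.
        // toUri() adds the file scheme, uses '/' as the separator and
        // percent-encodes characters such as spaces.
        return path.toAbsolutePath().toUri().toString();
    }

    public static void main(String[] args) {
        // Prints a file: URI for the file name resolved against the working directory.
        System.out.println(toFileUri(Paths.get("JEEFC.book.png")));
    }
}
```

On Windows, a path such as D:\tmp\JEEFC.book.png becomes file:///D:/tmp/JEEFC.book.png this way, so the backslashes are also normalized.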

To display the HTML generated by Pdf2Dom correctly, I extended your "SaveResourceToDirHandler.java". If its fields were "protected" instead of "private", my class could be as simple as this:

public class PDFResourceToDirHandler extends SaveResourceToDirHandler {

    @Override
    public String handleResource(HtmlResource resource) throws IOException {
        return "file:///" + super.handleResource(resource);
    }

}

As it stands, I have to copy all the lines of your "SaveResourceToDirHandler.java", like this:

public class PDFResourceToDirHandler extends SaveResourceToDirHandler {

    private final File directory;
    private final List<String> writtenFileNames = new LinkedList<>();

    public PDFResourceToDirHandler(File directory) {
        this.directory = directory;
    }

    @Override
    public String handleResource(HtmlResource resource) throws IOException {
        String dir = DEFAULT_RESOURCE_DIR;
        if (directory != null) {
            dir = directory.getPath() + "/";
        }

        String fileName = findNextUnusedFileName(resource.getName());
        String resourcePath = dir + fileName + "." + resource.getFileEnding();

        File file = new File(resourcePath);
        FileUtils.writeByteArrayToFile(file, resource.getData());

        writtenFileNames.add(fileName);

        return "file:///" + resourcePath;
    }

    private String findNextUnusedFileName(String fileName) {
        int i = 1;
        String usedName = fileName;
        while (writtenFileNames.contains(usedName)) {
            usedName = fileName + i;
            i++;
        }

        return usedName;
    }

}

Mararsh commented 5 years ago

My app needs to copy the generated HTML to a target path, so I changed the return statement to:

return new File(dir).getName() + File.separator + fileName + "." + resource.getFileEnding();

The HTML then contains a relative path, so it can be moved anywhere (together with its resource directory) without breaking the references.
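One caveat with the line above is that File.separator is a backslash on Windows, whereas references in HTML conventionally use forward slashes. A possible alternative, sketched here under the assumption that both the HTML file location and the resource location are known, is to compute the reference with Path.relativize. The class RelativeRefExample, the method relativeRef and the "out"/"resources" directory names are hypothetical and not part of Pdf2Dom.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; not part of Pdf2Dom.
class RelativeRefExample {

    // Returns the reference to 'resource' as seen from the directory
    // that contains 'htmlFile', always using '/' as the separator.
    static String relativeRef(Path htmlFile, Path resource) {
        Path htmlDir = htmlFile.toAbsolutePath().getParent();
        Path relative = htmlDir.relativize(resource.toAbsolutePath());
        // Path.toString() uses the platform separator, so convert '\' to '/'.
        return relative.toString().replace('\\', '/');
    }

    public static void main(String[] args) {
        Path html = Paths.get("out", "page.html");
        Path image = Paths.get("out", "resources", "JEEFC.book.png");
        System.out.println(relativeRef(html, image)); // resources/JEEFC.book.png
    }
}
```

As with the original change, the reference stays valid only while the HTML file and the resource directory keep the same relative layout.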