Conversion issue in linux

wooio / htmltopdf-java

An HTML to PDF conversion library written in Java, based on wkhtmltopdf.

MIT License


Conversion issue in linux #18

Open rnachireddy opened 5 years ago

rnachireddy commented 5 years ago

Hello, I am trying to use htmltopdf to convert HTML content to PDF. Attachment: HtmlToPdfResult.pdf

The API works in a Windows environment, but when running on Linux the generated PDF is corrupted: it contains junk data (see the attached PDF).

Here is the piece of code:

    String processedHtml = "html content";
    InputStream pdfStream = HtmlToPdf.create()
            .object(HtmlToPdfObject.forHtml(processedHtml))
            .convert();

    System.out.println("encoded pdf ::" + encodeStreamToBase64(pdfStream));

After decoding the above PDF stream, I get a corrupted PDF on Linux.

benbarkay commented 5 years ago

It's likely a font/encoding problem by the looks of it. Perhaps the Linux machine is missing some fonts? Or the character set which the content is written in?

It would help to have something reproducible for this. If possible, could you provide an html document that produces this on a Linux machine?

rnachireddy commented 5 years ago

Hello Ben, Thanks for your quick response. Attached the html file for testing.

Below is the piece of code I am using.

    // Load the HTML content as a String
    public static String loadFile() throws Exception {
        String location = "installmentLetter.html";
        InputStream is = new FileInputStream(new File(location));
        return IOUtils.toString(is);
    }

    // Convert the HTML string to Base64
    String encodedHtmlContent = Base64.encodeBase64String(htmlContent.getBytes());

    // HTML to PDF conversion using the htmltopdf API
    InputStream pdfStream = HtmlToPdf.create()
            .object(HtmlToPdfObject.forHtml(encodedHtmlContent))
            .convert();

    // Convert the PDF stream to Base64
    public static String encodeStreamToBase64(InputStream pdfStream) throws IOException {
        byte[] bytes = IOUtils.toByteArray(pdfStream);
        String encodedString = Base64.encodeBase64String(bytes);
        return encodedString;
    }

When I decode the generated Base64-encoded PDF, I get the corrupted PDF that I attached in my earlier comment. Attachment: htmlTestData.zip

benbarkay commented 5 years ago

The following works fine on my Linux machine:

    @Test
    public void test() throws IOException {
        String html = Files.readAllLines(Paths.get("installmentLetter.html"))
                .stream()
                .collect(Collectors.joining("\n"));

        HtmlToPdf.create()
                .object(HtmlToPdfObject.forHtml(html))
                .convert("installmentLetter.pdf");
    }

It produces installmentLetter.pdf which looks ok.

The code that you've shared shows that you encode the HTML to Base64 before using it to create an HtmlToPdfObject, which is odd because in that case none of the HTML structure should have been preserved like it is in the PDF document that you've uploaded.

Perhaps you could try playing around with a wkhtmltopdf binary (version 0.12.4) on your Linux machine to see if it still happens with that. I think it's a good idea to get environment issues out of the way before we try to solve this perhaps non-existent defect in the code (be it yours or the library's) :)
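One environment difference that may be worth ruling out is the JVM's default charset. The loading code shared earlier calls `IOUtils.toString(is)` and `htmlContent.getBytes()` without naming a charset, so both fall back to the platform default, which can differ between a Windows and a Linux machine. The test above reads the file with `Files.readAllLines`, which decodes as UTF-8. The following is only a sketch of reading the file with an explicit charset; it assumes the HTML file is saved as UTF-8, and the class and method names are made up for illustration:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ExplicitCharsetRead {
    // Reads the whole file as UTF-8, independent of the JVM's default charset.
    // Assumption: the HTML file on disk is UTF-8 encoded.
    static String readHtml(Path path) throws IOException {
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Write a small sample with non-ASCII characters, then read it back.
        Path tmp = Files.createTempFile("sample", ".html");
        String sample = "<p>Montant d\u00fb : 10 \u20ac</p>";
        Files.write(tmp, sample.getBytes(StandardCharsets.UTF_8));

        String html = readHtml(tmp);
        System.out.println(sample.equals(html));

        Files.delete(tmp);
    }
}
```

If the output differs between machines only when the charset is left implicit, the problem is in how the HTML is read rather than in the conversion itself.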

rnachireddy commented 5 years ago

Hi Ben, I still get the same issue even when I pass the raw HTML content (without encoding) to htmltopdf. The only difference from your code above is that I am not generating a physical PDF file. I am using the convert() method, which returns an InputStream; this stream is then encoded to Base64 because I need to transfer the content to another system.

Thanks, Rajesh
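Since the stream is Base64-encoded before it leaves the machine, one way to narrow this down is to check that the encode/decode step preserves the bytes and that the decoded data still starts with the PDF signature `%PDF-`. The sketch below uses only `java.util.Base64` from the JDK and a hand-made byte array standing in for the output of `convert()`; the class and helper names are hypothetical and are not part of htmltopdf:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class Base64RoundTrip {
    // Drains the stream into a byte array; no character decoding is involved.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Encodes the raw bytes of the stream as a Base64 string.
    static String encode(InputStream in) throws IOException {
        return Base64.getEncoder().encodeToString(readAll(in));
    }

    // Decodes a Base64 string back into raw bytes.
    static byte[] decode(String b64) {
        return Base64.getDecoder().decode(b64);
    }

    // Checks for the "%PDF-" signature at the start of the data.
    static boolean looksLikePdf(byte[] bytes) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        return bytes.length >= magic.length
                && Arrays.equals(Arrays.copyOf(bytes, magic.length), magic);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the PDF stream: "%PDF-" followed by arbitrary binary bytes.
        byte[] original = {0x25, 0x50, 0x44, 0x46, 0x2D, (byte) 0xE2, 0x00, (byte) 0xFF};

        String b64 = encode(new ByteArrayInputStream(original));
        byte[] decoded = decode(b64);

        System.out.println("round trip intact: " + Arrays.equals(original, decoded));
        System.out.println("looks like PDF: " + looksLikePdf(decoded));
    }
}
```

If the decoded bytes match and carry the signature, the transfer step is probably not what corrupts the file, and the rendering environment (fonts, character set) becomes the more likely suspect.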

linxuebo commented 2 years ago

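After adding the Maven dependency and packaging the application as a jar deployed to Linux, do I still need to install the libwkhtmltox plugin on the Linux server? I have not installed the plugin, and running on Linux fails with an error like "Unable to load library '/tmp/io.woo.htmltopdf/wkhtmltox/0.12.5/libwkhtmltox.so'".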