Open benbarkay opened 6 years ago
@benbarkay any updates?
@lgabeskiria it appears that there aren't many straightforward options. The idea that I'm currently examining is to load multiple separate copies of wkhtmltopdf.so using the RTLD_LOCAL flag. However, I was not able to get JNA to load the library's dependencies with RTLD_LOCAL, so this might take a while or might not be reasonably possible.
Other projects that I've encountered that use wkhtmltopdf as a shared library seem to have stopped at synchronization (what 1.0.4 currently does) rather than supporting any concurrency. That makes supporting concurrency a very attractive goal, but it also suggests that making it happen may be impractical.
You can have a look at this project: https://github.com/rdvojmoc/DinkToPdf
@lgabeskiria they do not support concurrency. They are thread-safe, though (in a similar fashion to 1.0.4).
Hi @benbarkay ,
I have one question: how can I reproduce the concurrency issue? I tried the following code and it works fine; it creates all the PDFs. Please suggest how I can reproduce the problem. I am using Windows 10 Pro and Java 8.
import java.io.BufferedReader;
import java.io.File;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import io.woo.htmltopdf.HtmlToPdf;
import io.woo.htmltopdf.HtmlToPdfObject;

public class PDFToHTMLUsingLib
{
    public static void main(String[] args) throws MalformedURLException, InterruptedException {
        boolean isProcessBased = false;
        String pdfFilePath = "D:/htmltopdf/multithread/";
        ArrayList<HashMap<String, String>> list = getDataList(pdfFilePath);
        PDFToHTMLUsingLib obj = new PDFToHTMLUsingLib();
        obj.startAllThreads(list, isProcessBased);
        System.out.println("All THreads execution is started..");
    }

    public void startAllThreads(List<HashMap<String, String>> hashList, boolean isProcessBased) throws InterruptedException {
        for ( HashMap<String, String> eachMap : hashList) {
            Thread th = new Thread(new MyThread(eachMap.get("HTML_PATH"), eachMap.get("PDF_PATH"), isProcessBased));
            th.start();
            //th.join();
        }
    }

    class MyThread implements Runnable {
        private String htmlFilePath = "";
        private String outputFilePath = "";
        private boolean isProcessedBased = false;

        public MyThread(String htmlPath, String pdfPath, boolean isProcessBased) {
            this.htmlFilePath = htmlPath;
            this.outputFilePath = pdfPath;
            this.isProcessedBased = isProcessBased;
        }

        @Override
        public void run()
        {
            if (this.isProcessedBased) {
                createPDFUsingProcess();
            } else {
                createPDFUsingLib();
            }
        }

        private void createPDFUsingProcess()
        {
            try {
                String threadName = Thread.currentThread().getName();
                System.out.println("[" + threadName + "] Started the execution to generate PDF using Process.");
                ProcessBuilder processBuilder = new ProcessBuilder("wkhtmltopdf", htmlFilePath, outputFilePath);
                processBuilder.redirectErrorStream(true);
                Process process = processBuilder.start();
                BufferedReader reader = new BufferedReader(new InputStreamReader(process.getInputStream()));
                String line;
                while ((line = reader.readLine()) != null)
                    System.out.println("[" + threadName + "] Process Output: " + line);
                process.waitFor();
                System.out.println("[" + threadName + "] Execution is Completed.");
            } catch (Exception ex) {
                ex.printStackTrace();
            }
        }

        private void createPDFUsingLib()
        {
            String threadName = Thread.currentThread().getName();
            System.out.println("[" + threadName + "] Started the execution to generate PDF using Lib.");
            File file = new File(outputFilePath);
            boolean result = HtmlToPdf.create()
                    .object(HtmlToPdfObject.forUrl(htmlFilePath))
                    .convert(file.getPath());
            System.out.println("[" + threadName + "] Is converted.. " + result);
        }
    }

    private static ArrayList<HashMap<String, String>> getDataList(String pdfFilePath)
    {
        ArrayList<HashMap<String, String>> list = new ArrayList<HashMap<String, String>>();
        HashMap<String, String> eachMap = null;

        eachMap = new HashMap<String, String>();
        eachMap.put("HTML_PATH", "https://github.com/jglick/jkillthread");
        eachMap.put("PDF_PATH", pdfFilePath + "jkillthread.pdf");
        list.add(eachMap);

        eachMap = new HashMap<String, String>();
        eachMap.put("HTML_PATH", "https://developer.paypal.com/docs/api/invoicing/v1/");
        eachMap.put("PDF_PATH", pdfFilePath + "paypalinvoice.pdf");
        list.add(eachMap);

        eachMap = new HashMap<String, String>();
        eachMap.put("HTML_PATH", "https://www.class-central.com/report/mooc-mba-top-b-schools/");
        eachMap.put("PDF_PATH", pdfFilePath + "mooc_mba.pdf");
        list.add(eachMap);

        eachMap = new HashMap<String, String>();
        eachMap.put("HTML_PATH", "https://developers.google.com/machine-learning/crash-course/prereqs-and-prework#prerequisites");
        eachMap.put("PDF_PATH", pdfFilePath + "ml.pdf");
        list.add(eachMap);

        eachMap = new HashMap<String, String>();
        eachMap.put("HTML_PATH", "https://www.khanacademy.org/math/cc-sixth-grade-math/cc-6th-expressions-and-variables/cc-6th-evaluating-expressions/v/expression-terms-factors-and-coefficients");
        eachMap.put("PDF_PATH", pdfFilePath + "kacademy.pdf");
        list.add(eachMap);

        eachMap = new HashMap<String, String>();
        eachMap.put("HTML_PATH", "https://machinelearningmastery.com/machine-learning-in-python-step-by-step/");
        eachMap.put("PDF_PATH", pdfFilePath + "mlstepbystep.pdf");
        list.add(eachMap);

        eachMap = new HashMap<String, String>();
        eachMap.put("HTML_PATH", "https://www.class-central.com/report/mooc-mba-top-b-schools/");
        eachMap.put("PDF_PATH", pdfFilePath + "new_mooc_mba.pdf");
        list.add(eachMap);

        return list;
    }
}
https://wkhtmltopdf.org/libwkhtmltox/
These bindings are well documented and do not depend on QT. Using this is the recommended way of interfacing with the PDF portion of libwkhtmltox.
@ymohammad your code should indeed create the PDFs without any issues, but are you sure it really runs in parallel? I mean, does creating hundreds of random PDFs run 4 times faster with 4 threads on a 4-core processor than with just 1 thread? That's the idea. Are you saying there is no performance difference between createPDFUsingProcess and createPDFUsingLib?
If the library has one synchronized block, it will be a bottleneck and the actual conversion will be sequential (and 3 of 4 threads will starve).
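To make the bottleneck concrete, here is a minimal sketch that does not call the library at all. The class SynchronizedBottleneckDemo and its convert method are hypothetical stand-ins, and the sleep only simulates rendering time. It assumes the conversion is guarded by one global lock, and records how many threads are ever inside the guarded section at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a converter whose work is guarded by one global lock.
class SynchronizedBottleneckDemo {
    private final Object lock = new Object();
    private final AtomicInteger inside = new AtomicInteger();
    private final AtomicInteger maxInside = new AtomicInteger();

    // Simulated conversion: the sleep stands in for native rendering work.
    void convert(long millis) {
        synchronized (lock) {
            int now = inside.incrementAndGet();
            maxInside.accumulateAndGet(now, Math::max);
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                inside.decrementAndGet();
            }
        }
    }

    // Starts `threads` workers, each doing `tasksPerThread` conversions,
    // and returns the elapsed wall-clock time in milliseconds.
    long run(int threads, int tasksPerThread, long millisPerTask) {
        List<Thread> workers = new ArrayList<>();
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                for (int j = 0; j < tasksPerThread; j++) {
                    convert(millisPerTask);
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Highest number of threads observed inside the guarded section at once.
    int maxConcurrent() {
        return maxInside.get();
    }

    public static void main(String[] args) {
        SynchronizedBottleneckDemo demo = new SynchronizedBottleneckDemo();
        long elapsed = demo.run(4, 5, 20);
        System.out.println("elapsed ms: " + elapsed + ", max concurrent: " + demo.maxConcurrent());
    }
}
```

With four threads the elapsed time is still roughly the sum of all simulated conversions, because only one thread is ever inside the lock. Timing the real library the same way, with 1 thread and then with 4, would show whether its conversions actually overlap.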
@owexroasia it would be very strange if it didn't depend on QT, since it uses qt-webkit to render HTML. Am I right?
@benbarkay do you have any news? Am I right that the only current way to do it in parallel is to run wkhtmltopdf in separate processes (which some other libraries that don't use JNA do)? Is that only because of how the native library itself is implemented?
Currently, all requests to wkhtmltopdf are synchronized to a single thread, thus it is not possible to execute conversions concurrently.
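One way such single-thread synchronization can be arranged is sketched below. This is an assumption for illustration only, not the library's actual code: the class SingleThreadFunnel, the method call, and the thread name "converter-thread" are all hypothetical. Every caller hands its work to one dedicated worker thread and blocks until the result is ready.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical funnel: all submitted work runs on one dedicated thread.
class SingleThreadFunnel {
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "converter-thread");
        t.setDaemon(true); // do not keep the JVM alive for the worker
        return t;
    });

    // Runs the task on the worker thread and blocks the caller until it finishes.
    <T> T call(Supplier<T> task) {
        Callable<T> callable = task::get;
        try {
            return worker.submit(callable).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the worker", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed on the worker thread", e.getCause());
        }
    }

    public static void main(String[] args) {
        SingleThreadFunnel funnel = new SingleThreadFunnel();
        // Both calls report the same thread name, whichever thread invoked them.
        System.out.println(funnel.call(() -> Thread.currentThread().getName()));
        System.out.println(funnel.call(() -> Thread.currentThread().getName()));
    }
}
```

Under this arrangement callers on any number of threads are safe, but their conversions queue up behind each other, which matches the behavior described above.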