openlabs / docker-wkhtmltopdf-aas

wkhtmltopdf in a docker container as a web service.
BSD 3-Clause "New" or "Revised" License

Tested for concurrency? #11

Open dtoso-skymesh opened 9 years ago

dtoso-skymesh commented 9 years ago

In the past, running multiple copies of wkhtmltopdf concurrently had issues: there were threading problems, and some named pipes were at the same filesystem path across multiple wkhtmltopdf processes. (I found out after an ugly incident involving customers getting each other's PDFs in a parallelized batch run.)

wkhtmltopdf seems to have had many version bumps since then, but nothing I've read from the commits screams out that this issue has been fixed.

In openlabs/docker-wkhtmltopdf-aas the gunicorn WSGI daemon seems to fork on request, so if the concurrency issue still exists in wkhtmltopdf then this service exports the problem to the service's users.

In my use case, I needed to substitute the version of wkhtmltopdf shipped with openlabs/docker-wkhtmltopdf-aas with a statically linked copy of wkhtmltopdf 0.10.0rc2, because the PDF output from identical HTML had changed over the years due to WebKit HTML rendering fixes. (I have legacy HTML that would be a massive PITA to change.)

As I know at least my version of wkhtmltopdf (0.10.0rc2) has concurrency issues, I'm treating docker as an isolation mechanism rather than simply a deployment helper. I have 20 identical containers running with a home-made HTTP load-balancing proxy sitting in front of them. It hands off (unmodified) requests to available containers and makes subsequent requests wait until workers become available (by simply blocking on the HTTP response).
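A minimal sketch of that "one request per container" dispatch, written in Python rather than taken from the actual proxy. The `BackendPool` class, the `simulate` helper, and the backend addresses are all hypothetical names made up for illustration, and the `time.sleep` call stands in for forwarding a real HTTP request. The idea is just that idle backends sit in a queue, and a request blocks until it can take one out.

```python
import queue
import threading
import time
from contextlib import contextmanager


class BackendPool:
    """Hands each backend to at most one caller at a time (hypothetical helper)."""

    def __init__(self, backends):
        self._idle = queue.Queue()
        for backend in backends:
            self._idle.put(backend)

    @contextmanager
    def lease(self, timeout=None):
        # Blocks while every backend is busy; raises queue.Empty on timeout.
        backend = self._idle.get(timeout=timeout)
        try:
            yield backend
        finally:
            # Return the backend to the idle queue even if the request failed.
            self._idle.put(backend)


def simulate(backend_count=3, request_count=30):
    """Run fake requests through the pool and report peak concurrency per backend."""
    # Assumed addresses, one per container; not taken from the thread.
    backends = [f"http://127.0.0.1:{8000 + i}" for i in range(backend_count)]
    pool = BackendPool(backends)
    in_flight = {b: 0 for b in backends}
    peak = {b: 0 for b in backends}
    lock = threading.Lock()

    def handle_request():
        with pool.lease() as backend:
            with lock:
                in_flight[backend] += 1
                peak[backend] = max(peak[backend], in_flight[backend])
            time.sleep(0.001)  # stand-in for proxying the request to `backend`
            with lock:
                in_flight[backend] -= 1

    threads = [threading.Thread(target=handle_request) for _ in range(request_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak
```

In a real proxy the body of `handle_request` would forward the incoming request to the leased backend and stream the response back; the lease is what keeps each wkhtmltopdf container down to a single in-flight request.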

sharoonthomas commented 9 years ago

Testing the returned content in a PDF is a PITA. Any ideas on how a test with concurrency could be done?

dtoso-skymesh commented 9 years ago

I wrote a perl script (call it 'single.pl') that:

Comparison done through this pipeline:

pdftotext - - | grep <MD5>

Then I wrote another perl script (call it 'bench.pl') to fork 5 children, where each child executes single.pl 20 times with a randomised Time::HiRes::usleep in between requests. I log the command line and the result of the grep out to a file and then grep that for mismatches.
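A rough Python sketch of the same kind of harness, not the original Perl. It assumes each request embeds a fresh MD5 token in the HTML and then checks that the text extracted from the returned PDF contains that token; that is inferred from the `grep <MD5>` step above, not confirmed. All function names here are hypothetical, threads stand in for forked children, and `render` is left to the caller because the exact request format of the service is not shown in this thread.

```python
import hashlib
import os
import random
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def make_probe():
    """Build an HTML page carrying a unique token (assumed test strategy)."""
    token = hashlib.md5(os.urandom(16)).hexdigest()
    html = f"<html><body><p>{token}</p></body></html>"
    return token, html


def pdftotext(pdf_bytes):
    """Extract text with `pdftotext - -` (stdin to stdout), as in the pipeline above."""
    result = subprocess.run(
        ["pdftotext", "-", "-"], input=pdf_bytes, capture_output=True, check=True
    )
    return result.stdout.decode("utf-8", errors="replace")


def run_child(render, extract_text, requests, max_sleep):
    """Send `requests` probes in sequence; return tokens missing from their own PDF."""
    mismatches = []
    for _ in range(requests):
        token, html = make_probe()
        text = extract_text(render(html))
        if token not in text:
            mismatches.append(token)
        # Randomised pause between requests, like the usleep in bench.pl.
        time.sleep(random.uniform(0, max_sleep))
    return mismatches


def bench(render, extract_text=pdftotext, children=5, requests=20, max_sleep=0.05):
    """Run `children` concurrent workers and collect every mismatch."""
    with ThreadPoolExecutor(max_workers=children) as executor:
        futures = [
            executor.submit(run_child, render, extract_text, requests, max_sleep)
            for _ in range(children)
        ]
        return [token for future in futures for token in future.result()]
```

Here `render` would be a function that sends the HTML to the service and returns the PDF bytes. An empty list from `bench` means every response contained its own token; any entry is a request that got back someone else's (or a broken) PDF.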

sharoonthomas commented 9 years ago

@dtoso-skymesh :+1: thank you

alicpr commented 3 years ago

We are going to use this at enterprise scale, handling 100 req/s on each server. Does the issue still exist? Is there any alternative solution available?

dtoso-skymesh commented 3 years ago

@alicpr not sure if @sharoonthomas has fixed this issue, but I worked around it by running many docker containers each running wkhtmltopdf-aas. The solution was to only send one request at a time to each container.

If you've got fast enough machine(s) you could just launch (docker run) them on demand from, say, a CGI script.

On our hardware that wasn't fast enough so I came up with an HTTP-proxy based solution. Basically it does:

I've found the limiting factor to be the server hardware.