ukwa / w3act

w3act is an annotation and curation tool for building web archive collections
Apache License 2.0

Possible threadlock issue #684

Closed anjackson closed 1 year ago

anjackson commented 2 years ago

Last week on Friday, we had a couple of mysterious lock-ups, where W3ACT stopped responding to requests while JG was using the Document Harvester.

The first time, W3ACT itself was restarted with a little more RAM. This worked at first, but W3ACT locked up again a couple of hours later, again while JG was using it. JG's comments implied that the pdf-to-html back-end service was not working during this time (and had not been restarted).

During the second outage, the service itself was not busy and was not using much RAM. Connecting with VisualVM and JConsole did not reveal any deadlocks or GC issues (a manually triggered GC ran just fine). Restarting the back-end services, even the DB, did not affect it. Restarting W3ACT itself brought it back, but it has not been used in anger since. Note that, with the pdf-to-html service restarted, document preview was also working fine.

Reading around, I found that the way we're using W3ACT to proxy some calls may be the problem, especially if blocking calls are being made on the main application thread pool (see e.g.). So, if the real problem was that the pdf-to-html service had died or frozen, then the proxied calls to the document renderer could be locking up the application threads, eventually consuming them all so that incoming requests are no longer dealt with.

If this is the case, increasing the thread pool size would buy a little more time, but would not resolve the issue.
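As a rough illustration of the suspected mechanism (not W3ACT's actual code), the sketch below uses a fixed-size Python thread pool to stand in for the application thread pool. The names `proxied_render`, `ordinary_request` and `simulate`, and the pool sizes, are all hypothetical. Each "proxied" call blocks on a backend that never answers, and a probe request is then submitted to see whether anything can still be served.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def proxied_render(release):
    """Hypothetical proxied call: blocks until the frozen backend 'recovers'."""
    release.wait()
    return "rendered"


def ordinary_request():
    """Hypothetical cheap request that needs a free application thread."""
    return "ok"


def simulate(pool_size, stuck_calls):
    """Submit `stuck_calls` blocking calls, then probe whether the pool still responds."""
    release = threading.Event()  # stays unset while the backend is frozen
    pool = ThreadPoolExecutor(max_workers=pool_size)
    try:
        for _ in range(stuck_calls):
            pool.submit(proxied_render, release)
        probe = pool.submit(ordinary_request)
        try:
            return probe.result(timeout=0.5)
        except TimeoutError:
            # No worker picked up the probe: every thread is stuck in a proxied call.
            return "locked up"
    finally:
        release.set()  # unblock the workers so the pool can shut down cleanly
        pool.shutdown(wait=True)


print(simulate(pool_size=4, stuck_calls=3))  # ok
print(simulate(pool_size=4, stuck_calls=4))  # locked up
print(simulate(pool_size=8, stuck_calls=4))  # ok
print(simulate(pool_size=8, stuck_calls=8))  # locked up
```

In this toy model, doubling the pool only doubles the number of stuck calls needed before the probe stops being served, which matches the "buys a little more time" expectation above. It says nothing about whether this is what actually happened in production.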

Given we can now authenticate within NGINX, a better solution would be to cut out the internal W3ACT proxy and use the rendering service directly.

anjackson commented 1 year ago

This is superseded as we now manage authenticated proxy requests using NGINX.