Closed odscjames closed 1 year ago
I've had a look again and I will estimate 4 days - we have used celery and this background worker pattern before so I am more confident we won't find too many surprises here
Closing this issue following completion of work listed in https://github.com/openownership/cove-bods/pull/97
Split out from #69
Motivation: Users uploading large files don't see the app time out and crash.
In https://github.com/openownership/cove-bods/issues/69#issuecomment-964223981 , the idea of the app "timing out" after trying to process for a certain time was raised.
A timeout approach would be difficult as currently processing a web request is single threaded. In order to "time out" nicely we'd have to change that to a multi threaded approach and make sure the main thread could stop the worker thread safely. We'd also have to impose a limit that was short enough such that almost all people would not get impatient and hit refresh, maybe 10 or 20 seconds. This limits the file size that can be uploaded and processed. Also the maximum size would depend on how effective the server was at the time, which would vary and thus be hard to calculate. What I'm about to suggest may seem like more work than the timeout suggestion, but I think the timeout suggestion is actually more difficult to do well than it may seem on first glance and a timeout gets us less in terms of functionality and future options.
Instead I'd suggest a process where we add a message queue and a background worker.
Any small files uploaded would be processed at once, using the same mechanism as before.
Any large files would be saved to disk and DB, a message sent through the queue to a worker and the user shown a loading page.
There are several advantages to this:
To do this we would:
Add celery library to process messages through queue.
Add RabbitMQ server.
Add background worker that processes files and saves results to disk. This is basically a wrapper around libcovebods so is not too complex.
When new data sent to "/" process it as before (save to disk and db then redirect user to "/data/XXX" page), but if the file is above a certain size also send a message to the worker.
When user looks at "/data/XXX" page, change process to:
An issue here is that currently the home page of the tool comes from libcoveweb, which is shared with other tools. We would have to have a way to extend the library from the cove app itself.