Tasks can hang indefinitely if the app encounters a critical error

The data extract for the full Coronavirus dataset appears to have hung sometime after March 25, probably either when the shared /storage drive ran out of space or when the server had to be restarted after a network outage. TweetSets continued to report the task as still processing, although no files were being produced. To restart the task, it is necessary to delete the dataset's folder in /storage/full_datasets.

We need a way to recover gracefully from such errors.

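If we continue using Celery, look at the call to _generate_tasks.AsyncResult(task_id), which was returning a "Pending" status even in the absence of a viable task. Celery reports PENDING for any task ID it has no record of, so that state alone cannot distinguish a queued task from one that was lost.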
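
One possible way to catch this case is to cross-check the reported state against file activity in the extract's output folder. The following is only a minimal sketch, not TweetSets code: the function names, the idle threshold, and the idea of passing in the dataset folder are all assumptions made for illustration. It takes the state string as an argument (for example, the value read from AsyncResult(task_id).state) so that it does not depend on a running Celery backend.

```python
import time
from pathlib import Path
from typing import Optional

# Celery state names that mean "not finished yet". Celery also reports
# PENDING for task IDs it has no record of, which is the case described above.
UNFINISHED_STATES = {"PENDING", "STARTED", "RETRY"}


def latest_activity(folder: Path) -> float:
    """Return the newest modification time of the folder or any file in it."""
    times = [folder.stat().st_mtime]
    times.extend(p.stat().st_mtime for p in folder.rglob("*") if p.is_file())
    return max(times)


def looks_stalled(
    reported_state: str,
    folder: Path,
    max_idle_seconds: float,
    now: Optional[float] = None,
) -> bool:
    """Hypothetical helper: guess whether an extract task has silently died.

    A task is treated as stalled when it is reported as unfinished, its
    output folder exists, and nothing in that folder has changed for longer
    than max_idle_seconds.
    """
    if reported_state not in UNFINISHED_STATES:
        return False  # finished, failed, or revoked: nothing to recover
    if not folder.exists():
        return False  # no output yet; the task may simply be queued
    current = time.time() if now is None else now
    return current - latest_activity(folder) > max_idle_seconds
```

A check like this could run when the dataset page is loaded or on a schedule. What to do with a stalled task (mark it failed, clean up the folder, or requeue it) is a separate decision, and the threshold would need to be generous enough that a slow but healthy extract is not flagged.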

If we are able to use Spark for extracts, consider exposing the Spark jobs UI from the container so that jobs can be monitored and stopped.

gwu-libraries / TweetSets #121