Let's make sure the task is robust to restarting or killing the analytics server(s). If you kill the process mid-stream and then restart the servers, will it recover gracefully and resume the computation from the point where it left off?
We should think through the following cases:
Hitting "process all users" when the computation has already started.
Killing the analytics process, restarting it, and sending the "process all users" signal again.
Killing the web server and restarting it.
Also, what happens after the first analytics pass has finished? Does it start over a day later? Is there a timer job? The pass should rerun at least once a day.
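If there is no timer job yet, one possible shape for the daily rerun is a small check that a periodic job calls. This is a sketch under assumptions: `pass_is_due` and `RERUN_INTERVAL` are hypothetical names, and the time the last pass finished is assumed to be stored somewhere that survives restarts.

```python
from datetime import datetime, timedelta, timezone

# Assumed policy from this ticket: rerun at least once a day.
RERUN_INTERVAL = timedelta(days=1)


def pass_is_due(last_finished, now, interval=RERUN_INTERVAL):
    """Return True if no pass has finished yet, or the last one finished at least `interval` ago."""
    return last_finished is None or now - last_finished >= interval
```

A timer that fires, say, every few minutes could call `pass_is_due(last_finished, datetime.now(timezone.utc))` and send the "process all users" signal when it returns `True`. Because the decision depends only on the stored timestamp, a restart of the analytics process does not reset the schedule.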