Closed jshorland closed 7 years ago
Hrm. The queue is huge again and seems stalled.
I've purged the job queue and increased the number of workers slightly, which might help. However, this seems to be a recurring issue. Brainstorming solutions with @willdoran
We have a recurring issue with the .io dataproviders queue backing up. I think it's just not processing jobs as fast as they come in. My initial thoughts are:
..
Once we have Laravel and rewrite the data providers, we could make deployments use a shared queue properly and throw away RabbitMQ. It gets much better once we have just 1 db, but even before that we can simplify. So any work optimizing RabbitMQ right now will get thrown away, while moving things to Laravel (in cloud interface) will last longer. Similarly, tweaking the dataproviders in platform is probably more worthwhile.
This might be temporarily resolved, but it needs a permanent fix. I was testing some improvements to the celery task, but it's Friday afternoon so I think I'll have to reconfirm on Monday.
Ah, so I did see the queue jump from 0 to 10k a few times. I realized this is because the dataprovider.generate task goes into the same queue: if the queue is overloaded, celerybeat queues up multiple generate tasks. When they finally run, they queue up WAY more tasks, thus overloading the queue some more.
We can reduce the frequency we run at, or we need some rate limiting mechanism on celerybeat I think
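To make the rate limiting idea concrete, here is a minimal sketch of a "one generate task in flight at a time" guard. It is not the actual celery task: `InFlightGuard` and `schedule_generate` are hypothetical names, and the in-memory dict stands in for whatever shared store (cache, database row) the scheduler and workers could both reach. The TTL is only a safety net so a crashed worker can't block scheduling forever.

```python
import time


class InFlightGuard:
    """Allow at most one outstanding task per key, with a safety TTL.

    Hypothetical helper: the dict below is a stand-in for a shared store.
    """

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._expires_at = {}  # key -> time after which the marker is stale

    def try_acquire(self, key, ttl_seconds):
        """Return True and set the marker if no live marker exists for key."""
        now = self._clock()
        expires_at = self._expires_at.get(key)
        if expires_at is not None and expires_at > now:
            return False  # a generate task is still queued or running
        self._expires_at[key] = now + ttl_seconds
        return True

    def release(self, key):
        """Clear the marker; the worker would call this when the task ends."""
        self._expires_at.pop(key, None)


def schedule_generate(guard, enqueue, ttl_seconds=900):
    """Run on every beat tick; enqueue only if no generate task is pending."""
    if not guard.try_acquire("dataprovider.generate", ttl_seconds):
        return False
    enqueue("dataprovider.generate")
    return True
```

With something like this, a backed-up queue would hold at most one pending generate task instead of one per beat tick, so the fan-out can't pile up on itself. Putting the generate task on its own queue, separate from the tasks it spawns, would be a complementary change.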
@tuxpiper could you reduce the frequency of the dataprovider task for now and purge the dataprovider queue? That should keep this functioning till Monday.
Done that. Let's hope it holds over the weekend.
Deploying the change to remove ansible from dataproviders task. Will check how that speeds up the process.
Dropping ansible takes us back to a 15min run time. Still need to optimize, I think.
How about adding a command mode to the ushahidi command line tool? That would cut down the repeated work of forking php processes and bootstrapping the platform every time, i.e.:
$ ./bin/ushahidi command
stdin< { "DB_HOST": "...", "DB_NAME": "...", "DB_USER": "...", "DB_PASSWORD": "...", "command": "dataprovider incoming" }
stdout> { "result": "OK", "output": "..." }
stdin< { "DB_HOST": "...", "DB_NAME": "...", "DB_USER": "...", "DB_PASSWORD": "...", "command": "dataprovider outgoing" }
stdout> { "result": "OK", "output": "..." }
Just not sure how hard it is to reconfigure the database connection on the fly, within the same process.
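For illustration only, the request/response loop of such a command mode could look roughly like the sketch below. It is in Python rather than PHP just to show the protocol shape: one JSON object per line on stdin, one JSON reply per line on stdout. `run_command` is a hypothetical stand-in for the hard part (switching the database connection and running the dataprovider command), and the "ERROR" result value is an assumption, since only "OK" appears in the proposal.

```python
import io
import json


def run_command(db_config, command):
    """Hypothetical stand-in: the real tool would reconnect to the
    deployment's database using db_config and then run the command."""
    return "ran %r against %s" % (command, db_config.get("DB_NAME"))


def serve(stdin, stdout, handler=run_command):
    """Read one JSON request per line and write one JSON reply per line.

    The process stays alive between requests, so the bootstrap cost is
    paid once rather than once per deployment.
    """
    for line in stdin:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between requests
        try:
            request = json.loads(line)
            command = request.pop("command")  # remaining keys are DB settings
            response = {"result": "OK", "output": handler(request, command)}
        except Exception as exc:
            # "ERROR" is an assumed value, not part of the proposal above.
            response = {"result": "ERROR", "output": str(exc)}
        stdout.write(json.dumps(response) + "\n")
        stdout.flush()  # the caller is waiting on this reply


def serve_text(text):
    """Convenience wrapper for trying the loop without a real pipe."""
    out = io.StringIO()
    serve(io.StringIO(text), out)
    return out.getvalue()
```

Catching errors per request keeps one bad deployment config from killing the long-lived process, which seems necessary if a single process is meant to cycle through every deployment.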
I'm still not receiving any SMS alerts from usaelectionmonitor.ushahidi.io, even though I'm set up to do so. I believe we can figure out this issue when we rewrite the datasource integrations as laid out in #697 -- a Q2 OKR non-negotiable.
Just noting here: we are not lagging that much behind on job execution now. In the precise case of that deployment, it must be a configuration issue with the integration of the SMS gateway, or the gateway itself.
I'm not receiving SMS alerts from usaelectionmonitor.ushahidi.io
I'm assuming this is happening everywhere