mediacloud / backend

Media Cloud is an open source, open data platform that allows researchers to answer quantitative questions about the content of online media.
http://www.mediacloud.org
GNU Affero General Public License v3.0

increase retry time in add_to_queue #324

Open hroberts opened 6 years ago

hroberts commented 6 years ago

We are using the default retry strategy for add_to_queue, which only retries for a total of 0.4 seconds. We need to increase it a lot, to at least five minutes, so that restarting rabbit does not kill all of the jobs running on other servers. This has not been a problem in the past because all of our queueing jobs have been running on the same server as rabbitmq.
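As a rough illustration of what a longer retry window could look like, here is a minimal sketch in Python rather than the project's Perl. The helper name `retry_with_deadline` and all of its parameters are hypothetical and are not part of the MediaCloud job manager; it only assumes that the queueing call raises an exception while the broker is unreachable.

```python
import time


def retry_with_deadline(operation, total_seconds=300.0, initial_delay=0.5,
                        max_delay=30.0, sleep=time.sleep, clock=time.monotonic):
    """Call operation() until it succeeds or total_seconds have elapsed.

    Hypothetical helper: retries with exponential backoff, capped at
    max_delay per wait, and re-raises the last error once the overall
    time budget (five minutes by default) is used up.
    """
    deadline = clock() + total_seconds
    delay = initial_delay
    while True:
        try:
            return operation()
        except Exception:
            remaining = deadline - clock()
            if remaining <= 0:
                # Time budget exhausted: surface the original error.
                raise
            # Never sleep past the deadline.
            sleep(min(delay, remaining))
            delay = min(delay * 2, max_delay)
```

A caller would wrap the queueing call, for example `retry_with_deadline(lambda: add_to_queue(job))`, where `add_to_queue` stands in for whatever function actually publishes the job. Capping each wait keeps the process responsive once the broker comes back, while the overall deadline bounds how long a job can block.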

This looks like a pretty simple fix to me, but I'd rather you do it, @pypt, since you are more familiar with this code.

I have added some retries around the add_to_queue calls in TM::Mine.pm for the time being just so that topics don't crash and send error reports to users when we restart rabbit.

hroberts commented 6 years ago

The specific problem I'm seeing is that if a pool worker tries to add something to a queue while rabbit is down, that add_to_queue call and every later add_to_queue call in the same running process fail with an error message like this:

2018-08-28 09:17:28,196 MediaCloud.JobManager.Broker.RabbitMQ: Unable to declare queue 'MediaWords::Job::Facebook::FetchStoryStats': AMQP socket not connected at /home/mediacloud/.perlbrew/libs/perl-system@mediacloud/lib/perl5/MediaCloud/JobManager/Broker/RabbitMQ.pm line 290.
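One plausible reading of this behaviour (an assumption, not something confirmed in this thread) is that the process keeps reusing a connection object that died when rabbit went down. The sketch below, in Python rather than Perl, shows the general idea of discarding a broken connection so that the next call reconnects. `ReconnectingPublisher`, the `connect` factory, and the `publish` method are all hypothetical names, not the real MediaCloud::JobManager or RabbitMQ client API.

```python
class ReconnectingPublisher:
    """Hypothetical wrapper that drops its connection after any failure."""

    def __init__(self, connect):
        # connect: assumed factory returning an object with
        # a publish(queue, payload) method.
        self._connect = connect
        self._connection = None

    def add_to_queue(self, queue, payload):
        if self._connection is None:
            # Lazily (re)connect on first use or after a failure.
            self._connection = self._connect()
        try:
            self._connection.publish(queue, payload)
        except Exception:
            # Forget the broken connection so the next call starts fresh,
            # then let the caller decide whether to retry.
            self._connection = None
            raise
```

With this shape, one failed call no longer poisons later calls in the same process: the next add_to_queue attempt opens a new connection instead of hitting the dead socket again.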