Closed igable closed 9 years ago
Is this bug still present? I took a quick look at the code and it looks like the only invocation of the amqp_send method is wrapped in an exception handler which catches all exceptions and will wait a while and then eventually retry.
https://github.com/hep-gc/shoal/blob/master/shoal-agent/shoal-agent#L173-L189
Is this still reproducible?
@consold can you comment on this? You can try replicating by editing /etc/resolv.conf.
I spent some time trying to reproduce this by adding and editing entries in the local /etc/resolv.conf.
I found that I couldn't reproduce the error by adding a bad entry for the production shoal server. I could, however, get an error if I pointed the lookup server to a bad place:
Traceback (most recent call last):
  File "/usr/bin/shoal-agent", line 199, in <module>
    main()
  File "/usr/bin/shoal-agent", line 150, in main
    data['hostname'] = socket.gethostbyaddr(public_ip.values()[0])[0]
socket.herror: [Errno 2] Host name lookup failure
However, I don't think this is really what the original error was about as the lookup server shouldn't be going down. Any more information about how to reproduce this error would be helpful.
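For reference, the reverse-lookup failure in the traceback could be guarded with something like the sketch below. This is not the shoal-agent code: `hostname_or_ip` and its `lookup` parameter are hypothetical names, and falling back to the bare IP address is an assumption about what the agent could reasonably report when the reverse lookup fails.

```python
import socket


def hostname_or_ip(ip_address, lookup=socket.gethostbyaddr):
    """Return the reverse-DNS name for ip_address, or the IP itself on failure.

    Hypothetical helper, not part of shoal-agent. `lookup` is injectable
    so the behaviour can be exercised without touching /etc/resolv.conf.
    """
    try:
        # gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
        return lookup(ip_address)[0]
    except (socket.herror, socket.gaierror):
        # Reverse lookup failed (e.g. "Host name lookup failure");
        # fall back to the address rather than crashing the agent.
        return ip_address
```

With this in place the agent would keep running during a resolver outage, at the cost of reporting an IP instead of a hostname for that cycle.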
Issue appears fixed. Reopen if anyone is able to reproduce this in the future.
During the most recent OpenStack upgrade at CERN, the DNS entry for shoal.heprc.uvic.ca was unavailable for some period. We need to catch this error and just wait for the DNS to become available.
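A minimal sketch of the "catch and wait" behaviour, assuming the failure surfaces as `socket.gaierror` or `socket.herror` when the server name is resolved. The function `wait_for_dns` and its parameters are hypothetical and only illustrate the idea; they are not taken from the agent.

```python
import socket
import time


def wait_for_dns(hostname, retry_interval=30, max_attempts=None,
                 resolver=socket.gethostbyname, sleep=time.sleep):
    """Block until hostname resolves, then return its IP address.

    Hypothetical helper, not part of shoal-agent. Retries every
    retry_interval seconds; if max_attempts is given, the last
    DNS error is re-raised once that many attempts have failed.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return resolver(hostname)
        except (socket.gaierror, socket.herror):
            if max_attempts is not None and attempt >= max_attempts:
                raise
            # DNS is unavailable; wait and try again.
            sleep(retry_interval)
```

Something like `wait_for_dns("shoal.heprc.uvic.ca")` could be called before connecting, so a temporary DNS outage delays the agent instead of killing it.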