Watch timed out when setting timeout as None

tobegit3hub commented 7 years ago

We use python-etcd for leader election. All the workers watch the same key in etcd and try to elect a new leader after the key disappears.

We watch the key with the timeout set to None:

self.client.watch(self.master_key, timeout=None)

But after almost one minute, the slave worker throws a timeout exception and exits.

DEBUG:etcd.client:Watch timed out.
Traceback (most recent call last):
  File "./manage.py", line 21, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/lib64/python2.7/site-packages/django/core/management/__init__.py", line 367, in execute_from_command_line
    utility.execute()
  File "/usr/lib64/python2.7/site-packages/django/core/management/__init__.py", line 359, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/lib64/python2.7/site-packages/django/core/management/base.py", line 294, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/lib64/python2.7/site-packages/django/core/management/base.py", line 345, in execute
    output = self.handle(*args, **options)
  File "/home/work/cloud-ml/restful_server/cloud_ml/management/commands/run_queue_consumer.py", line 743, in handle
    etcdLeaderElection.wait_to_become_master()
  File "/home/work/cloud-ml/restful_server/utils/leader_election.py", line 26, in wait_to_become_master
    self.client.watch(self.master_key, timeout=None)
  File "/usr/lib/python2.7/site-packages/etcd/client.py", line 736, in watch
    recursive=recursive)
  File "/usr/lib/python2.7/site-packages/etcd/client.py", line 562, in read
    timeout=timeout)
  File "/usr/lib/python2.7/site-packages/etcd/client.py", line 840, in wrapper
    cause=e
etcd.EtcdWatchTimedOut: Watch timed out: ReadTimeoutError("HTTPConnectionPool(host='10.105.17.85', port=2379): Read timed out.",)

If we set the timeout to 3600, it is much better and the worker does not exit so soon. But that's not what we want: we want to watch the key forever. Not sure whether this is a bug in python-etcd.

self.client.watch(self.master_key, timeout=3600)
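
A finite timeout can also be combined with a retry loop, so that a timed-out watch is re-issued instead of ending the process. The following is only a minimal sketch, not something python-etcd provides: `watch_until_change` is a hypothetical helper, and the exception class is passed in as a parameter so the snippet stays self-contained. With python-etcd it would presumably be `etcd.EtcdWatchTimedOut`, the exception shown in the traceback above.

```python
def watch_until_change(client, key, timed_out_exc, timeout=3600):
    """Block until `key` changes, re-issuing the watch whenever it times out.

    `client` is any object with a watch(key, timeout=...) method,
    `timed_out_exc` is the exception class raised when a watch times out.
    """
    while True:
        try:
            # Returns as soon as the watched key changes.
            return client.watch(key, timeout=timeout)
        except timed_out_exc:
            # The watch timed out without a change: start a new watch.
            continue

# Hypothetical usage inside wait_to_become_master():
#   watch_until_change(self.client, self.master_key, etcd.EtcdWatchTimedOut)
```

One caveat with this approach: a change that lands between two watch calls could be missed unless the watch is resumed from a known index, which this sketch leaves out.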

tobegit3hub commented 7 years ago

Trying to work around it by setting the timeout like this 😞

import sys
self.client.watch(self.master_key, timeout=sys.maxint)

lavagetto commented 7 years ago

Hi @tobegit3hub, I most definitely never had this problem. More specifically, I have an etcd replication tool running in production that watches etcd for hours when I don't set any timeout.

Even my local tests never showed such behaviour.

So I am at a loss: which versions of python/urllib3/etcd are you using?

tobegit3hub commented 7 years ago

Thanks @lavagetto .

It's easy to reproduce on my hosts with CentOS 7.0, Python 2.7.5, urllib3 1.19.1 and etcd 3.0.15 (git sha: fc00305).

And using timeout=sys.maxint works for us.

lavagetto commented 7 years ago

@tobegit3hub sorry, I just realized that setting no timeout means the default urllib3 read timeout will be enforced.

You should explicitly set the timeout to 0 here:

import etcd

c = etcd.Client(port=2379)
c.read('/', wait=True)  # will cause the timeout error
c.read('/', wait=True, timeout=0)  # will wait forever
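
The difference between None and 0 described here can be summarised with a small sketch. This is not python-etcd's actual code: `resolve_timeout` and `DEFAULT_READ_TIMEOUT` are hypothetical names, and the 60 second value is an assumption chosen to match the "almost one minute" reported earlier in this thread.

```python
# Hypothetical default, assumed to be 60 seconds to match the
# roughly one-minute timeout reported above.
DEFAULT_READ_TIMEOUT = 60

def resolve_timeout(timeout=None):
    """Return the read timeout that would apply to the HTTP request.

    None  -> fall back to the default read timeout
    0     -> no timeout at all (wait forever), represented here as None
    other -> use the caller's value unchanged
    """
    if timeout is None:
        return DEFAULT_READ_TIMEOUT
    if timeout == 0:
        return None
    return timeout
```

Under these assumptions, passing timeout=None is the same as not passing a timeout at all, which is why the watch still ends after about a minute, while timeout=0 is the explicit way to ask for no timeout.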

tobegit3hub commented 7 years ago

That makes sense and 0 works for me.

Thanks @lavagetto very much!

jplana / python-etcd

Watch timed out when setting timeout as None #227