threefoldtecharchive / 0-robot

Distributed life cycle management system
Apache License 2.0

Creating any service on bancadati returns a `Loop Exit` error #48

Closed serboctor closed 4 years ago

serboctor commented 5 years ago

Creating any service on some of the nodes on bancadati returns the following error:



```
/home/ser/code/threefoldtech/0-robot/zerorobot/dsl/ZeroRobotManager.py in create(self, template_uid, service_name, data, public)
    171             raise ServiceCreateError(e['message'], err)
    172 
--> 173         return self._instantiate(new_service)
    174 
    175     def find_or_create(self, template_uid, service_name, data, public=False):

/home/ser/code/threefoldtech/0-robot/zerorobot/dsl/ZeroRobotManager.py in _instantiate(self, data)
     48     def _instantiate(self, data):
     49         if hasattr(data, 'secret') and data.secret:
---> 50             self._client = config_mgr.append_secret(self._client.instance, data.secret)
     51             # force re-creation of the connection with new secret added in the Authorization header
     52             self._client._api = None

/home/ser/code/threefoldtech/0-robot/zerorobot/dsl/config_mgr.py in append_secret(self, instance, secret)
    114 
    115     def append_secret(self, instance, secret):
--> 116         return self._send_cmd(self._append_secret, [instance, secret])
    117 
    118     def remove_secret(self, instance, service_guid):

/home/ser/code/threefoldtech/0-robot/zerorobot/dsl/config_mgr.py in _send_cmd(self, func, args)
     36         resp_q = Queue(maxsize=1)
     37         self._queue.put((func, args, resp_q))
---> 38         resp = resp_q.get()
     39         if isinstance(resp, Exception):
     40             raise resp

/usr/local/lib/python3.6/dist-packages/gevent/_queue.cpython-36m-x86_64-linux-gnu.so in gevent._queue.Queue.get()

/usr/local/lib/python3.6/dist-packages/gevent/_queue.cpython-36m-x86_64-linux-gnu.so in gevent._queue.Queue.get()

/usr/local/lib/python3.6/dist-packages/gevent/_queue.cpython-36m-x86_64-linux-gnu.so in gevent._queue.Queue.__get_or_peek()

/usr/local/lib/python3.6/dist-packages/gevent/__waiter.cpython-36m-x86_64-linux-gnu.so in gevent.__waiter.Waiter.get()

/usr/local/lib/python3.6/dist-packages/gevent/__greenlet_primitives.cpython-36m-x86_64-linux-gnu.so in gevent.__greenlet_primitives.SwitchOutGreenletWithLoop.switch()

/usr/local/lib/python3.6/dist-packages/gevent/__greenlet_primitives.cpython-36m-x86_64-linux-gnu.so in gevent.__greenlet_primitives.SwitchOutGreenletWithLoop.switch()

/usr/local/lib/python3.6/dist-packages/gevent/__greenlet_primitives.cpython-36m-x86_64-linux-gnu.so in gevent.__greenlet_primitives.SwitchOutGreenletWithLoop.switch()

src/gevent/__greenlet_primitives.pxd in gevent.__greenlet_primitives._greenlet_switch()

LoopExit: This operation would block forever
    Hub: <Hub '' at 0x7fb3a798f6d8 epoll pending=0 ref=0 fileno=19 thread_ident=0x7fb3c2689740>
    Handles:
[]
```

zaibon commented 5 years ago

From the stack trace, this error happens on the client side. The robot seems to create the service fine, but the client then fails to update the robot client configuration with the new service secret.

@serboctor Can I have more information about the client setup when you got this error? Was this code run from a script or manually? Is there anything special I should know about the client?

serboctor commented 5 years ago

I wasn't running a script; I was just creating a service from the shell. It started with John not being able to deploy s3. I could see from the logs that it kept trying a node, failing, and moving on to another node, but the logs don't really show the error. So I created a zrobot client to one of the nodes I saw in the logs and tried to create a namespace, which gave this error. Then I tried to create a container and got the same error.

zaibon commented 5 years ago

I added some tests for `config_mgr`. They helped me find a bug in the delete method, but nothing in `append_secret`. I'll keep this issue open but lower the priority.

0xIslamTaha commented 5 years ago

@zaibon I have the same issue too, and it is of course on the client side:

```
In [20]: robot.services.create("github.com/threefoldtech/0-templates/zerotier_client/0.0.1", 'zt_token', data={'token':token})
---------------------------------------------------------------------------
LoopExit                                  Traceback (most recent call last)
/usr/local/bin/js_shell in <module>()
----> 1 robot.services.create("github.com/threefoldtech/0-templates/zerotier_client/0.0.1", 'zt_token', data={'token':token})

/opt/code/github/threefoldtech/0-robot/zerorobot/dsl/ZeroRobotManager.py in create(self, template_uid, service_name, data, public)
    171             raise ServiceCreateError(e['message'], err)
    172 
--> 173         return self._instantiate(new_service)
    174 
    175     def find_or_create(self, template_uid, service_name, data, public=False):

/opt/code/github/threefoldtech/0-robot/zerorobot/dsl/ZeroRobotManager.py in _instantiate(self, data)
     48     def _instantiate(self, data):
     49         if hasattr(data, 'secret') and data.secret:
---> 50             self._client = config_mgr.append_secret(self._client.instance, data.secret)
     51             # force re-creation of the connection with new secret added in the Authorization header
     52             self._client._api = None

/opt/code/github/threefoldtech/0-robot/zerorobot/dsl/config_mgr.py in append_secret(self, instance, secret)
    114 
    115     def append_secret(self, instance, secret):
--> 116         return self._send_cmd(self._append_secret, [instance, secret])
    117 
    118     def remove_secret(self, instance, service_guid):

/opt/code/github/threefoldtech/0-robot/zerorobot/dsl/config_mgr.py in _send_cmd(self, func, args)
     36         resp_q = Queue(maxsize=1)
     37         self._queue.put((func, args, resp_q))
---> 38         resp = resp_q.get()
     39         if isinstance(resp, Exception):
     40             raise resp

/usr/local/lib/python3.6/dist-packages/gevent/_queue.cpython-36m-x86_64-linux-gnu.so in gevent._queue.Queue.get()

/usr/local/lib/python3.6/dist-packages/gevent/_queue.cpython-36m-x86_64-linux-gnu.so in gevent._queue.Queue.get()

/usr/local/lib/python3.6/dist-packages/gevent/_queue.cpython-36m-x86_64-linux-gnu.so in gevent._queue.Queue.__get_or_peek()

/usr/local/lib/python3.6/dist-packages/gevent/__waiter.cpython-36m-x86_64-linux-gnu.so in gevent.__waiter.Waiter.get()

/usr/local/lib/python3.6/dist-packages/gevent/__greenlet_primitives.cpython-36m-x86_64-linux-gnu.so in gevent.__greenlet_primitives.SwitchOutGreenletWithLoop.switch()

/usr/local/lib/python3.6/dist-packages/gevent/__greenlet_primitives.cpython-36m-x86_64-linux-gnu.so in gevent.__greenlet_primitives.SwitchOutGreenletWithLoop.switch()

/usr/local/lib/python3.6/dist-packages/gevent/__greenlet_primitives.cpython-36m-x86_64-linux-gnu.so in gevent.__greenlet_primitives.SwitchOutGreenletWithLoop.switch()

src/gevent/__greenlet_primitives.pxd in gevent.__greenlet_primitives._greenlet_switch()

LoopExit: This operation would block forever
    Hub: <Hub '' at 0x7efc9e1f0630 epoll pending=0 ref=0 fileno=16 thread_ident=0x7efcb22b8700>
    Handles:
[]
```

Environment: JS_lib at `c5edbc3b566af8f3519ccdaedf708e8591beac4f`, 0-robot at `0cd4ae557bdf47f529ce4368ea9becdeb9762f93`

zaibon commented 5 years ago

I have an idea about this. The error happens on the client side, where the environment is not monkey patched. Maybe because of this, the queuing system used in this part of the code doesn't behave the same way and ends up blocking forever. I'll see if I can reproduce this.
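
The request/response pattern visible in the traceback (`_send_cmd` puts `(func, args, resp_q)` on a queue, then waits on `resp_q.get()`) can be sketched with the standard library to show why a missing consumer leaves the caller waiting. This is only an illustration under stated assumptions: the real `config_mgr` uses gevent queues and its worker is not shown in this thread, so the thread-based worker and the names `CommandQueue`, `start_worker`, `timeout`, and the `append_secret` stub are hypothetical.

```python
import queue
import threading


class CommandQueue:
    """Hypothetical stand-in for the pattern seen in config_mgr._send_cmd."""

    def __init__(self, start_worker=True):
        self._queue = queue.Queue()
        if start_worker:
            # The worker is the only consumer; without it nothing ever answers.
            threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            func, args, resp_q = self._queue.get()
            try:
                resp_q.put(func(*args))
            except Exception as exc:
                # Hand the error back to the caller instead of dying silently.
                resp_q.put(exc)

    def send_cmd(self, func, args, timeout=1.0):
        resp_q = queue.Queue(maxsize=1)
        self._queue.put((func, args, resp_q))
        # A bounded wait turns "block forever" into a visible queue.Empty.
        resp = resp_q.get(timeout=timeout)
        if isinstance(resp, Exception):
            raise resp
        return resp


def append_secret(instance, secret):
    # Illustrative placeholder, not the real 0-robot implementation.
    return f"{instance}:{secret}"
```

With a worker running, `send_cmd` returns the function's result. With `start_worker=False`, the bounded `get` raises `queue.Empty` instead of waiting indefinitely. In gevent the analogous situation is typically reported as `LoopExit`, because the hub sees nothing left that could ever fill the response queue.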