threefoldtecharchive / 0-robot

Distributed life cycle management system
Apache License 2.0

Creating services concurrently can cause a race condition where some service secrets are not saved in the config #39

Closed 0xIslamTaha closed 6 years ago

0xIslamTaha commented 6 years ago

Description Trying to deploy 5 s3 services using the same robot client in parallel using multiprocess. All s3 services are deployed successfully, but if I then check robot.services.names, I only get 1 or 2 services, while it should list more than that.

Related code https://gist.github.com/islamTaha12/f1a600efa5a13d7821fef202459ec2f6

Repo Development

zaibon commented 6 years ago

What I can do is protect the code that appends the secret to the config. But I will not protect it for multiprocessing. I can make it work for threading/gevent though.
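As a rough illustration of what protecting the append for threads could look like, here is a minimal sketch. `ClientConfig`, `add_secret` and `create_services` are hypothetical names made up for this example, not the 0-robot or jumpscale API. It only shows the general idea of guarding a read-modify-write with a `threading.Lock`, which works between threads of one process and not between processes.

```python
import threading


class ClientConfig:
    """Hypothetical stand-in for the robot client configuration (not the real API)."""

    def __init__(self):
        self.secrets = []
        self._lock = threading.Lock()

    def add_secret(self, secret):
        # The read-modify-write below is the critical section:
        # only one thread at a time may load, extend and store the list.
        with self._lock:
            current = list(self.secrets)
            current.append(secret)
            self.secrets = current


def create_services(config, count):
    """Simulate `count` concurrent service creations that each store one secret."""
    threads = [
        threading.Thread(target=config.add_secret, args=(f"secret-{i}",))
        for i in range(count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(config.secrets)
```

Note that the lock lives on the config object, so this only helps if all threads share the same object.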

zaibon commented 6 years ago

This bug is actually harder to fix than I thought. It seems there is indeed a possible race condition during the creation/deletion of a service, when we update the configuration of the robot client to add the service secret to it. https://github.com/threefoldtech/0-robot/blob/20ab3ef50cc4050851df5d1604408dcef1cfd805/zerorobot/dsl/ZeroRobotManager.py#L45-L51 And https://github.com/threefoldtech/0-robot/blob/20ab3ef50cc4050851df5d1604408dcef1cfd805/zerorobot/service_proxy.py#L103-L115

The problem is that the jumpscale config manager always returns a new instance of the client, so I can't really protect these sections with a lock, since I don't have a global lock shared by all clients.
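To make the failure mode concrete, here is a small deterministic sketch of a lost update when every caller gets its own client instance. Everything in it (`_STORE`, `Client`, `get_client`) is a hypothetical in-memory model invented for illustration and does not reflect how the jumpscale config manager is actually implemented.

```python
import copy

# Hypothetical stand-in for the persisted configuration.
_STORE = {"secrets": []}


class Client:
    """Each instance works on its own private copy of the stored config."""

    def __init__(self):
        self.data = copy.deepcopy(_STORE)

    def add_secret(self, secret):
        self.data["secrets"].append(secret)

    def save(self):
        # Writes back the whole private copy, overwriting whatever is stored.
        _STORE["secrets"] = list(self.data["secrets"])


def get_client():
    # Always returns a new instance, so there is no shared object to hang a lock on.
    return Client()


def lost_update_demo():
    _STORE["secrets"] = []
    a = get_client()  # both clients load the same (empty) config
    b = get_client()
    a.add_secret("secret-a")
    a.save()
    b.add_secret("secret-b")
    b.save()  # overwrites the save done by `a`
    return list(_STORE["secrets"])
```

In this model `lost_update_demo()` returns only `["secret-b"]`: the secret saved by the first client is gone, which matches the symptom of seeing fewer services than were created.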

So it seems the config manager should provide an atomic way of updating the configuration in order to solve this problem.
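One possible shape for such an atomic update, sketched under strong assumptions: the config is a plain JSON file, the system is POSIX (so `fcntl.flock` is available), and `atomic_update`, `add_secret` and `demo` are hypothetical helpers, not part of jumpscale. The idea is to hold an exclusive lock on a side file for the whole load, modify, save cycle.

```python
import fcntl
import json
import os
import tempfile
import threading


def atomic_update(path, update):
    """Load the JSON config at `path`, apply `update(config)`, and save it,
    all while holding an exclusive lock on a side lock file."""
    with open(path + ".lock", "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            if os.path.exists(path):
                with open(path) as f:
                    config = json.load(f)
            else:
                config = {}
            update(config)
            # Write to a temporary file, then rename it over the config,
            # so readers never see a half-written file.
            tmp_path = path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump(config, f)
            os.replace(tmp_path, path)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
    return config


def add_secret(path, secret):
    return atomic_update(path, lambda c: c.setdefault("secrets", []).append(secret))


def demo(count=5):
    """Run `count` concurrent updates against one config file and return the result."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "config.json")
        threads = [
            threading.Thread(target=add_secret, args=(path, f"secret-{i}"))
            for i in range(count)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        with open(path) as f:
            return sorted(json.load(f)["secrets"])
```

The demo uses threads to stay small, but since each call opens the lock file itself, the same locking also serializes callers in separate processes, which is the multiprocessing case from the original report.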

If this is not an option, then we'll have to find another way to store the service secret received after we create a service. But that would be quite annoying, since quite a lot has been built around the fact that you have all the info you need to reach your services in the client configuration itself.