threefoldtecharchive / jumpscaleX_core

Threebot server fails after unsuccessful shutdown [development_threebotstart] #600

Closed waleedhammam closed 4 years ago

waleedhammam commented 4 years ago

How to reproduce

Screenshot from 2020-03-11 15-20-58

The problem here is that the 3bot server sets a key in redis called "threebot.starting", but if this key is not cleaned up properly, the server gets stuck in loading.
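To illustrate the cleanup concern, here is a minimal sketch of clearing a "starting" flag even when startup fails. It does not reflect the actual jumpscaleX_core code: `FakeStore` is a hypothetical in-memory stand-in for a redis client, and `starting_flag` is an illustrative helper name. Only the key name "threebot.starting" comes from the report.

```python
import contextlib


class FakeStore:
    """Hypothetical in-memory stand-in for a redis client (illustration only)."""

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

    def delete(self, key):
        self.data.pop(key, None)

    def exists(self, key):
        return key in self.data


@contextlib.contextmanager
def starting_flag(store, key="threebot.starting"):
    """Set the flag for the duration of startup and always remove it afterwards."""
    store.set(key, "1")
    try:
        yield
    finally:
        # Runs on success and on exceptions, so the flag is not left behind.
        store.delete(key)


store = FakeStore()
try:
    with starting_flag(store):
        # Simulate a startup that fails part-way through.
        raise RuntimeError("startup failed")
except RuntimeError:
    pass

flag_left_behind = store.exists("threebot.starting")
```

A `finally` block cannot run if the process is killed outright, so a pattern like this would complement an expiry on the key rather than replace it.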

zaibon commented 4 years ago

The key added in redis has a timeout. So it will eventually go away.

waleedhammam commented 4 years ago

@zaibon I see the redis key timeout is set to 120 sec, but we only wait for 75 sec, so we should at least make them match, no?
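The mismatch can be sketched as follows, assuming a simple polling wait. The function name `wait_for_key_to_clear`, the `FakeClock` helper, and the constant name are hypothetical; only the 120 sec and 75 sec values come from this thread. A simulated clock is used so the example runs instantly.

```python
import time

STARTING_KEY_TTL = 120  # seconds; the key timeout quoted in this thread


def wait_for_key_to_clear(key_exists, timeout=STARTING_KEY_TTL, poll=1.0,
                          clock=time.monotonic, sleep=time.sleep):
    """Poll until key_exists() is False; return False if the timeout passes first."""
    deadline = clock() + timeout
    while key_exists():
        if clock() >= deadline:
            return False
        sleep(poll)
    return True


class FakeClock:
    """Simulated clock so the demo does not actually sleep."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def simulate(timeout):
    clock = FakeClock()
    # The key expires on its own once the simulated time reaches the TTL.
    key_exists = lambda: clock.time() < STARTING_KEY_TTL
    return wait_for_key_to_clear(key_exists, timeout=timeout,
                                 clock=clock.time, sleep=clock.sleep)


short_wait = simulate(75)                 # gives up while the key still exists
long_wait = simulate(STARTING_KEY_TTL)    # outlasts the key's expiry
```

Deriving the wait from the same constant as the key's timeout keeps the two values from drifting apart.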

xmonader commented 4 years ago

Use the -m flag when you start kosmos to overcome that.

waleedhammam commented 4 years ago

@xmonader It's the same for me; I still have to wait 120 seconds.

Screenshot from 2020-03-12 10-00-03

waleedhammam commented 4 years ago

Also, if the master process died, we will get an error while retrieving the data because the bcdb redis server is dead. It could happen in myjobs, etc.

Screenshot from 2020-03-12 13-34-14

waleedhammam commented 4 years ago

Well, it turns out the problem only happens when background=True.

Screenshot from 2020-03-12 15-07-21

When we try to start the 3bot server again while the key is still in redis, it thinks that the server is still running, so configmanager: startupcmds uses the bcdb model client while the bcdb redis socket is not up, which leads to a failure.

So I suggest starting the bcdb socket at an early stage, and checking that it is up before each bcdb usage to determine whether we are in the master state or not.
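A minimal sketch of such an "is the socket up" check, assuming the bcdb redis server listens on a TCP port (the thread does not say whether it is a TCP or a Unix socket). `is_socket_up` is a hypothetical helper name, not a function from jumpscaleX_core, and the demo uses a throwaway local listener in place of the real server.

```python
import socket


def is_socket_up(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


# Demo: a throwaway listener stands in for the bcdb redis server.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

up_while_listening = is_socket_up("127.0.0.1", port)

server.close()
up_after_close = is_socket_up("127.0.0.1", port)
```

A caller could run a check like this before using the bcdb client and fall back to starting the server itself when the check fails.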

waleedhammam commented 4 years ago

PR at: https://github.com/threefoldtech/jumpscaleX_core/pull/609

john-kheir commented 4 years ago

Verified