gshif closed 5 years ago
Added the following logic:
conn_cmd = ('kubectl --context=%s -n %s logs storage-0 -c storage'
            ' | grep "Connected to broker at kafka-svc:9093"'
            % (options.kubecluster, options.namespace))
conn = subprocess.Popen(conn_cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# communicate() waits for the command to complete and drains both pipes
conn_out, conn_err = conn.communicate()
# grep exits with 0 if storage logged the kafka connection, 1 if the line is missing, > 1 on error
if conn.returncode != 0:
    print(str(datetime.datetime.now()) + ' ' + str((conn_out, conn_err)))
    print(str(datetime.datetime.now()) + ' STORAGE IS RESTARTING.\n')
    # restart storage and then restart queryd
    cmd_command = 'curl --max-time 20 -s -GET %s/health' % options.storage
    status = check_service_status(service=options.storage, cmd_command=cmd_command, time_delay=180,
                                  time_sleep=2, restart=True, pod='storage-0')
    services_status['storage'] = status
    # queryd must be restarted too so that it reconnects to storage (should be fixed)
    # get the name of the queryd pod:
    queryd_pod = subprocess.Popen('kubectl --context=%s get pods -n %s -l app=queryd-a | grep queryd | '
                                  'awk \'{ print $1 }\'' % (options.kubecluster, options.namespace),
                                  shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                                  universal_newlines=True)  # text output, so strip() yields a str
    # communicate() also waits for the process to exit
    out, error = queryd_pod.communicate()
    cmd_command = 'curl --max-time 20 -s -GET %s/health' % options.flux
    status = check_service_status(service=options.flux, cmd_command=cmd_command, time_delay=180,
                                  time_sleep=2, restart=True, pod=out.strip())
    services_status['queryd'] = status
The above logic was added to litmus_run_master.py.
The storage and queryd components are restarted even though storage relies on its init container to wait until the kafka/etcd services are up and running. It was noticed (and there is an existing issue for it) that storage occasionally fails to connect to kafka anyway. To keep the tests reliable, storage is force-restarted in that case. This introduces another problem: restarting storage and then queryd can take a few minutes, which can roughly double the test run time. To avoid unnecessary restarts (when storage is already connected to kafka and queryd is connected to storage), the restarts need to be conditional: if the connection is OK, do not restart; otherwise, restart.
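The conditional restart described above could also be factored into small helpers, which makes the decision testable without a cluster. This is only a sketch, not what litmus_run_master.py actually contains: the names storage_connected, fetch_storage_logs and restart_if_disconnected are hypothetical. It also assumes that a failed kubectl call should be treated the same as a missing connection line.

```python
import subprocess

# Log line the original logic greps for in the storage container logs.
KAFKA_MARKER = 'Connected to broker at kafka-svc:9093'


def storage_connected(log_text, marker=KAFKA_MARKER):
    """Return True if the storage log contains the kafka connection marker."""
    return marker in log_text


def fetch_storage_logs(kubecluster, namespace):
    """Fetch the storage-0 logs via kubectl (hypothetical helper).

    Assumption: if kubectl exits non-zero, return '' so the caller
    treats storage as not connected.
    """
    cmd = ['kubectl', '--context=%s' % kubecluster, '-n', namespace,
           'logs', 'storage-0', '-c', 'storage']
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    out, _ = proc.communicate()
    return out if proc.returncode == 0 else ''


def restart_if_disconnected(log_text, restart):
    """Call restart() only when the marker is missing.

    Returns True if a restart was triggered, False otherwise.
    """
    if storage_connected(log_text):
        return False
    restart()
    return True
```

In the test runner, the restart callable would presumably wrap the existing check_service_status(..., restart=True, pod='storage-0') call, followed by the queryd restart, so both are skipped when the connection line is present.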