hystax / optscale

FinOps, MLOps and cloud cost optimization tool. Supports AWS, Azure, GCP, Alibaba Cloud and Kubernetes.
https://hystax.com
Apache License 2.0
1.22k stars 169 forks

System unavailable after upgrading to the latest release (rollback does not solve the issue either) #198

Closed: elikkatzgit closed this issue 8 months ago

elikkatzgit commented 8 months ago

Upgrading to the latest release with the command below left the system inoperable:

(.venv) ubuntu@ip-10-0-12-52:~/optscale/optscale-deploy$ ./runkube.py --with-elk --update-only -- optscale-poc 2024011701-public

Expected behavior

Upgrade logs

07:58:50.989: Pulling images for 10.0.12.52 07:58:50.993: Pulling image index.docker.io/hystax/arcee with tag 2024011701-public 07:58:51.869: Pulling image index.docker.io/hystax/auth with tag 2024011701-public 07:58:52.740: Pulling image index.docker.io/hystax/bi_exporter with tag 2024011701-public 07:58:53.601: Pulling image index.docker.io/hystax/bi_scheduler with tag 2024011701-public 07:58:54.452: Pulling image index.docker.io/hystax/booking_observer with tag 2024011701-public 07:58:55.350: Pulling image index.docker.io/hystax/bulldozer_api with tag 2024011701-public 07:58:56.203: Pulling image index.docker.io/hystax/bulldozer_worker with tag 2024011701-public 07:58:57.099: Pulling image index.docker.io/hystax/bumischeduler with tag 2024011701-public 07:58:57.954: Pulling image index.docker.io/hystax/bumiworker with tag 2024011701-public 07:58:58.800: Pulling image index.docker.io/hystax/calendar_observer with tag 2024011701-public 07:58:59.646: Pulling image index.docker.io/hystax/cleanelkdb with tag 2024011701-public 07:59:00.482: Pulling image index.docker.io/hystax/cleaninfluxdb with tag 2024011701-public 07:59:01.340: Pulling image index.docker.io/hystax/cleanmongodb with tag 2024011701-public 07:59:02.207: Pulling image index.docker.io/hystax/configurator with tag 2024011701-public 07:59:03.074: Pulling image index.docker.io/hystax/demo_org_cleanup with tag 2024011701-public 07:59:04.024: Pulling image index.docker.io/hystax/diproxy with tag 2024011701-public 07:59:04.878: Pulling image index.docker.io/hystax/diworker with tag 2024011701-public 07:59:05.749: Pulling image index.docker.io/hystax/elk with tag 2024011701-public 07:59:06.648: Pulling image index.docker.io/hystax/error_pages with tag 2024011701-public 07:59:07.499: Pulling image index.docker.io/hystax/etcd with tag 2024011701-public 07:59:08.405: Pulling image index.docker.io/hystax/failed_imports_dataset_generator with tag 2024011701-public 07:59:09.286: Pulling image 
index.docker.io/hystax/grafana with tag 2024011701-public 07:59:10.150: Pulling image index.docker.io/hystax/grafana_nginx with tag 2024011701-public 07:59:11.462: Pulling image index.docker.io/hystax/herald with tag 2024011701-public 07:59:12.310: Pulling image index.docker.io/hystax/herald_executor with tag 2024011701-public 07:59:13.162: Pulling image index.docker.io/hystax/influxdb with tag 2024011701-public 07:59:14.015: Pulling image index.docker.io/hystax/insider_api with tag 2024011701-public 07:59:14.881: Pulling image index.docker.io/hystax/insider_scheduler with tag 2024011701-public 07:59:15.755: Pulling image index.docker.io/hystax/insider_worker with tag 2024011701-public 07:59:16.633: Pulling image index.docker.io/hystax/jira_bus with tag 2024011701-public 07:59:17.487: Pulling image index.docker.io/hystax/jira_ui with tag 2024011701-public 07:59:18.351: Pulling image index.docker.io/hystax/katara_service with tag 2024011701-public 07:59:19.223: Pulling image index.docker.io/hystax/katara_worker with tag 2024011701-public 07:59:20.067: Pulling image index.docker.io/hystax/keeper with tag 2024011701-public 07:59:20.918: Pulling image index.docker.io/hystax/keeper_executor with tag 2024011701-public 07:59:21.790: Pulling image index.docker.io/hystax/live_demo_generator with tag 2024011701-public 07:59:22.644: Pulling image index.docker.io/hystax/mariadb with tag 2024011701-public 07:59:23.528: Pulling image index.docker.io/hystax/metroculus_api with tag 2024011701-public 07:59:24.421: Pulling image index.docker.io/hystax/metroculus_scheduler with tag 2024011701-public 07:59:25.299: Pulling image index.docker.io/hystax/metroculus_worker with tag 2024011701-public 07:59:26.192: Pulling image index.docker.io/hystax/mongo with tag 2024011701-public 07:59:27.058: Pulling image index.docker.io/hystax/ngui with tag 2024011701-public 07:59:27.978: Pulling image index.docker.io/hystax/ohsu with tag 2024011701-public 07:59:28.865: Pulling image 
index.docker.io/hystax/organization_violations with tag 2024011701-public 07:59:29.740: Pulling image index.docker.io/hystax/pharos_receiver with tag 2024011701-public 07:59:30.606: Pulling image index.docker.io/hystax/pharos_worker with tag 2024011701-public 07:59:31.472: Pulling image index.docker.io/hystax/redis with tag 2024011701-public 07:59:32.333: Pulling image index.docker.io/hystax/resource_discovery with tag 2024011701-public 07:59:33.196: Pulling image index.docker.io/hystax/resource_observer with tag 2024011701-public 07:59:34.064: Pulling image index.docker.io/hystax/resource_violations with tag 2024011701-public 07:59:34.922: Pulling image index.docker.io/hystax/rest_api with tag 2024011701-public 07:59:35.816: Pulling image index.docker.io/hystax/risp_scheduler with tag 2024011701-public 07:59:36.729: Pulling image index.docker.io/hystax/risp_worker with tag 2024011701-public 07:59:37.586: Pulling image index.docker.io/hystax/slacker with tag 2024011701-public 07:59:38.431: Pulling image index.docker.io/hystax/slacker_executor with tag 2024011701-public 07:59:39.265: Pulling image index.docker.io/hystax/trapper_scheduler with tag 2024011701-public 07:59:40.133: Pulling image index.docker.io/hystax/trapper_worker with tag 2024011701-public 07:59:40.988: Pulling image index.docker.io/hystax/users_dataset_generator with tag 2024011701-public 07:59:41.854: Pulling image index.docker.io/hystax/webhook_executor with tag 2024011701-public 07:59:42.725: Pulling image index.docker.io/hystax/gemini_scheduler with tag 2024011701-public 07:59:43.623: Pulling image index.docker.io/hystax/gemini_worker with tag 2024011701-public 07:59:44.507: Pulling image index.docker.io/hystax/power_schedule with tag 2024011701-public 07:59:45.414: images for tag: {'arcee': <Image: 'hystax/arcee:2024011701-public', 'arcee:local'>, 'auth': <Image: 'hystax/auth:2024011701-public', 'auth:local'>, 'bi_exporter': <Image: 'hystax/bi_exporter:2024011701-public', 'bi_exporter:local'>, 
'bi_scheduler': <Image: 'hystax/bi_scheduler:2024011701-public', 'bi_scheduler:local'>, 'booking_observer': <Image: 'hystax/booking_observer:2024011701-public', 'booking_observer:local'>, 'bulldozer_api': <Image: 'hystax/bulldozer_api:2024011701-public', 'bulldozer_api:local'>, 'bulldozer_worker': <Image: 'hystax/bulldozer_worker:2024011701-public', 'bulldozer_worker:local'>, 'bumischeduler': <Image: 'hystax/bumischeduler:2024011701-public', 'bumischeduler:local'>, 'bumiworker': <Image: 'hystax/bumiworker:2024011701-public', 'bumiworker:local'>, 'calendar_observer': <Image: 'hystax/calendar_observer:2024011701-public', 'calendar_observer:local'>, 'cleanelkdb': <Image: 'hystax/cleanelkdb:2024011701-public', 'cleanelkdb:local'>, 'cleaninfluxdb': <Image: 'hystax/cleaninfluxdb:2024011701-public', 'cleaninfluxdb:local'>, 'cleanmongodb': <Image: 'hystax/cleanmongodb:2024011701-public', 'cleanmongodb:local'>, 'configurator': <Image: 'hystax/configurator:2024011701-public', 'configurator:local'>, 'demo_org_cleanup': <Image: 'hystax/demo_org_cleanup:2024011701-public', 'demo_org_cleanup:local'>, 'diproxy': <Image: 'hystax/diproxy:2024011701-public', 'diproxy:local'>, 'diworker': <Image: 'hystax/diworker:2024011701-public', 'diworker:local'>, 'elk': <Image: 'hystax/elk:2024011701-public', 'elk:local'>, 'error_pages': <Image: 'hystax/error_pages:2024011701-public', 'error_pages:local'>, 'etcd': <Image: 'hystax/etcd:2024011701-public', 'etcd:local', 'etcd:vlocal'>, 'failed_imports_dataset_generator': <Image: 'hystax/failed_imports_dataset_generator:2024011701-public', 'failed_imports_dataset_generator:local'>, 'grafana': <Image: 'hystax/grafana:2024011701-public', 'grafana:local'>, 'grafana_nginx': <Image: 'hystax/grafana_nginx:2024011701-public', 'grafana_nginx:local'>, 'herald': <Image: 'hystax/herald:2024011701-public', 'herald:local'>, 'herald_executor': <Image: 'hystax/herald_executor:2024011701-public', 'herald_executor:local'>, 'influxdb': <Image: 
'hystax/influxdb:2024011701-public', 'influxdb:local'>, 'insider_api': <Image: 'hystax/insider_api:2024011701-public', 'insider_api:local'>, 'insider_scheduler': <Image: 'hystax/insider_scheduler:2024011701-public', 'insider_scheduler:local'>, 'insider_worker': <Image: 'hystax/insider_worker:2024011701-public', 'insider_worker:local'>, 'jira_bus': <Image: 'hystax/jira_bus:2024011701-public', 'jira_bus:local'>, 'jira_ui': <Image: 'hystax/jira_ui:2024011701-public', 'jira_ui:local'>, 'katara_service': <Image: 'hystax/katara_service:2024011701-public', 'katara_service:local'>, 'katara_worker': <Image: 'hystax/katara_worker:2024011701-public', 'katara_worker:local'>, 'keeper': <Image: 'hystax/keeper:2024011701-public', 'keeper:local'>, 'keeper_executor': <Image: 'hystax/keeper_executor:2024011701-public', 'keeper_executor:local'>, 'live_demo_generator': <Image: 'hystax/live_demo_generator:2024011701-public', 'live_demo_generator:local'>, 'mariadb': <Image: 'hystax/mariadb:2024011701-public', 'mariadb:local'>, 'metroculus_api': <Image: 'hystax/metroculus_api:2024011701-public', 'metroculus_api:local'>, 'metroculus_scheduler': <Image: 'hystax/metroculus_scheduler:2024011701-public', 'metroculus_scheduler:local'>, 'metroculus_worker': <Image: 'hystax/metroculus_worker:2024011701-public', 'metroculus_worker:local'>, 'mongo': <Image: 'hystax/mongo:2024011701-public', 'mongo:local'>, 'ngui': <Image: 'hystax/ngui:2024011701-public', 'ngui:local'>, 'ohsu': <Image: 'hystax/ohsu:2024011701-public', 'ohsu:local'>, 'organization_violations': <Image: 'hystax/organization_violations:2024011701-public', 'organization_violations:local'>, 'pharos_receiver': <Image: 'hystax/pharos_receiver:2024011701-public', 'pharos_receiver:local'>, 'pharos_worker': <Image: 'hystax/pharos_worker:2024011701-public', 'pharos_worker:local'>, 'redis': <Image: 'hystax/redis:2024011701-public', 'redis:local'>, 'resource_discovery': <Image: 'hystax/resource_discovery:2024011701-public', 
'resource_discovery:local'>, 'resource_observer': <Image: 'hystax/resource_observer:2024011701-public', 'resource_observer:local'>, 'resource_violations': <Image: 'hystax/resource_violations:2024011701-public', 'resource_violations:local'>, 'rest_api': <Image: 'hystax/rest_api:2024011701-public', 'rest_api:local'>, 'risp_scheduler': <Image: 'hystax/risp_scheduler:2024011701-public', 'risp_scheduler:local'>, 'risp_worker': <Image: 'hystax/risp_worker:2024011701-public', 'risp_worker:local'>, 'slacker': <Image: 'hystax/slacker:2024011701-public', 'slacker:local'>, 'slacker_executor': <Image: 'hystax/slacker_executor:2024011701-public', 'slacker_executor:local'>, 'trapper_scheduler': <Image: 'hystax/trapper_scheduler:2024011701-public', 'trapper_scheduler:local'>, 'trapper_worker': <Image: 'hystax/trapper_worker:2024011701-public', 'trapper_worker:local'>, 'users_dataset_generator': <Image: 'hystax/users_dataset_generator:2024011701-public', 'users_dataset_generator:local'>, 'webhook_executor': <Image: 'hystax/webhook_executor:2024011701-public', 'webhook_executor:local'>, 'gemini_scheduler': <Image: 'hystax/gemini_scheduler:2024011701-public', 'gemini_scheduler:local'>, 'gemini_worker': <Image: 'hystax/gemini_worker:2024011701-public', 'gemini_worker:local'>, 'power_schedule': <Image: 'hystax/power_schedule:2024011701-public', 'power_schedule:local'>} 07:59:45.414: Tagging <Image: 'hystax/arcee:2024011701-public', 'arcee:local'> as arcee:local 07:59:45.419: Tagging <Image: 'hystax/auth:2024011701-public', 'auth:local'> as auth:local 07:59:45.424: Tagging <Image: 'hystax/bi_exporter:2024011701-public', 'bi_exporter:local'> as bi_exporter:local 07:59:45.429: Tagging <Image: 'hystax/bi_scheduler:2024011701-public', 'bi_scheduler:local'> as bi_scheduler:local 07:59:45.433: Tagging <Image: 'hystax/booking_observer:2024011701-public', 'booking_observer:local'> as booking_observer:local 07:59:45.438: Tagging <Image: 'hystax/bulldozer_api:2024011701-public', 
'bulldozer_api:local'> as bulldozer_api:local 07:59:45.443: Tagging <Image: 'hystax/bulldozer_worker:2024011701-public', 'bulldozer_worker:local'> as bulldozer_worker:local 07:59:45.447: Tagging <Image: 'hystax/bumischeduler:2024011701-public', 'bumischeduler:local'> as bumischeduler:local 07:59:45.451: Tagging <Image: 'hystax/bumiworker:2024011701-public', 'bumiworker:local'> as bumiworker:local 07:59:45.456: Tagging <Image: 'hystax/calendar_observer:2024011701-public', 'calendar_observer:local'> as calendar_observer:local 07:59:45.460: Tagging <Image: 'hystax/cleanelkdb:2024011701-public', 'cleanelkdb:local'> as cleanelkdb:local 07:59:45.464: Tagging <Image: 'hystax/cleaninfluxdb:2024011701-public', 'cleaninfluxdb:local'> as cleaninfluxdb:local 07:59:45.468: Tagging <Image: 'hystax/cleanmongodb:2024011701-public', 'cleanmongodb:local'> as cleanmongodb:local 07:59:45.473: Tagging <Image: 'hystax/configurator:2024011701-public', 'configurator:local'> as configurator:local 07:59:45.477: Tagging <Image: 'hystax/demo_org_cleanup:2024011701-public', 'demo_org_cleanup:local'> as demo_org_cleanup:local 07:59:45.482: Tagging <Image: 'hystax/diproxy:2024011701-public', 'diproxy:local'> as diproxy:local 07:59:45.487: Tagging <Image: 'hystax/diworker:2024011701-public', 'diworker:local'> as diworker:local 07:59:45.491: Tagging <Image: 'hystax/elk:2024011701-public', 'elk:local'> as elk:local 07:59:45.496: Tagging <Image: 'hystax/error_pages:2024011701-public', 'error_pages:local'> as error_pages:local 07:59:45.504: Tagging <Image: 'hystax/etcd:2024011701-public', 'etcd:local', 'etcd:vlocal'> as etcd:local 07:59:45.508: Tagging <Image: 'hystax/failed_imports_dataset_generator:2024011701-public', 'failed_imports_dataset_generator:local'> as failed_imports_dataset_generator:local 07:59:45.512: Tagging <Image: 'hystax/grafana:2024011701-public', 'grafana:local'> as grafana:local 07:59:45.516: Tagging <Image: 'hystax/grafana_nginx:2024011701-public', 'grafana_nginx:local'> as 
grafana_nginx:local 07:59:45.521: Tagging <Image: 'hystax/herald:2024011701-public', 'herald:local'> as herald:local 07:59:45.526: Tagging <Image: 'hystax/herald_executor:2024011701-public', 'herald_executor:local'> as herald_executor:local 07:59:45.530: Tagging <Image: 'hystax/influxdb:2024011701-public', 'influxdb:local'> as influxdb:local 07:59:45.535: Tagging <Image: 'hystax/insider_api:2024011701-public', 'insider_api:local'> as insider_api:local 07:59:45.539: Tagging <Image: 'hystax/insider_scheduler:2024011701-public', 'insider_scheduler:local'> as insider_scheduler:local 07:59:45.543: Tagging <Image: 'hystax/insider_worker:2024011701-public', 'insider_worker:local'> as insider_worker:local 07:59:45.548: Tagging <Image: 'hystax/jira_bus:2024011701-public', 'jira_bus:local'> as jira_bus:local 07:59:45.553: Tagging <Image: 'hystax/jira_ui:2024011701-public', 'jira_ui:local'> as jira_ui:local 07:59:45.557: Tagging <Image: 'hystax/katara_service:2024011701-public', 'katara_service:local'> as katara_service:local 07:59:45.561: Tagging <Image: 'hystax/katara_worker:2024011701-public', 'katara_worker:local'> as katara_worker:local 07:59:45.566: Tagging <Image: 'hystax/keeper:2024011701-public', 'keeper:local'> as keeper:local 07:59:45.571: Tagging <Image: 'hystax/keeper_executor:2024011701-public', 'keeper_executor:local'> as keeper_executor:local 07:59:45.575: Tagging <Image: 'hystax/live_demo_generator:2024011701-public', 'live_demo_generator:local'> as live_demo_generator:local 07:59:45.580: Tagging <Image: 'hystax/mariadb:2024011701-public', 'mariadb:local'> as mariadb:local 07:59:45.585: Tagging <Image: 'hystax/metroculus_api:2024011701-public', 'metroculus_api:local'> as metroculus_api:local 07:59:45.589: Tagging <Image: 'hystax/metroculus_scheduler:2024011701-public', 'metroculus_scheduler:local'> as metroculus_scheduler:local 07:59:45.593: Tagging <Image: 'hystax/metroculus_worker:2024011701-public', 'metroculus_worker:local'> as metroculus_worker:local 
07:59:45.598: Tagging <Image: 'hystax/mongo:2024011701-public', 'mongo:local'> as mongo:local 07:59:45.602: Tagging <Image: 'hystax/ngui:2024011701-public', 'ngui:local'> as ngui:local 07:59:45.606: Tagging <Image: 'hystax/ohsu:2024011701-public', 'ohsu:local'> as ohsu:local 07:59:45.611: Tagging <Image: 'hystax/organization_violations:2024011701-public', 'organization_violations:local'> as organization_violations:local 07:59:45.616: Tagging <Image: 'hystax/pharos_receiver:2024011701-public', 'pharos_receiver:local'> as pharos_receiver:local 07:59:45.620: Tagging <Image: 'hystax/pharos_worker:2024011701-public', 'pharos_worker:local'> as pharos_worker:local 07:59:45.624: Tagging <Image: 'hystax/redis:2024011701-public', 'redis:local'> as redis:local 07:59:45.629: Tagging <Image: 'hystax/resource_discovery:2024011701-public', 'resource_discovery:local'> as resource_discovery:local 07:59:45.633: Tagging <Image: 'hystax/resource_observer:2024011701-public', 'resource_observer:local'> as resource_observer:local 07:59:45.638: Tagging <Image: 'hystax/resource_violations:2024011701-public', 'resource_violations:local'> as resource_violations:local 07:59:45.643: Tagging <Image: 'hystax/rest_api:2024011701-public', 'rest_api:local'> as rest_api:local 07:59:45.647: Tagging <Image: 'hystax/risp_scheduler:2024011701-public', 'risp_scheduler:local'> as risp_scheduler:local 07:59:45.652: Tagging <Image: 'hystax/risp_worker:2024011701-public', 'risp_worker:local'> as risp_worker:local 07:59:45.657: Tagging <Image: 'hystax/slacker:2024011701-public', 'slacker:local'> as slacker:local 07:59:45.661: Tagging <Image: 'hystax/slacker_executor:2024011701-public', 'slacker_executor:local'> as slacker_executor:local 07:59:45.666: Tagging <Image: 'hystax/trapper_scheduler:2024011701-public', 'trapper_scheduler:local'> as trapper_scheduler:local 07:59:45.670: Tagging <Image: 'hystax/trapper_worker:2024011701-public', 'trapper_worker:local'> as trapper_worker:local 07:59:45.675: Tagging 
<Image: 'hystax/users_dataset_generator:2024011701-public', 'users_dataset_generator:local'> as users_dataset_generator:local
07:59:45.679: Tagging <Image: 'hystax/webhook_executor:2024011701-public', 'webhook_executor:local'> as webhook_executor:local
07:59:45.684: Tagging <Image: 'hystax/gemini_scheduler:2024011701-public', 'gemini_scheduler:local'> as gemini_scheduler:local
07:59:45.689: Tagging <Image: 'hystax/gemini_worker:2024011701-public', 'gemini_worker:local'> as gemini_worker:local
07:59:45.694: Tagging <Image: 'hystax/power_schedule:2024011701-public', 'power_schedule:local'> as power_schedule:local
07:59:45.698: Getting old overlay list
Traceback (most recent call last):
  File "/home/ubuntu/optscale/optscale-deploy/.venv/lib/python3.8/site-packages/kubernetes/stream/ws_client.py", line 254, in websocket_call
    client = WSClient(configuration, get_websocket_url(url), headers)
  File "/home/ubuntu/optscale/optscale-deploy/.venv/lib/python3.8/site-packages/kubernetes/stream/ws_client.py", line 77, in __init__
    self.sock.connect(url, header=header)
  File "/home/ubuntu/optscale/optscale-deploy/.venv/lib/python3.8/site-packages/websocket/_core.py", line 253, in connect
    self.handshake_response = handshake(self.sock, url, *addrs, **options)
  File "/home/ubuntu/optscale/optscale-deploy/.venv/lib/python3.8/site-packages/websocket/_handshake.py", line 57, in handshake
    status, resp = _get_resp_headers(sock)
  File "/home/ubuntu/optscale/optscale-deploy/.venv/lib/python3.8/site-packages/websocket/_handshake.py", line 150, in _get_resp_headers
    raise WebSocketBadStatusException("Handshake status {status} {message} -+-+- {headers} -+-+- {body}".format(status=status, message=status_message, headers=resp_headers, body=response_body), status, status_message, resp_headers, response_body)
websocket._exceptions.WebSocketBadStatusException: Handshake status 500 Internal Server Error -+-+- {'content-length': '28', 'content-type': 'text/plain; charset=utf-8', 'date': 'Thu, 25 Jan 2024 07:59:45 GMT'} -+-+- b'container not found ("etcd")'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./runkube.py", line 461, in <module>
    acr.start(args.check, args.update_only)
  File "./runkube.py", line 344, in start
    old_overlay_list = self.get_old_overlay_list_for_update()
  File "./runkube.py", line 310, in get_old_overlay_list_for_update
    client = k8s_stream(self.kube_cl.connect_get_namespaced_pod_exec,
  File "/home/ubuntu/optscale/optscale-deploy/.venv/lib/python3.8/site-packages/kubernetes/stream/stream.py", line 36, in stream
    return func(*args, **kwargs)
  File "/home/ubuntu/optscale/optscale-deploy/.venv/lib/python3.8/site-packages/kubernetes/client/apis/core_v1_api.py", line 835, in connect_get_namespaced_pod_exec
    (data) = self.connect_get_namespaced_pod_exec_with_http_info(name, namespace, **kwargs)
  File "/home/ubuntu/optscale/optscale-deploy/.venv/lib/python3.8/site-packages/kubernetes/client/apis/core_v1_api.py", line 922, in connect_get_namespaced_pod_exec_with_http_info
    return self.api_client.call_api('/api/v1/namespaces/{namespace}/pods/{name}/exec', 'GET',
  File "/home/ubuntu/optscale/optscale-deploy/.venv/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 330, in call_api
    return self.__call_api(resource_path, method,
  File "/home/ubuntu/optscale/optscale-deploy/.venv/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 163, in __call_api
    response_data = self.request(method, url,
  File "/home/ubuntu/optscale/optscale-deploy/.venv/lib/python3.8/site-packages/kubernetes/stream/stream.py", line 31, in _intercept_request_call
    return ws_client.websocket_call(config, *args, **kwargs)
  File "/home/ubuntu/optscale/optscale-deploy/.venv/lib/python3.8/site-packages/kubernetes/stream/ws_client.py", line 260, in websocket_call
    raise ApiException(status=0, reason=str(e))
kubernetes.client.rest.ApiException: (0)
Reason: Handshake status 500 Internal Server Error -+-+- {'content-length': '28', 'content-type': 'text/plain; charset=utf-8', 'date': 'Thu, 25 Jan 2024 07:59:45 GMT'} -+-+- b'container not found ("etcd")'
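
The traceback above ends in get_old_overlay_list_for_update, where the exec session into a pod is rejected with 'container not found ("etcd")'. One way to check for that condition before starting an update is to look at the container states reported by `kubectl get pods -o json`. The following is only a minimal sketch under that assumption: the helper name pods_with_running_container and the sample pod names are hypothetical and are not part of runkube.py.

```python
import json


def pods_with_running_container(pod_list, container_name):
    """Return names of pods in which `container_name` is currently running.

    `pod_list` is the parsed output of `kubectl get pods -o json`.
    """
    names = []
    for pod in pod_list.get("items", []):
        # Pods that have not started any container may lack containerStatuses.
        statuses = pod.get("status", {}).get("containerStatuses") or []
        for status in statuses:
            is_target = status.get("name") == container_name
            if is_target and "running" in status.get("state", {}):
                names.append(pod["metadata"]["name"])
                break
    return names


# Hypothetical sample shaped like `kubectl get pods -o json` output;
# the pod names are made up for illustration.
SAMPLE = json.loads("""
{"items": [
  {"metadata": {"name": "etcd-0"},
   "status": {"containerStatuses": [
     {"name": "etcd", "ready": false,
      "state": {"waiting": {"reason": "ContainerCreating"}}}]}},
  {"metadata": {"name": "etcd-1"},
   "status": {"containerStatuses": [
     {"name": "etcd", "ready": true,
      "state": {"running": {"startedAt": "2024-01-25T07:00:00Z"}}}]}}
]}
""")

print(pods_with_running_container(SAMPLE, "etcd"))
```

An empty result would suggest that an exec into the etcd container is likely to fail in the same way as in the log.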

Pods in error state:

(.venv) ubuntu@ip-10-0-12-52:~/optscale/optscale-deploy$ kubectl get pods |grep Error
bi-scheduler-1706101800-66zlg 0/1 Init:Error 0 17h
booking-observer-scheduler-1706101680-pxvr6 0/1 Init:Error 0 17h
calendar-observer-scheduler-1706104800-cjnrq 0/1 Init:Error 0 17h
cleanmongodb-1706101740-pvvm2 0/1 Init:Error 0 17h
gemini-scheduler-1706101800-p4pqc 0/1 Init:Error 0 17h
live-demo-generator-scheduler-1706090400-xzfd9 0/1 Init:Error 0 17h
metroculusscheduler-1706103000-cnrjn 0/1 Init:Error 0 17h
organization-violations-scheduler-1706101800-r2z5k 0/1 Init:Error 0 17h
power-schedule-scheduler-1706101800-6f8wf 0/1 Init:Error 0 17h
pre-configurator-m6wd8 0/1 Init:Error 0 17h
report-import-scheduler-0-1706102100-hp98z 0/1 Init:Error 0 17h
report-import-scheduler-1-1706090400-57ksm 0/1 Init:Error 0 17h
report-import-scheduler-6-1706097600-dhvvd 0/1 Init:Error 0 17h
resource-discovery-scheduler-1706101800-gbcp6 0/1 Init:Error 0 17h
resource-observer-scheduler-1706101800-l9ct8 0/1 Init:Error 0 17h
resource-violations-scheduler-1706101800-gbpbm 0/1 Init:Error 0 17h
risp-scheduler-1706103900-hgdkc 0/1 Init:Error 0 17h
thanos-compactor-1706097600-96n9b 0/1 Init:Error 0 17h
trapper-scheduler-1706103900-zqzmp 0/1 Init:Error 0 17h
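
For repeated checks, the same filtering that `grep Error` does here can be sketched in a small script that also keeps the columns separate. This assumes the default `kubectl get pods` column order (NAME, READY, STATUS, RESTARTS, AGE); the function name pods_in_status is hypothetical, and the etcd-0 row in the sample is made up to show a non-matching pod.

```python
def pods_in_status(kubectl_output, status_substring="Error"):
    """Return (name, status, age) for pods whose STATUS contains the substring."""
    result = []
    for line in kubectl_output.splitlines():
        fields = line.split()
        # Skip blank or short lines and the header row.
        if len(fields) < 5 or fields[0] == "NAME":
            continue
        # AGE is taken from the last field because RESTARTS may span
        # several tokens in newer kubectl versions.
        name, status, age = fields[0], fields[2], fields[-1]
        if status_substring in status:
            result.append((name, status, age))
    return result


# Two rows copied from the issue plus one hypothetical healthy pod.
SAMPLE_OUTPUT = """\
NAME READY STATUS RESTARTS AGE
bi-scheduler-1706101800-66zlg 0/1 Init:Error 0 17h
etcd-0 1/1 Running 0 17h
pre-configurator-m6wd8 0/1 Init:Error 0 17h
"""

for name, status, age in pods_in_status(SAMPLE_OUTPUT):
    print(name, status, age)
```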

elikkatzgit commented 8 months ago

I resolved the issue with the following steps:

git clone https://github.com/hystax/optscale.git
git checkout 2024011701-public
git reset --hard
./runkube.py --with-elk --update-only -- {NAME} 2024011701-public
kubectl delete pods --all -n default
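
The recovery steps can also be written down as a small script. The sketch below is an illustration only: it assumes an existing clone (so the git clone step is omitted), it leaves changing into the right directories to the reader, and it defaults to a dry run because `git reset --hard` and `kubectl delete pods --all` are destructive. The function names and the dry_run parameter are hypothetical; optscale-poc is the deployment name used in the upgrade command earlier in this issue.

```python
import shlex
import subprocess


def recovery_commands(name, tag="2024011701-public"):
    """Build the command list from the steps in the comment above."""
    return [
        ["git", "checkout", tag],
        ["git", "reset", "--hard"],
        ["./runkube.py", "--with-elk", "--update-only", "--", name, tag],
        ["kubectl", "delete", "pods", "--all", "-n", "default"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(cmd, check=True)


# Dry run: only prints the commands, nothing is executed.
run(recovery_commands("optscale-poc"))
```

Calling run(..., dry_run=False) would execute the steps in order and stop at the first failure.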