nerdicbynature closed this 3 months ago
The health checks for the OpenStack services come from the Kolla-Ansible project. We need a list of the health checks that are currently missing and need to be added.
In my opinion all containers should have a working health check. Many already have, but not all of them.
Also, a huge improvement would be to include the RabbitMQ connection state in the health status. Currently only the logs show that there is a problem after one of the RabbitMQ instances has been restarted.
Do you have a list from your environment? That would come in handy.
For scs1, these are the containers on the controllers:
4317ed27d592 quay.io/osism/redis-sentinel:5.0.7.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) redis_sentinel
867515d7ac42 quay.io/osism/redis:5.0.7.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) redis
3be7fec27b42 quay.io/osism/ovn-controller:22.03.0.20230125 "dumb-init --single-…" 2 days ago Up 2 days ovn_controller
f371798105ca quay.io/osism/openvswitch-vswitchd:2.17.3.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) openvswitch_vswitchd
d5ed087280f3 quay.io/osism/openvswitch-db-server:2.17.3.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) openvswitch_db
e0c7cefdc59c quay.io/osism/prometheus-cadvisor:0.38.7.20230125 "dumb-init --single-…" 2 days ago Up 2 days prometheus_cadvisor
a7059f741468 quay.io/osism/prometheus-memcached-exporter:0.6.0.20230125 "dumb-init --single-…" 2 days ago Up 2 days prometheus_memcached_exporter
be3ae6541f88 quay.io/osism/prometheus-haproxy-exporter:0.10.0.20230125 "dumb-init --single-…" 2 days ago Up 2 days prometheus_haproxy_exporter
44bf99f4d613 quay.io/osism/prometheus-mysqld-exporter:0.12.1.20230125 "dumb-init --single-…" 2 days ago Up 2 days prometheus_mysqld_exporter
bc1d827a3985 quay.io/osism/prometheus-node-exporter:0.18.1.20230125 "dumb-init --single-…" 2 days ago Up 2 days prometheus_node_exporter
1b3a9bc1b026 quay.io/osism/grafana:9.3.4.20230125 "dumb-init --single-…" 2 days ago Up 2 days grafana
c23dfac163e8 quay.io/osism/octavia-worker:10.0.0.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) octavia_worker
c839e66c91a9 quay.io/osism/octavia-housekeeping:10.0.0.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) octavia_housekeeping
bf483ca8e93f quay.io/osism/octavia-health-manager:10.0.0.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) octavia_health_manager
0a5b5bfdfe2a quay.io/osism/octavia-driver-agent:10.0.0.20230125 "dumb-init --single-…" 2 days ago Up 2 days octavia_driver_agent
5eff2456adfd quay.io/osism/octavia-api:10.0.0.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) octavia_api
3d1017b382bd quay.io/osism/designate-sink:14.0.1.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) designate_sink
b72f3d90f8fc quay.io/osism/designate-worker:14.0.1.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) designate_worker
c141b3e8bf7e quay.io/osism/designate-mdns:14.0.1.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) designate_mdns
c262371dd337 quay.io/osism/designate-producer:14.0.1.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) designate_producer
c7bb2e96ffff quay.io/osism/designate-central:14.0.1.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) designate_central
dbf7d0f5f190 quay.io/osism/designate-api:14.0.1.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) designate_api
b3b534515901 quay.io/osism/designate-backend-bind9:14.0.1.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) designate_backend_bind9
ba9953cb9e1d quay.io/osism/nova-novncproxy:25.0.1.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) nova_novncproxy
2b5ea5a70dd6 quay.io/osism/nova-conductor:25.0.1.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) nova_conductor
a5444ef048cf quay.io/osism/nova-api:25.0.1.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) nova_api
5af13b1dce99 quay.io/osism/nova-scheduler:25.0.1.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) nova_scheduler
f5c12ee3ef3f quay.io/osism/barbican-worker:14.0.2.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) barbican_worker
a4d824326da9 quay.io/osism/barbican-keystone-listener:14.0.2.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) barbican_keystone_listener
018522b15721 quay.io/osism/barbican-api:14.0.2.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) barbican_api
6c0896060107 quay.io/osism/placement-api:7.0.0.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) placement_api
02cf27898c78 quay.io/osism/heat-engine:18.0.0.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) heat_engine
b9bbca17edba quay.io/osism/heat-api-cfn:18.0.0.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) heat_api_cfn
74b99fe0b3d0 quay.io/osism/heat-api:18.0.0.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) heat_api
cb111a87a8ac quay.io/osism/neutron-server:20.2.0.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) neutron_server
ac4818919ea9 quay.io/osism/ovn-northd:22.03.0.20230125 "dumb-init --single-…" 2 days ago Up 2 days ovn_northd
f7c8aff9bf98 quay.io/osism/ovn-sb-db-server:22.03.0.20230125 "dumb-init --single-…" 2 days ago Up 2 days ovn_sb_db
8941f7698ac4 quay.io/osism/ovn-nb-db-server:22.03.0.20230125 "dumb-init --single-…" 2 days ago Up 2 days ovn_nb_db
6d3b8231cc7c quay.io/osism/cinder-backup:20.0.1.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) cinder_backup
e40197dd60a4 quay.io/osism/cinder-volume:20.0.1.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) cinder_volume
340fa7e1e863 quay.io/osism/cinder-scheduler:20.0.1.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) cinder_scheduler
c1353756d586 quay.io/osism/cinder-api:20.0.1.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) cinder_api
ce258e0443fa quay.io/osism/glance-api:24.1.0.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) glance_api
2d05a8a8b30e quay.io/osism/keystone:21.0.0.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) keystone
399e6cbe9ac4 quay.io/osism/keystone-fernet:21.0.0.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) keystone_fernet
5d034d58b0c0 quay.io/osism/keystone-ssh:21.0.0.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) keystone_ssh
1607370baf8e quay.io/osism/rabbitmq:3.10.14.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) rabbitmq
aee92ace3fc2 quay.io/osism/mariadb-clustercheck:10.6.11.20230125 "dumb-init --single-…" 2 days ago Up 2 days mariadb_clustercheck
19476c337374 quay.io/osism/mariadb-server:10.6.11.20230125 "dumb-init -- kolla_…" 2 days ago Up 2 days mariadb
50c1faaa2672 quay.io/osism/memcached:1.5.22.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) memcached
28e05b147632 quay.io/osism/keepalived:2.0.19.20230125 "dumb-init --single-…" 2 days ago Up 2 days keepalived
05d0a54a9088 quay.io/osism/haproxy:2.2.26.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) haproxy
317a841f1fdb quay.io/osism/cron:3.0pl1.20230125 "dumb-init --single-…" 3 days ago Up 3 days cron
6eaf9c3f55e0 quay.io/osism/kolla-toolbox:14.8.1.20230125 "dumb-init --single-…" 3 days ago Up 3 days kolla_toolbox
22b5b116bd3f quay.io/osism/fluentd:4.4.2.20230125 "dumb-init --single-…" 3 days ago Up 3 days fluentd
9b07242e40e8 quay.io/osism/ceph-daemon:pacific "/usr/bin/ceph-crash" 8 weeks ago Up 8 weeks ceph-crash-control1-scs1-az0
086e68c102d7 quay.io/osism/ceph-daemon:pacific "/opt/ceph-container…" 8 weeks ago Up 8 weeks ceph-mgr-control1-scs1-az0
36bee1de162f quay.io/osism/ceph-daemon:pacific "/opt/ceph-container…" 8 weeks ago Up 8 weeks ceph-rgw-control1-scs1-az0-rgw0
74e2c81828c9 quay.io/osism/ceph-daemon:pacific "/opt/ceph-container…" 8 weeks ago Up 8 weeks ceph-mon-control1-scs1-az0
List from one compute node:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a6f388f4f7b6 quay.io/osism/prometheus-libvirt-exporter:4.2.0.20230125 "dumb-init --single-…" 2 days ago Up 2 days prometheus_libvirt_exporter
353c051bfd22 quay.io/osism/prometheus-cadvisor:0.38.7.20230125 "dumb-init --single-…" 2 days ago Up 2 days prometheus_cadvisor
cbac13fa3d26 quay.io/osism/prometheus-node-exporter:0.18.1.20230125 "dumb-init --single-…" 2 days ago Up 2 days prometheus_node_exporter
4cde69fa591d quay.io/osism/nova-compute:25.0.1.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) nova_compute
f338d93870f7 quay.io/osism/nova-libvirt:8.0.0.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) nova_libvirt
5ad985c08aa5 quay.io/osism/nova-ssh:25.0.1.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) nova_ssh
cd7adc3620e5 quay.io/osism/neutron-metadata-agent:20.2.0.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) neutron_ovn_metadata_agent
92d30d879aff quay.io/osism/ovn-controller:22.03.0.20230125 "dumb-init --single-…" 2 days ago Up 2 days ovn_controller
de0fc1aad5b8 quay.io/osism/openvswitch-vswitchd:2.17.3.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) openvswitch_vswitchd
b3c45871158a quay.io/osism/openvswitch-db-server:2.17.3.20230125 "dumb-init --single-…" 2 days ago Up 2 days (healthy) openvswitch_db
97e9e3b5fde3 quay.io/osism/cron:3.0pl1.20230125 "dumb-init --single-…" 3 days ago Up 3 days cron
b450097ac211 quay.io/osism/kolla-toolbox:14.8.1.20230125 "dumb-init --single-…" 3 days ago Up 3 days kolla_toolbox
b3f59a1115a9 quay.io/osism/fluentd:4.4.2.20230125 "dumb-init --single-…" 3 days ago Up 3 days fluentd
15cd03aadd72 quay.io/osism/ceph-daemon:pacific "/usr/bin/ceph-crash" 2 months ago Up 2 months ceph-crash-compute1-scs1-az1
c6e8154dac6f quay.io/osism/ceph-daemon:pacific "/opt/ceph-container…" 2 months ago Up 2 months ceph-osd-8
1bf2bea7ec66 quay.io/osism/ceph-daemon:pacific "/opt/ceph-container…" 2 months ago Up 2 months ceph-osd-4
d3aa3c33fd0a quay.io/osism/ceph-daemon:pacific "/opt/ceph-container…" 2 months ago Up 2 months ceph-osd-25
7d2539fc27c5 quay.io/osism/ceph-daemon:pacific "/opt/ceph-container…" 2 months ago Up 2 months ceph-osd-20
1eadc5815311 quay.io/osism/ceph-daemon:pacific "/opt/ceph-container…" 2 months ago Up 2 months ceph-osd-15
e685666838c7 quay.io/osism/ceph-daemon:pacific "/opt/ceph-container…" 2 months ago Up 2 months ceph-osd-10
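The lists above can be turned into the requested list of missing health checks mechanically. The following is a minimal sketch, assuming the `docker` CLI is available on the node and that a container without `(healthy)`, `(unhealthy)` or `(health: starting)` in its status has no health check configured. The function names are made up for illustration and are not part of OSISM or Kolla.

```python
import subprocess


def health_state(status: str) -> str:
    """Classify the STATUS column of `docker ps` by its health suffix."""
    s = status.lower()
    if "(unhealthy)" in s:
        return "unhealthy"
    if "(healthy)" in s:
        return "healthy"
    if "(health: starting)" in s:
        return "starting"
    return "none"  # no health check reported for this container


def containers_without_healthcheck(ps_output: str) -> list[str]:
    """Return the names of containers that report no health state.

    Expects one container per line in the form "<name>\t<status>".
    """
    missing = []
    for line in ps_output.splitlines():
        if not line.strip():
            continue
        name, _, status = line.partition("\t")
        if health_state(status) == "none":
            missing.append(name)
    return sorted(missing)


def docker_ps() -> str:
    """Run `docker ps` and return "<name>\t<status>" lines."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}\t{{.Status}}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling `containers_without_healthcheck(docker_ps())` on a controller would, for the output above, include names such as `ovn_controller`, `mariadb` and `keepalived`.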
However, "healthy" is misleading in many cases. For example, "nova-compute" is broken after a RabbitMQ restart but still reports "healthy", because the health check only tests for an open port, not whether the process is actually working.
To test this: restart all RabbitMQ instances, watch the health status of the nova* containers, and then check the logs in /var/log/kolla/nova or try to schedule a VM on that hypervisor.
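One possible building block for a stricter check is to ask whether the service process still holds an established TCP connection to RabbitMQ. The sketch below is not the Kolla health check. It assumes a Linux `/proc` filesystem, the default AMQP port 5672, and that the PID of the service is known. All function names are hypothetical.

```python
import os

TCP_ESTABLISHED = "01"  # state code used in /proc/net/tcp


def socket_inodes(pid: int) -> set[str]:
    """Collect the socket inode numbers held open by a process."""
    inodes = set()
    fd_dir = f"/proc/{pid}/fd"
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # the descriptor was closed while we were looking
        if target.startswith("socket:["):
            inodes.add(target[len("socket:["):-1])
    return inodes


def established_to_port(proc_net_tcp: str, port: int, inodes=None) -> int:
    """Count ESTABLISHED connections to a remote port.

    proc_net_tcp is the text of /proc/net/tcp or /proc/net/tcp6.
    If inodes is given, only sockets with those inodes are counted.
    """
    count = 0
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 10:
            continue
        remote, state, inode = fields[2], fields[3], fields[9]
        remote_port = int(remote.rsplit(":", 1)[1], 16)
        if state != TCP_ESTABLISHED or remote_port != port:
            continue
        if inodes is not None and inode not in inodes:
            continue
        count += 1
    return count


def has_amqp_connection(pid: int, port: int = 5672) -> bool:
    """True if the process holds at least one established connection."""
    inodes = socket_inodes(pid)
    total = 0
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(path) as handle:
                total += established_to_port(handle.read(), port, inodes)
        except FileNotFoundError:
            continue  # e.g. IPv6 is disabled
    return total > 0
```

A health check command could exit with status 0 when `has_amqp_connection` returns true and 1 otherwise, which is the convention Docker uses for healthy and unhealthy. Note that an established TCP connection is only a necessary condition: it does not prove that the service is still consuming messages.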
Currently the health check shows only basic information about the Python process:
neutron@neutron-api-b456cdbf8-2b7jn:/$ curl -X GET -i -H "Accept: application/json" http://localhost:8080/healthcheck ; echo
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 62
Date: Thu, 22 Feb 2024 10:17:23 GMT
{
"detailed": false,
"reasons": [
"OK"
]
}
It is probably possible to add to or extend these checks via a middleware plugin: https://opendev.org/openstack/oslo.middleware/src/branch/master/oslo_middleware/healthcheck
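For monitoring, the `/healthcheck` response shown above can also be evaluated programmatically rather than with curl. This is a minimal sketch using only the Python standard library. It assumes the endpoint answers with HTTP 200 when healthy and a non-200 status (such as 503) otherwise, and that the JSON body carries a `reasons` list as in the output above. The function names are illustrative.

```python
import json
import urllib.error
import urllib.request


def interpret_healthcheck(status_code: int, body: str):
    """Return (healthy, reasons) for a /healthcheck response."""
    healthy = status_code == 200
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return healthy, []  # body is not JSON, rely on the status code
    if not isinstance(payload, dict):
        return healthy, []
    reasons = [str(reason) for reason in payload.get("reasons", [])]
    return healthy, reasons


def probe(url: str, timeout: float = 5.0):
    """Query a healthcheck URL and interpret the answer."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return interpret_healthcheck(response.status, response.read().decode())
    except urllib.error.HTTPError as err:
        # Non-2xx answers still carry a body worth reporting.
        return interpret_healthcheck(err.code, err.read().decode())
    except OSError as err:
        # Connection refused, timeout, DNS failure and similar.
        return False, [f"unreachable: {err}"]
```

For example, `probe("http://localhost:8080/healthcheck")` would return `(True, ["OK"])` for the response above. As noted in this thread, that result only reflects the state of the API process and says nothing about its RabbitMQ connection.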
Closing this. There are now health checks for most of the Kolla containers. We'll work on improving the health checks themselves in the linked issue.
Hi,
is there any chance of adding health checks to all containers deployed by OSISM? It would be especially helpful if the health status also included the state of the connection to RabbitMQ of the service that runs within the container.
Background: If 1 out of 3 controller nodes is rebooted, 1/3 of all RabbitMQ containers are restarted as well, and a lot of services do not recover their broken connections to RabbitMQ. As a result, many problems go unnoticed and are detected late.
Kind regards, André.