Closed cubeek closed 2 months ago
Hi @cubeek. Thanks for your PR.
I'm waiting for an openstack-k8s-operators member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test
on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test
label.
I understand the commands that are listed here.
recheck
Build failed (check pipeline). Post recheck
(without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.
https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/f42cc8a5a4d44941b38fb11fddc95a46
:heavy_check_mark: openstack-k8s-operators-content-provider SUCCESS in 2h 33m 01s
:x: podified-multinode-edpm-deployment-crc RETRY_LIMIT in 17m 29s
:x: cifmw-crc-podified-edpm-baremetal FAILURE in 2h 15m 49s
:heavy_check_mark: edpm-ansible-molecule-edpm_bootstrap SUCCESS in 7m 13s
:heavy_check_mark: edpm-ansible-molecule-edpm_podman SUCCESS in 6m 49s
:heavy_check_mark: edpm-ansible-molecule-edpm_module_load SUCCESS in 5m 09s
:heavy_check_mark: edpm-ansible-molecule-edpm_kernel SUCCESS in 12m 14s
:heavy_check_mark: edpm-ansible-molecule-edpm_libvirt SUCCESS in 10m 02s
:heavy_check_mark: edpm-ansible-molecule-edpm_nova SUCCESS in 11m 24s
:heavy_check_mark: edpm-ansible-molecule-edpm_frr SUCCESS in 7m 47s
:x: edpm-ansible-molecule-edpm_iscsid FAILURE in 4m 41s
:heavy_check_mark: edpm-ansible-molecule-edpm_ovn_bgp_agent SUCCESS in 7m 55s
:heavy_check_mark: edpm-ansible-molecule-edpm_ovs SUCCESS in 11m 46s
:heavy_check_mark: edpm-ansible-molecule-edpm_tripleo_cleanup SUCCESS in 3m 41s
:x: adoption-standalone-to-crc-ceph-provider FAILURE in 1h 42m 59s
/ok-to-test
recheck
/lgtm
recheck
recheck after dequeuing stuck zuul jobs
Build failed (check pipeline). Post recheck
(without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.
https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/a718d628679c437487b48302eca7c7a4
:heavy_check_mark: openstack-k8s-operators-content-provider SUCCESS in 3h 27m 27s
:heavy_check_mark: podified-multinode-edpm-deployment-crc SUCCESS in 1h 42m 40s
:heavy_check_mark: cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 51m 00s
:heavy_check_mark: edpm-ansible-molecule-edpm_bootstrap SUCCESS in 6m 18s
:heavy_check_mark: edpm-ansible-molecule-edpm_podman SUCCESS in 6m 10s
:heavy_check_mark: edpm-ansible-molecule-edpm_module_load SUCCESS in 5m 13s
:heavy_check_mark: edpm-ansible-molecule-edpm_kernel SUCCESS in 11m 20s
:heavy_check_mark: edpm-ansible-molecule-edpm_libvirt SUCCESS in 9m 39s
:heavy_check_mark: edpm-ansible-molecule-edpm_nova SUCCESS in 10m 53s
:heavy_check_mark: edpm-ansible-molecule-edpm_frr SUCCESS in 7m 08s
:heavy_check_mark: edpm-ansible-molecule-edpm_iscsid SUCCESS in 5m 06s
:heavy_check_mark: edpm-ansible-molecule-edpm_ovn_bgp_agent SUCCESS in 7m 39s
:heavy_check_mark: edpm-ansible-molecule-edpm_ovs SUCCESS in 12m 11s
:heavy_check_mark: edpm-ansible-molecule-edpm_tripleo_cleanup SUCCESS in 4m 22s
:x: adoption-standalone-to-crc-ceph-provider FAILURE in 3h 12m 02s
The job failed with
- lastTransitionTime: "2024-08-16T10:07:37Z"
message: 'DataPlaneNodeSet error occurred spec.tlsEnabled: Forbidden: TLS settings
on Data Plane node set and Control Plane openstack do not match, Node set: false
Control Plane Ingress: true Control Plane PodLevel: false'
reason: Error
severity: Error
status: "False"
type: SetupReady
This doesn't strike me as related to this PR.
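For reference, one way to cross-check the values the webhook complains about directly on the cluster; this is only a sketch, and the resource kinds (OpenStackDataPlaneNodeSet, OpenStackControlPlane) and the openstack namespace are assumed from the usual openstack-k8s-operators defaults, with spec.tlsEnabled being the field named in the error:

```bash
# Sketch: compare the node set's TLS flag with the control plane TLS settings.
# Kind names and the "openstack" namespace are assumptions; spec.tlsEnabled is
# the field quoted in the error message above.
oc -n openstack get openstackdataplanenodeset -o jsonpath='{.items[*].spec.tlsEnabled}'; echo
oc -n openstack get openstackcontrolplane -o jsonpath='{.items[*].spec.tls}'; echo
```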
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: cubeek, slawqo
The full list of commands accepted by this bot can be found here.
The pull request process is described here
[WARNING]: ERROR: Can't pull image quay.io/podified-antelope-centos9/openstack-ceilometer-compute:current-podified
stderr: Trying to pull quay.io/podified-antelope-centos9/openstack-ceilometer-compute:current-podified...
Getting image source signatures
Copying blob sha256:c5c3accfc3e7c941c61b471205eefeb8067ec0ee44f7c399be64cd698b8e60e0
Copying blob sha256:c90e33e6a3a48417a06fbf7cae27a9c755e64d19ca2ebd207141a589a8d411b0
Copying blob sha256:f14dad47a5d0864a97205e0b14878d920d04593a065a9ef212473bea3fb4cef1
Copying blob sha256:c143e920b7d46a7c824661f29329eca9a37bffc12ea1cfcad2aab5538545eeed
Copying blob sha256:e8b54c863393b7a8216a71737771b73b16a8e43ec7acf927c7faa175c255e551
Error: reading blob sha256:f14dad47a5d0864a97205e0b14878d920d04593a065a9ef212473bea3fb4cef1: fetching blob: received unexpected HTTP status: 502 Bad Gateway
recheck
Build failed (check pipeline). Post recheck
(without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.
https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/c3ac5313e63a4551995fbad69664917e
:heavy_check_mark: openstack-k8s-operators-content-provider SUCCESS in 4h 46m 01s
:heavy_check_mark: podified-multinode-edpm-deployment-crc SUCCESS in 1h 43m 17s
:heavy_check_mark: cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 58m 54s
:heavy_check_mark: edpm-ansible-molecule-edpm_bootstrap SUCCESS in 6m 03s
:heavy_check_mark: edpm-ansible-molecule-edpm_podman SUCCESS in 5m 11s
:heavy_check_mark: edpm-ansible-molecule-edpm_module_load SUCCESS in 4m 26s
:heavy_check_mark: edpm-ansible-molecule-edpm_kernel SUCCESS in 8m 13s
:heavy_check_mark: edpm-ansible-molecule-edpm_libvirt SUCCESS in 7m 44s
:heavy_check_mark: edpm-ansible-molecule-edpm_nova SUCCESS in 8m 33s
:heavy_check_mark: edpm-ansible-molecule-edpm_frr SUCCESS in 6m 05s
:heavy_check_mark: edpm-ansible-molecule-edpm_iscsid SUCCESS in 4m 14s
:heavy_check_mark: edpm-ansible-molecule-edpm_ovn_bgp_agent SUCCESS in 6m 34s
:heavy_check_mark: edpm-ansible-molecule-edpm_ovs SUCCESS in 11m 45s
:heavy_check_mark: edpm-ansible-molecule-edpm_tripleo_cleanup SUCCESS in 3m 49s
:x: adoption-standalone-to-crc-ceph-provider FAILURE in 2h 48m 06s
recheck
Needs to be rebased on main to get rid of edpm_tuned failures
Also, no logs were collected after the move https://logserver.rdoproject.org/26/726/aef38dab485fc18092f303d9a76c4b99f0b7af2a/github-check/cifmw-crc-podified-edpm-baremetal/88a8a02/ci-framework-data/logs/192.168.122.100/service_logs/ , so that should also be identified/fixed before merging this.
> Needs to be rebased on main to get rid of edpm_tuned failures
> Also, no logs were collected after the move https://logserver.rdoproject.org/26/726/aef38dab485fc18092f303d9a76c4b99f0b7af2a/github-check/cifmw-crc-podified-edpm-baremetal/88a8a02/ci-framework-data/logs/192.168.122.100/service_logs/ , so that should also be identified/fixed before merging this.

@SeanMooney @jpodivin maybe you know what's causing this (logs not collected with journalctl -u )
which logs are missing
the service_logs directory is some additional custom logging i added for the ci jobs in the ci framework to grab the logs for systemd services related to openstack
systemctl list-units | awk '/virt|edpm/ {print $1}' | grep -v sys | xargs -I {} sudo bash -c 'journalctl -u {} > /tmp/{{ host_ip }}/service_logs/{}.log'
so unless the neutron systemd service units are not correctly prefixed with edpm, this should work for neutron
since ovs is host-installed like libvirt, you will need to add ovs to the regex like I did for libvirt.
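For illustration, this is roughly what the collection one-liner quoted above could look like with ovs added to the regex; the real change would belong in the ci-framework log-collection task, so treat this as a sketch:

```bash
# Sketch: the ci-framework one-liner with "ovs" added to the awk regex so that
# host-installed Open vSwitch units (ovs-vswitchd, ovsdb-server, ...) are also grabbed.
systemctl list-units | awk '/virt|ovs|edpm/ {print $1}' | grep -v sys \
  | xargs -I {} sudo bash -c 'journalctl -u {} > /tmp/{{ host_ip }}/service_logs/{}.log'
```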
this is not how the logs will be gathered for customers, however; there the logs will be gathered via must_gather and sos reports
the ovn logs do have some output but it's mostly empty
we do see ovn_controller log messages
the ci job is not configured to unpack the sosreports but normally you would find those logs in https://logserver.rdoproject.org/26/726/aef38dab485fc18092f303d9a76c4b99f0b7af2a/github-check/cifmw-crc-podified-edpm-baremetal/88a8a02/ci-framework-data/logs/openstack-k8s-operators-openstack-must-gather/sos-reports/_all_nodes/sosreport-edpm-compute-0.tar.xz
ci-framework-data/logs/192.168.122.100/log/ is a secondary way to look at that that predates must gather.
my guess is ovn is currently configured to log directly to the journal instead of standard out, and as a result it is not logging with the edpm prefix and not being gathered in service_logs
I do not see ovs logs in /var/log/messages but it looks like it's still configured to log to a file, based on the first line
2024-08-20T15:36:39.179Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log
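A quick way to confirm where ovs is logging on the host; a sketch that assumes the standard openvswitch packaging on the EDPM node:

```bash
# Check the file target first, then list the vlog destinations ovs-vswitchd is using
# (console, syslog, file) via its unixctl socket.
sudo tail -n 5 /var/log/openvswitch/ovs-vswitchd.log
sudo ovs-appctl vlog/list | head
```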
> which logs are missing

Sorry, I should have been more explicit with the log links; I was mainly looking at the empty logs in edpm_ovn_controller.service.log, edpm_ovn_metadata_agent.service.log etc., which you spotted below.

> the service_logs directory is some additional custom logging i added for the ci jobs in the ci framework to grab the logs for systemd services related to openstack
> systemctl list-units | awk '/virt|edpm/ {print $1}' | grep -v sys | xargs -I {} sudo bash -c 'journalctl -u {} > /tmp/{{ host_ip }}/service_logs/{}.log'

Isn't that also the expected way to look for logs when checking live on a system? I normally do journalctl -u edpm_nova_compute and expected the same to work for the neutron agents with this patch. journalctl -u edpm_ovn_controller returns just service startup entries, with or without this patch.

> so unless the neutron systemd service units are not correctly prefixed with edpm, this should work for neutron

The systemd services are already prefixed with edpm_, and that's why we see them collected in service_logs; the issue is that they are mostly empty.

> since ovs is host-installed like libvirt, you will need to add ovs to the regex like I did for libvirt.

Ok, thanks, that can be checked separately. This patch is not touching any ovs bits.

> this is not how the logs will be gathered for customers, however; there the logs will be gathered via must_gather and sos reports
> the ovn logs do have some output but it's mostly empty

Yes, trying to understand this part: why are the logs not visible here, and what additional changes are required for this?

> we do see ovn_controller log messages
> the ci job is not configured to unpack the sosreports but normally you would find those logs in https://logserver.rdoproject.org/26/726/aef38dab485fc18092f303d9a76c4b99f0b7af2a/github-check/cifmw-crc-podified-edpm-baremetal/88a8a02/ci-framework-data/logs/openstack-k8s-operators-openstack-must-gather/sos-reports/_all_nodes/sosreport-edpm-compute-0.tar.xz
> ci-framework-data/logs/192.168.122.100/log/ is a secondary way to look at that that predates must gather.
> my guess is ovn is currently configured to log directly to the journal instead of standard out, and as a result it is not logging with the edpm prefix and not being gathered in service_logs

Need to see what needs to be changed to log it properly. Any idea?

> i do not see ovs logs in /var/log/messages but it looks like it's still configured to log to a file, based on the first line
> 2024-08-20T15:36:39.179Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log

Yes, right. @cubeek, you planned to handle the ovs bits in a separate patch?
OK, I was able to reproduce it locally with the ansibleee image from this PR: quay.rdoproject.org/openstack-k8s-operators/openstack-ansibleee-runner:734a871f2663ecd078f917212b17c5e84b16b191
What I observed is that after restarting the service, the logs are visible with journalctl -u.
```
[zuul@networker-0 ~]$ sudo journalctl -u edpm_ovn_metadata_agent
Aug 27 07:23:40 networker-0 systemd[1]: Starting ovn_metadata_agent container...
Aug 27 07:23:40 networker-0 edpm-start-podman-container[20550]: Creating additional drop-in dependency for "ovn_metadata_agent" (2662e5>
Aug 27 07:23:41 networker-0 systemd[1]: Started ovn_metadata_agent container.
[zuul@networker-0 ~]$ ps -eaf|grep metadata root 19938 1 0 07:23 ? 00:00:00 /usr/bin/conmon --api-version 1 -c 2662e5467da4310d27edb246057a84bca83216019ce35b1353abb44ea59267a4 -u 2662e5467da4310d27edb246057a84bca83216019ce35b1353abb44ea59267a4 -r /usr/bin/crun -b /var/lib/containers/storage/overlay-containers/2662e5467da4310d27edb246057a84bca83216019ce35b1353abb44ea59267a4/userdata -p /run/containers/storage/overlay-containers/2662e5467da4310d27edb246057a84bca83216019ce35b1353abb44ea59267a4/userdata/pidfile -n ovn_metadata_agent --exit-dir /run/libpod/exits --full-attach -s -l journald --log-level info --syslog --runtime-arg --log-format=json --runtime-arg --log --runtime-arg=/run/containers/storage/overlay-containers/2662e5467da4310d27edb246057a84bca83216019ce35b1353abb44ea59267a4/userdata/oci-log --conmon-pidfile /run/ovn_metadata_agent.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/lib/containers/storage --exit-command-arg --runroot --exit-command-arg /run/containers/storage --exit-command-arg --log-level --exit-command-arg info --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /run/libpod --exit-command-arg --network-config-dir --exit-command-arg --exit-command-arg --network-backend --exit-command-arg netavark --exit-command-arg --volumepath --exit-command-arg /var/lib/containers/storage/volumes --exit-command-arg --db-backend --exit-command-arg sqlite --exit-command-arg --transient-store=false --exit-command-arg --runtime --exit-command-arg crun --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --storage-opt --exit-command-arg overlay.mountopt=nodev,metacopy=on --exit-command-arg --events-backend --exit-command-arg journald --exit-command-arg container --exit-command-arg cleanup --exit-command-arg 2662e5467da4310d27edb246057a84bca83216019ce35b1353abb44ea59267a4 root 19942 19940 0 07:23 ? 00:00:04 neutron-ovn-metadata-agent (/usr/bin/python3 /usr/bin/neutron-ovn-metadata-agent) root 20417 19942 0 07:23 ? 00:00:00 neutron-ovn-metadata-agent (/usr/bin/python3 /usr/bin/neutron-ovn-metadata-agent) root 20418 19942 0 07:23 ? 00:00:00 neutron-ovn-metadata-agent (/usr/bin/python3 /usr/bin/neutron-ovn-metadata-agent) zuul 21137 20841 0 07:51 pts/0 00:00:00 grep --color=auto metadata [zuul@networker-0 ~]$ [zuul@networker-0 ~]$ sudo systemctl restart edpm_ovn_metadata_agent
[zuul@networker-0 ~]$ ps -eaf|grep metadata root 21199 1 0 07:51 ? 00:00:00 /usr/bin/conmon --api-version 1 -c 2662e5467da4310d27edb246057a84bca83216019ce35b1353abb44ea59267a4 -u 2662e5467da4310d27edb246057a84bca83216019ce35b1353abb44ea59267a4 -r /usr/bin/crun -b /var/lib/containers/storage/overlay-containers/2662e5467da4310d27edb246057a84bca83216019ce35b1353abb44ea59267a4/userdata -p /run/containers/storage/overlay-containers/2662e5467da4310d27edb246057a84bca83216019ce35b1353abb44ea59267a4/userdata/pidfile -n ovn_metadata_agent --exit-dir /run/libpod/exits --full-attach -s -l journald --log-level warning --syslog --runtime-arg --log-format=json --runtime-arg --log --runtime-arg=/run/containers/storage/overlay-containers/2662e5467da4310d27edb246057a84bca83216019ce35b1353abb44ea59267a4/userdata/oci-log --conmon-pidfile /run/ovn_metadata_agent.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/lib/containers/storage --exit-command-arg --runroot --exit-command-arg /run/containers/storage --exit-command-arg --log-level --exit-command-arg warning --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /run/libpod --exit-command-arg --network-config-dir --exit-command-arg --exit-command-arg --network-backend --exit-command-arg netavark --exit-command-arg --volumepath --exit-command-arg /var/lib/containers/storage/volumes --exit-command-arg --db-backend --exit-command-arg sqlite --exit-command-arg --transient-store=false --exit-command-arg --runtime --exit-command-arg crun --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --storage-opt --exit-command-arg overlay.mountopt=nodev,metacopy=on --exit-command-arg --events-backend --exit-command-arg journald --exit-command-arg container --exit-command-arg cleanup --exit-command-arg 2662e5467da4310d27edb246057a84bca83216019ce35b1353abb44ea59267a4 root 21203 21201 31 07:51 ? 00:00:03 neutron-ovn-metadata-agent (/usr/bin/python3 /usr/bin/neutron-ovn-metadata-agent) root 21261 21203 0 07:51 ? 00:00:00 neutron-ovn-metadata-agent (/usr/bin/python3 /usr/bin/neutron-ovn-metadata-agent) root 21262 21203 0 07:51 ? 00:00:00 neutron-ovn-metadata-agent (/usr/bin/python3 /usr/bin/neutron-ovn-metadata-agent) zuul 21271 20841 0 07:51 pts/0 00:00:00 grep --color=auto metadata [zuul@networker-0 ~]$ sudo journalctl -u edpm_ovn_metadata_agent Aug 27 07:23:40 networker-0 systemd[1]: Starting ovn_metadata_agent container... Aug 27 07:23:40 networker-0 edpm-start-podman-container[20550]: Creating additional drop-in dependency for "ovn_metadata_agent" (2662e5> Aug 27 07:23:41 networker-0 systemd[1]: Started ovn_metadata_agent container. Aug 27 07:51:43 networker-0 systemd[1]: Stopping ovn_metadata_agent container... Aug 27 07:51:44 networker-0 podman[21142]: 2024-08-27 07:51:44.142577336 +0000 UTC m=+0.873332196 container died 2662e5467da4310d27edb2> Aug 27 07:51:44 networker-0 podman[21142]: 2024-08-27 07:51:44.18973634 +0000 UTC m=+0.920491170 container cleanup 2662e5467da4310d27ed> Aug 27 07:51:44 networker-0 podman[21142]: ovn_metadata_agent Aug 27 07:51:44 networker-0 podman[21170]: ovn_metadata_agent Aug 27 07:51:44 networker-0 systemd[1]: edpm_ovn_metadata_agent.service: Deactivated successfully. Aug 27 07:51:44 networker-0 systemd[1]: Stopped ovn_metadata_agent container. Aug 27 07:51:44 networker-0 systemd[1]: Starting ovn_metadata_agent container... 
Aug 27 07:51:44 networker-0 podman[21184]: 2024-08-27 07:51:44.39492599 +0000 UTC m=+0.089644466 container init 2662e5467da4310d27edb24> Aug 27 07:51:44 networker-0 podman[21184]: 2024-08-27 07:51:44.401012455 +0000 UTC m=+0.095730914 container start 2662e5467da4310d27edb> Aug 27 07:51:44 networker-0 edpm-start-podman-container[21184]: ovn_metadata_agent Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: + sudo -E kolla_set_configs Aug 27 07:51:44 networker-0 edpm-start-podman-container[21183]: Creating additional drop-in dependency for "ovn_metadata_agent" (2662e5> Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: INFO:main:Loading config file at /var/lib/kolla/config_files/config.json Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: INFO:main:Validating config file Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: INFO:main:Kolla config strategy set to: COPY_ALWAYS Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: INFO:main:Copying service configuration files Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: INFO:main:Deleting /etc/neutron/rootwrap.conf Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: INFO:main:Copying /etc/neutron.conf.d/01-rootwrap.conf to /etc/neutron/rootw> Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: INFO:main:Setting permission for /etc/neutron/rootwrap.conf Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: INFO:main:Writing out command to execute Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: INFO:main:Setting permission for /var/lib/neutron Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: INFO:main:Setting permission for /var/lib/neutron/kill_scripts Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: INFO:main:Setting permission for /var/lib/neutron/ovn-metadata-proxy Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: INFO:main:Setting permission for /var/lib/neutron/external Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: INFO:main:Setting permission for /var/lib/neutron/ovn_metadata_haproxy_wrapp> Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: INFO:main:Setting permission for /var/lib/neutron/metadata_proxy Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: INFO:main:Setting permission for /var/lib/neutron/kill_scripts/haproxy-kill Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: INFO:main:Setting permission for /var/lib/neutron/external/pids Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: ++ cat /run_command Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: + CMD=neutron-ovn-metadata-agent [zuul@networker-0 ~]$ sudo journalctl -u edpm_ovn_metadata_agent Aug 27 07:23:40 networker-0 systemd[1]: Starting ovn_metadata_agent container... Aug 27 07:23:40 networker-0 edpm-start-podman-container[20550]: Creating additional drop-in dependency for "ovn_metadata_agent" (2662e5> Aug 27 07:23:41 networker-0 systemd[1]: Started ovn_metadata_agent container. Aug 27 07:51:43 networker-0 systemd[1]: Stopping ovn_metadata_agent container... Aug 27 07:51:44 networker-0 podman[21142]: 2024-08-27 07:51:44.142577336 +0000 UTC m=+0.873332196 container died 2662e5467da4310d27edb2> Aug 27 07:51:44 networker-0 podman[21142]: 2024-08-27 07:51:44.18973634 +0000 UTC m=+0.920491170 container cleanup 2662e5467da4310d27ed> Aug 27 07:51:44 networker-0 podman[21142]: ovn_metadata_agent Aug 27 07:51:44 networker-0 podman[21170]: ovn_metadata_agent Aug 27 07:51:44 networker-0 systemd[1]: edpm_ovn_metadata_agent.service: Deactivated successfully. 
Aug 27 07:51:44 networker-0 systemd[1]: Stopped ovn_metadata_agent container. Aug 27 07:51:44 networker-0 systemd[1]: Starting ovn_metadata_agent container... Aug 27 07:51:44 networker-0 podman[21184]: 2024-08-27 07:51:44.39492599 +0000 UTC m=+0.089644466 container init 2662e5467da4310d27edb24> Aug 27 07:51:44 networker-0 podman[21184]: 2024-08-27 07:51:44.401012455 +0000 UTC m=+0.095730914 container start 2662e5467da4310d27edb> Aug 27 07:51:44 networker-0 edpm-start-podman-container[21184]: ovn_metadata_agent Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: + sudo -E kolla_set_configs Aug 27 07:51:44 networker-0 edpm-start-podman-container[21183]: Creating additional drop-in dependency for "ovn_metadata_agent" (2662e5> Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: INFO:main:Loading config file at /var/lib/kolla/config_files/config.json Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: INFO:main:Validating config file Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: INFO:main:Kolla config strategy set to: COPY_ALWAYS Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: INFO:main:Copying service configuration files Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: INFO:main:Deleting /etc/neutron/rootwrap.conf Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: INFO:main:Copying /etc/neutron.conf.d/01-rootwrap.conf to /etc/neutron/rootw> Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: INFO:main:Setting permission for /etc/neutron/rootwrap.conf Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: INFO:main:Writing out command to execute Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: INFO:main:Setting permission for /var/lib/neutron Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: INFO:main:Setting permission for /var/lib/neutron/kill_scripts Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: INFO:main:Setting permission for /var/lib/neutron/ovn-metadata-proxy Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: INFO:main:Setting permission for /var/lib/neutron/external Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: INFO:main:Setting permission for /var/lib/neutron/ovn_metadata_haproxy_wrapp> Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: INFO:main:Setting permission for /var/lib/neutron/metadata_proxy Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: INFO:main:Setting permission for /var/lib/neutron/kill_scripts/haproxy-kill Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: INFO:main:Setting permission for /var/lib/neutron/external/pids Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: ++ cat /run_command Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: + CMD=neutron-ovn-metadata-agent Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: + ARGS= Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: + sudo kolla_copy_cacerts Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: + [[ ! -n '' ]] Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: + . kolla_extend_start Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: Running command: 'neutron-ovn-metadata-agent' Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: + echo 'Running command: '\''neutron-ovn-metadata-agent'\''' Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: + umask 0022 Aug 27 07:51:44 networker-0 ovn_metadata_agent[21199]: + exec neutron-ovn-metadata-agent Aug 27 07:51:44 networker-0 systemd[1]: Started ovn_metadata_agent container. 
Aug 27 07:51:47 networker-0 ovn_metadata_agent[21199]: 2024-08-27 07:51:47.425 21203 INFO neutron.common.config [-] Logging enabled!
Aug 27 07:51:47 networker-0 ovn_metadata_agent[21199]: 2024-08-27 07:51:47.425 21203 INFO neutron.common.config [-] /usr/bin/neutron-ov>
Aug 27 07:51:47 networker-0 ovn_metadata_agent[21199]: 2024-08-27 07:51:47.425 21203 DEBUG neutron.common.config [-] command line: /usr>
Aug 27 07:51:47 networker-0 ovn_metadata_agent[21199]: 2024-08-27 07:51:47.426 21203 DEBUG neutron.agent.ovn.metadata_agent [-] ***>
Aug 27 07:51:47 networker-0 ovn_metadata_agent[21199]: 2024-08-27 07:51:47.426 21203 DEBUG neutron.agent.ovn.metadata_agent [-] Configu>
Aug 27 07:51:47 networker-0 ovn_metadata_agent[21199]: 2024-08-27 07:51:47.426 21203 DEBUG neutron.agent.ovn.metadata_agent [-] command>
```
Spotted a diff in the conmon process arguments after restarting the service:
--log-level info --exit-command-arg info
vs
--log-level warning --exit-command-arg warning
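A sketch of how one might capture that diff systematically; the container and unit names are taken from the logs above, and the temp file paths are arbitrary:

```bash
# Record the conmon command line for ovn_metadata_agent before and after a systemd
# restart, then diff the two word lists to spot changed flags such as --log-level.
pgrep -af conmon | grep ovn_metadata_agent | tr ' ' '\n' > /tmp/conmon-before.txt
sudo systemctl restart edpm_ovn_metadata_agent
pgrep -af conmon | grep ovn_metadata_agent | tr ' ' '\n' > /tmp/conmon-after.txt
diff /tmp/conmon-before.txt /tmp/conmon-after.txt
```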
nova_compute is also impacted, but the issue didn't become visible because it gets restarted during the deployment itself by handlers [1]. Other services running on EDPM nodes might also be impacted if they are not restarted during deployment; I see similar behavior for edpm_iscsid/edpm_logrotate_crond. I still don't know why the behavior shows up on the first start and vanishes after a service restart. Locally logrotate_crond is not restarted but logs correctly [3], while in the CI logs [2] it doesn't, so it looks like some kind of race.
[1] https://github.com/search?q=repo%3Aopenstack-k8s-operators%2Fedpm-ansible%20%22Restart%20Nova%22&type=code
[2] https://logserver.rdoproject.org/26/726/aef38dab485fc18092f303d9a76c4b99f0b7af2a/github-check/cifmw-crc-podified-edpm-baremetal/88a8a02/ci-framework-data/logs/192.168.122.100/service_logs/edpm_logrotate_crond.service.log
[3]
[zuul@networker-0 ~]$ sudo journalctl -u edpm_logrotate_crond Aug 27 07:20:12 networker-0 systemd[1]: Starting logrotate_crond container... Aug 27 07:20:12 networker-0 podman[1532]: 2024-08-27 07:20:12.818086548 +0000 UTC m=+0.185700262 system refresh Aug 27 07:20:12 networker-0 podman[1532]: 2024-08-27 07:20:12.916732967 +0000 UTC m=+0.284346635 container init 708b0af590a3ff9c5dea0a0> Aug 27 07:20:12 networker-0 podman[1532]: 2024-08-27 07:20:12.923712705 +0000 UTC m=+0.291326367 container start 708b0af590a3ff9c5dea0a> Aug 27 07:20:12 networker-0 podman[1532]: logrotate_crond Aug 27 07:20:12 networker-0 systemd[1]: Started logrotate_crond container. Aug 27 07:20:12 networker-0 logrotate_crond[1650]: + sudo -E kolla_set_configs Aug 27 07:20:13 networker-0 logrotate_crond[1650]: INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json Aug 27 07:20:13 networker-0 logrotate_crond[1650]: INFO:__main__:Validating config file Aug 27 07:20:13 networker-0 logrotate_crond[1650]: INFO:__main__:Kolla config strategy set to: COPY_ALWAYS Aug 27 07:20:13 networker-0 logrotate_crond[1650]: INFO:__main__:Copying service configuration files Aug 27 07:20:13 networker-0 logrotate_crond[1650]: INFO:__main__:Deleting /etc/logrotate-crond.conf Aug 27 07:20:13 networker-0 logrotate_crond[1650]: INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/logrotate-crond.conf to /e> Aug 27 07:20:13 networker-0 logrotate_crond[1650]: INFO:__main__:Deleting /var/spool/cron/root Aug 27 07:20:13 networker-0 logrotate_crond[1650]: INFO:__main__:Copying /var/lib/kolla/config_files/src/var/spool/cron/root to /var/sp> Aug 27 07:20:13 networker-0 logrotate_crond[1650]: INFO:__main__:Writing out command to execute Aug 27 07:20:13 networker-0 logrotate_crond[1650]: ++ cat /run_command Aug 27 07:20:13 networker-0 logrotate_crond[1650]: + CMD='/usr/sbin/crond -s -n' Aug 27 07:20:13 networker-0 logrotate_crond[1650]: + ARGS= Aug 27 07:20:13 networker-0 logrotate_crond[1650]: + sudo kolla_copy_cacerts Aug 27 07:20:13 networker-0 logrotate_crond[1650]: + [[ ! -n '' ]] Aug 27 07:20:13 networker-0 logrotate_crond[1650]: + . kolla_extend_start Aug 27 07:20:13 networker-0 logrotate_crond[1650]: Running command: '/usr/sbin/crond -s -n' Aug 27 07:20:13 networker-0 logrotate_crond[1650]: + echo 'Running command: '\''/usr/sbin/crond -s -n'\''' Aug 27 07:20:13 networker-0 logrotate_crond[1650]: + umask 0022 Aug 27 07:20:13 networker-0 logrotate_crond[1650]: + exec /usr/sbin/crond -s -n
glad you found the issue, yes I intentionally used a handler to restart the nova compute container any time there is a config change or definition change for the container, to ensure we are always running with the correct content.
journalctl -u is the expected way to view live logs (I normally do journalctl -f -n 1000 -u )
but it makes sense that if the container was not restarted then the change did not take effect yet.
To my understanding the deploying tool doesn't respect the settings, is that correct? Does it affect all services, not just Neutron? I will report a bug for it
Reported https://issues.redhat.com/browse/OSPRH-9723 for the starter
> glad you found the issue, yes I intentionally used a handler to restart the nova compute container any time there is a config change or definition change for the container, to ensure we are always running with the correct content.

@SeanMooney ok, so that restart here is basically hiding this behavior for nova_compute. With a fresh deployment nova_compute is started with up-to-date config and is immediately restarted anyway, which shouldn't be required on a fresh setup. I also observed that nova_compute is restarted on rerunning the deployment without any change, due to these current handlers, but that's a separate issue and can be checked separately.

> journalctl -u is the expected way to view live logs (I normally do journalctl -f -n 1000 -u )

Ok, so this interface will not work for the network agents with the current state of this PR unless and until we have some fix or workaround (podman stop or restart or something else) applied. Is it ok to get this merged without any workaround, or should we add a workaround similar to nova and clean it up later as part of the issue reported by Jakub? I'm not sure if something needs to be done on the systemd, podman or conmon side to resolve this behavior.

> but it makes sense that if the container was not restarted then the change did not take effect yet.

Config-wise everything is in place; it's happening due to how we're setting things up.
So ovn_controller and the metadata agent are correctly logging to stdout and the journal, but as the correct fields are not set, these will not be visible with journalctl -u
the handler should only run if specific tasks were marked as changed.
on the first run it will restart them because we wrote the configs, but I don't want to remove that restart because 1) it's fast and 2) the handlers approach is more consistent in general.
I previously proposed adopting that pattern for all services. I'm not sure if others took that advice or not.
> the handler should only run if specific tasks were marked as changed.
> on the first run it will restart them because we wrote the configs, but I don't want to remove that restart because 1) it's fast and 2) the handlers approach is more consistent in general.
> I previously proposed adopting that pattern for all services. I'm not sure if others took that advice or not.
This is probably not the best place to discuss this, but since we're talking about it: what is the reason again for restarting services when we deploy them? It doesn't sound optimal, as we already know the config settings prior to creating the containers running the services, so why restart the services? Is it because some config changes are figured out after the service is started?
> the handler should only run if specific tasks were marked as changed. on the first run it will restart them because we wrote the configs, but I don't want to remove that restart because 1) it's fast and 2) the handlers approach is more consistent in general. I previously proposed adopting that pattern for all services. I'm not sure if others took that advice or not.
> This is probably not the best place to discuss this, but since we're talking about it: what is the reason again for restarting services when we deploy them? It doesn't sound optimal, as we already know the config settings prior to creating the containers running the services, so why restart the services? Is it because some config changes are figured out after the service is started?
we only restart the service if the config or container definition changes. We do not want to have a different code path for the first deploy and any subsequent execution of the role, to reduce bugs like the one found for the neutron/ovn containers. As a result it's a known expectation that for greenfield we will have 1 additional restart, because on the very first deployment we update the config by creating it for the first time.
this is a very specific edge case that I intentionally did not want to optimise for, as it pessimises every other deployment after the very first one.
> the handler should only run if specific tasks were marked as changed. on the first run it will restart them because we wrote the configs, but I don't want to remove that restart because 1) it's fast and 2) the handlers approach is more consistent in general. I previously proposed adopting that pattern for all services. I'm not sure if others took that advice or not.
> This is probably not the best place to discuss this, but since we're talking about it: what is the reason again for restarting services when we deploy them? It doesn't sound optimal, as we already know the config settings prior to creating the containers running the services, so why restart the services? Is it because some config changes are figured out after the service is started?
> we only restart the service if the config or container definition changes. We do not want to have a different code path for the first deploy and any subsequent execution of the role, to reduce bugs like the one found for the neutron/ovn containers. As a result it's a known expectation that for greenfield we will have 1 additional restart, because on the very first deployment we update the config by creating it for the first time.
> this is a very specific edge case that I intentionally did not want to optimise for, as it pessimises every other deployment after the very first one.
perhaps a little more context is helpful
we only restart if the tasks that are explicitly marked with
notify:
- Restart nova
actually have the changed status.
so if the content of the config is unchanged it will return OK and no restart is done.
it would be suboptimal to have to complicate the code to cater for the first install just to avoid a trivial systemd service restart
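As a rough illustration only (this is not the actual role code, and the file paths are made up), the handler pattern described above boils down to:

```bash
# Illustrative shell analogue of the "notify: Restart nova" handler behaviour:
# write the rendered config, and restart the service only if its content changed.
NEW_CONF=/tmp/01-nova.conf.new                          # hypothetical freshly rendered config
CUR_CONF=/var/lib/openstack/config/nova/01-nova.conf    # hypothetical deployed config path
if ! sudo cmp -s "$NEW_CONF" "$CUR_CONF"; then
    sudo install -m 0640 "$NEW_CONF" "$CUR_CONF"
    sudo systemctl restart edpm_nova_compute            # the handler-equivalent restart
fi
```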
Build failed (check pipeline). Post recheck
(without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.
https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/300d32c39a464164ab3f32ec3c3f173b
:heavy_check_mark: openstack-k8s-operators-content-provider SUCCESS in 1h 42m 29s
:heavy_check_mark: podified-multinode-edpm-deployment-crc SUCCESS in 1h 15m 29s
:heavy_check_mark: cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 29m 12s
:heavy_check_mark: edpm-ansible-molecule-edpm_bootstrap SUCCESS in 7m 08s
:heavy_check_mark: edpm-ansible-molecule-edpm_podman SUCCESS in 5m 53s
:heavy_check_mark: edpm-ansible-molecule-edpm_module_load SUCCESS in 5m 11s
:heavy_check_mark: edpm-ansible-molecule-edpm_kernel SUCCESS in 10m 08s
:heavy_check_mark: edpm-ansible-molecule-edpm_libvirt SUCCESS in 10m 09s
:heavy_check_mark: edpm-ansible-molecule-edpm_nova SUCCESS in 10m 20s
:heavy_check_mark: edpm-ansible-molecule-edpm_frr SUCCESS in 6m 56s
:heavy_check_mark: edpm-ansible-molecule-edpm_iscsid SUCCESS in 4m 44s
:heavy_check_mark: edpm-ansible-molecule-edpm_ovn_bgp_agent SUCCESS in 7m 48s
:heavy_check_mark: edpm-ansible-molecule-edpm_ovs SUCCESS in 12m 23s
:heavy_check_mark: edpm-ansible-molecule-edpm_tripleo_cleanup SUCCESS in 4m 00s
:heavy_check_mark: edpm-ansible-molecule-edpm_tuned SUCCESS in 6m 08s
:x: adoption-standalone-to-crc-ceph-provider RETRY_LIMIT in 22m 35s
recheck
/lgtm
/cherry-pick 18.0.0-proposed
@karelyatin: new pull request created: #742
> glad you found the issue, yes I intentionally used a handler to restart the nova compute container any time there is a config change or definition change for the container, to ensure we are always running with the correct content.

@SeanMooney ok, so that restart here is basically hiding this behavior for nova_compute. With a fresh deployment nova_compute is started with up-to-date config and is immediately restarted anyway, which shouldn't be required on a fresh setup. I also observed that nova_compute is restarted on rerunning the deployment without any change, due to these current handlers, but that's a separate issue and can be checked separately.

> journalctl -u is the expected way to view live logs (I normally do journalctl -f -n 1000 -u )

Ok, so this interface will not work for the network agents with the current state of this PR unless and until we have some fix or workaround (podman stop or restart or something else) applied. Is it ok to get this merged without any workaround, or should we add a workaround similar to nova and clean it up later as part of the issue reported by Jakub? I'm not sure if something needs to be done on the systemd, podman or conmon side to resolve this behavior.

> but it makes sense that if the container was not restarted then the change did not take effect yet.

Config-wise everything is in place; it's happening due to how we're setting things up.
* Create/start the container with podman run -n <container name>
* Create a systemd unit with ExecStart=podman start <container name> and enable/start it
* Since the container is already started, podman start is a no-op
* With a podman/conmon restart or a systemd restart the logs are visible (fields like _SYSTEMD_SLICE, _SYSTEMD_CGROUP, _SYSTEMD_UNIT etc. are updated properly); it seems the restart retriggers process recreation (conmon), and that takes care of linking those logs to the systemd unit, iiuc done as part of https://github.com/containers/conmon/blob/3bc422cd8aaec542d85d1a80f2d38e6e69046b5b/src/ctr_logging.c#L254
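A quick way to see this on the node (a sketch; the journald field names come from the list above, and ovn_metadata_agent is used as the example container):

```bash
# Entries written by podman/conmon's journald log driver carry CONTAINER_NAME, so this
# finds the agent's output even when _SYSTEMD_UNIT is not attached to it:
sudo journalctl CONTAINER_NAME=ovn_metadata_agent -n 5 -o verbose | grep -E '_SYSTEMD_UNIT|CONTAINER_NAME'
# ...while filtering by unit only shows what is linked to the systemd unit itself:
sudo journalctl -u edpm_ovn_metadata_agent -n 5
```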
To fix this particular part (to not start the container with podman run and instead let systemd start it), @cubeek pushed https://github.com/openstack-k8s-operators/edpm-ansible/pull/743
Logs are also collected fine now:
https://logserver.rdoproject.org/43/743/80e26ca1535bd248dfa5ddcc5d603dca62c86b0c/github-check/podified-multinode-edpm-deployment-crc/03bfe31/controller/ci-framework-data/logs/38.102.83.224/service_logs/edpm_ovn_controller.service.log
https://logserver.rdoproject.org/43/743/80e26ca1535bd248dfa5ddcc5d603dca62c86b0c/github-check/cifmw-crc-podified-edpm-baremetal/db6be93/ci-framework-data/logs/192.168.122.100/service_logs/edpm_ovn_controller.service.log
So ovn_controller and the metadata agent are correctly logging to stdout and the journal, but as the correct fields are not set, these will not be visible with journalctl -u or systemctl status. Also, I don't know how the centralized logging Epic will actually collect the logs from EDPM nodes; if it filters based on systemd unit name, these logs will get missed.
Neutron services were changed only on the OCP side as part of [1], but changing the EDPM side too was forgotten; that is tracked by [2]. This patch series removes logging to files and relies only on journald.
[1] https://issues.redhat.com/browse/OSPRH-154 [2] https://issues.redhat.com/browse/OSPRH-9248