Nodes become unhealthy after upgrading from 4.11 to 4.12

The previous cluster version was 4.11.0-0.okd-2022-12-02-145640. After upgrading the cluster to any 4.12 version, the nodes become unhealthy. When we checked the nodes, we found that the underlying EC2 instances were unhealthy too. Inspecting an instance and its services showed a NetworkManager error (no IP address is assigned to the instance), and the kubelet service was not running. In the end, the errors pointed to ovsdb-server: the user and group "openvswitch:hugetlbfs" do not exist on the instance, which causes ovsdb-server and openvswitch to fail. Once we created that user and group, the problem was solved.

The question is: why does upgrading to 4.12 cause this problem? The cluster does not have this issue when upgrading between patch releases of 4.11.
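
For anyone hitting the same thing, the workaround described above can be sketched roughly as follows. This is only a sketch under assumptions: the account names are taken from the "openvswitch:hugetlbfs" string in the error, and the choice to make hugetlbfs a supplementary group of the user, the DRY_RUN switch, and the helper function names are not from the original report. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: make sure the accounts named in the chown error exist.
# Assumption: names come from the "openvswitch:hugetlbfs" string in the log.
# Assumption: DRY_RUN=1 (the default here) only prints commands.
DRY_RUN="${DRY_RUN:-1}"

# Succeeds if NAME exists in the given NSS database ("passwd" or "group").
account_exists() {
  getent "$1" "$2" >/dev/null 2>&1
}

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# Create a system group if it is missing.
ensure_group() {
  account_exists group "$1" || run groupadd --system "$1"
}

# Create a system user if it is missing.
# $1 = user name, $2 = supplementary group (an assumption, see above).
ensure_user() {
  account_exists passwd "$1" ||
    run useradd --system --no-create-home --shell /sbin/nologin --groups "$2" "$1"
}

ensure_group hugetlbfs
ensure_user openvswitch hugetlbfs
```

After reviewing the printed commands, running the script as root with DRY_RUN=0 on the affected node and then restarting ovsdb-server (or rebooting the node) should match what we did by hand. It is only a workaround and does not explain why the accounts are missing after the upgrade.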

ovsdb-server log

Jul 01 08:01:48 localhost.localdomain sh[1726]: /usr/bin/chown: invalid user: ‘openvswitch:hugetlbfs’
Jul 01 08:01:48 localhost.localdomain sh[1731]: /usr/bin/chown: invalid user: ‘openvswitch:hugetlbfs’
Jul 01 08:01:48 localhost.localdomain sh[1732]: /usr/bin/chown: invalid user: ‘openvswitch:hugetlbfs’
Jul 01 08:01:48 localhost.localdomain ovs-ctl[1763]: id: 'openvswitch': no such user
Jul 01 08:01:48 localhost.localdomain ovs-ctl[1764]: id: 'openvswitch': no such user
Jul 01 08:01:48 localhost.localdomain ovs-ctl[1766]: id: 'openvswitch': no such user
Jul 01 08:01:48 localhost.localdomain ovs-ctl[1768]: setpriv: failed to parse reuid: ''
Jul 01 08:01:48 localhost.localdomain ovs-ctl[1770]: id: 'openvswitch': no such user
Jul 01 08:01:48 localhost.localdomain ovs-ctl[1771]: id: 'openvswitch': no such user
Jul 01 08:01:48 localhost.localdomain ovs-ctl[1773]: id: 'openvswitch': no such user
Jul 01 08:01:48 localhost.localdomain ovs-ctl[1775]: setpriv: failed to parse reuid: ''
Jul 01 08:01:48 localhost.localdomain ovs-ctl[1776]: install: invalid user 'openvswitch'
Jul 01 08:01:48 localhost.localdomain ovsdb-server[1778]: ovs|00001|daemon_unix|EMER|(null): user openvswitch not found, abort>
Jul 01 08:01:48 localhost.localdomain ovs-ctl[1778]: ovsdb-server: (null): user openvswitch not found, aborting.
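
To summarise which accounts the log complains about, a small filter like the one below may help. It is a sketch, not part of the original report: the function name missing_users is made up, and it only matches the "id: '...': no such user" lines shown above. The sample input is copied from the log in this issue.

```shell
# Sketch: extract the unique user names from "id: 'NAME': no such user" lines.
# Reads journal text on stdin. missing_users is a hypothetical helper name.
missing_users() {
  sed -n "s/.*id: '\([^']*\)': no such user.*/\1/p" | sort -u
}

# Sample lines copied from the ovsdb-server log above.
missing_users <<'EOF'
Jul 01 08:01:48 localhost.localdomain ovs-ctl[1763]: id: 'openvswitch': no such user
Jul 01 08:01:48 localhost.localdomain ovs-ctl[1764]: id: 'openvswitch': no such user
Jul 01 08:01:48 localhost.localdomain ovs-ctl[1768]: setpriv: failed to parse reuid: ''
EOF
# prints: openvswitch
```

On an affected node the same filter could be fed from the journal, for example with `journalctl -b -u ovsdb-server --no-pager | missing_users`, assuming the unit is named ovsdb-server as in the log.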

Cluster upgrade history

Version

from: 4.11.0-0.okd-2022-12-02-145640 to: 4.12.0-0.okd-2023-03-18-084815

How to reproduce

oc adm upgrade --to="4.12.0-0.okd-2023-03-18-084815"
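
While the upgrade runs, `oc adm upgrade` (without arguments) and `oc get nodes` show the progress and the node state. A small filter such as the following can list the nodes that are not Ready. This is a sketch: the function name not_ready_nodes is made up, and the sample rows, including node names, ages and versions, are hypothetical placeholders rather than output from our cluster.

```shell
# Sketch: print the names of nodes whose STATUS column does not start
# with "Ready". Expects `oc get nodes --no-headers` output on stdin.
# not_ready_nodes is a hypothetical helper name.
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1 }'
}

# Hypothetical sample rows (NAME STATUS ROLES AGE VERSION), for illustration only.
not_ready_nodes <<'EOF'
node-a   Ready                      master   1d   v0.0.0
node-b   NotReady                   worker   1d   v0.0.0
node-c   Ready,SchedulingDisabled   worker   1d   v0.0.0
EOF
# prints: node-b
```

Against a real cluster this would be used as `oc get nodes --no-headers | not_ready_nodes`.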
