mullumaus opened this issue 2 years ago
@mullumaus you're using LXD 3.0.x which is only supported for security fixes at this point. Can you upgrade to LXD 5.0.x so you are on a version that we actively provide bugfixes for?
(Note that you can't upgrade to LXD 5.0 directly from 3.0, you'll need to upgrade through 4.0 first)
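Assuming LXD was installed as a snap (the usual packaging on Ubuntu), the two-step upgrade might look like the transcript below; the `4.0/stable` and `5.0/stable` channel names are the standard LXD snap tracks:

```
# Upgrade from the 3.0 track to 4.0 first, then on to 5.0;
# each refresh runs the database schema upgrade for that series
$ sudo snap refresh lxd --channel=4.0/stable
$ sudo snap refresh lxd --channel=5.0/stable
```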
It's possible that there's an issue with dqlite in LXD 3.0 (the old Go implementation) which, combined with unclean termination of LXD on reboot, could cause recent DB transactions to be lost. We've not seen other reports of this, but it's certainly possible.
You may want to look at your systemd journal or console output to see if the machine appears to hang waiting for LXD to exit on shutdown. If it does, chances are the stop job times out, systemd kills LXD, and that can cause database issues.
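One way to check is to inspect the previous boot's journal for the LXD stop job. A sketch, assuming the snap package (the unit name differs for deb installs):

```
# Journal from the previous boot, filtered to the LXD daemon
$ journalctl -b -1 -u snap.lxd.daemon

# Look for stop-job timeout messages from systemd around shutdown
$ journalctl -b -1 | grep -i "timed out"
```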
LXD 4.0 has a completely different implementation of dqlite (in C) which stores data on disk in a different format. Combined with longer timeouts in the systemd units and reworked shutdown handling in LXD, we've never seen a report of the database reverting itself there.
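If you must stay on 3.0 for a while, one possible stopgap (a sketch, not something shipped or tested by the LXD team) is to lengthen the stop timeout on the daemon unit with a systemd drop-in, e.g. via `systemctl edit snap.lxd.daemon`:

```
[Service]
# Give LXD more time to flush the database on shutdown
# (the 600s value here is an arbitrary example)
TimeoutStopSec=600
```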
It may also not be an LXD issue at all, but Juju somehow reverting the profile. There again, getting onto a recent LXD will give you better tools to find that out, as you'll get lifecycle events which let you easily monitor all changes made to LXD, including changes to profiles.
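On LXD 4.0/5.0, `lxc monitor` can stream those lifecycle events; for example, to watch live for anything touching profiles (the `profile-updated` event name below is from memory, so verify against your version's output):

```
# Stream lifecycle events (LXD 4.0+); profile changes appear as
# profile-created / profile-updated / profile-deleted events
$ lxc monitor --type=lifecycle
```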
Required information

- The output of `lxc-start --version`: 3.0.3
- `lxc-checkconfig`: see attached file
- `uname -a`: 5.4.0-81-generic 91~18.04.1-Ubuntu SMP Fri Jul 23 13:36:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
- `cat /proc/self/cgroup`:
  ```
  12:devices:/user.slice
  11:memory:/user.slice
  10:rdma:/
  9:cpuset:/
  8:perf_event:/
  7:freezer:/
  6:pids:/user.slice/user-1000.slice/session-100977.scope
  5:cpu,cpuacct:/user.slice
  4:net_cls,net_prio:/
  3:hugetlb:/
  2:blkio:/user.slice
  1:name=systemd:/user.slice/user-1000.slice/session-100977.scope
  0::/user.slice/user-1000.slice/session-100977.scope
  ```
- `cat /proc/1/mounts`: see attached file

Issue description
Used Juju to deploy the ovn-chassis charm on a LXD container; a LXC profile was created for the ovn-chassis container:
```
$ lxc profile show juju-openstack-octavia-ovn-chassis-14
config:
  linux.kernel_modules: openvswitch
description: ""
devices: {}
name: juju-openstack-octavia-ovn-chassis-14
used_by:
- /1.0/containers/juju-a79b06-5-lxd-16
```
After the physical host was rebooted, the config in the LXC profile was missing and the container didn't load the kernel module 'openvswitch':

```
$ lxc profile show juju-openstack-octavia-ovn-chassis-14
config: {}
description: ""
devices: {}
name: juju-openstack-octavia-ovn-chassis-14
used_by:
- /1.0/containers/juju-a79b06-5-lxd-16
```
Steps to reproduce
We are unable to reproduce the issue every time, although we have run into it more than once.
Information to attach

- Any relevant kernel output (`dmesg`)
- Container log (`lxc-start -n <c> -l TRACE -o <logfile>`)