Closed jalbstmeijer closed 1 year ago
Hi!
Today we found some instances to have all their containers without /proc mounted.
By "instances", do you mean host nodes? I ask because LXD has its own "instance" term: https://linuxcontainers.org/lxd/docs/latest/explanation/instances/.
Could you show the snap list output from these nodes?
Hi,
By "instances", do you mean host nodes? I ask because LXD has its own "instance" term: https://linuxcontainers.org/lxd/docs/latest/explanation/instances/.
You are right, host nodes.
Could you show the snap list output from these nodes?
[root@xxx ~]# snap list
Name Version Rev Tracking Publisher Notes
core20 20230207 1828 latest/stable canonical✓ base
core22 20230210 522 latest/stable canonical✓ base
lxd 5.11-5044355 24576 latest/candidate canonical✓ -
snapd 2.58.2 18357 latest/stable canonical✓ snapd
So, you are using the latest/candidate channel for the LXD snap. Could you check which channel is used on the nodes where the problem is not reproducible? My suspicion is that there was some temporary issue with the LXD snap build.
I'm using latest/candidate on all of them, but good point, maybe I should not. I will switch back to latest/stable. The host nodes which don't show the issue currently have exactly the same versions. Example host node without issues:
Mar 13 22:16:55 xx lxd.daemon: => Stop reason is: snap refresh
Mar 13 22:16:55 xx lxd.daemon: => Stopping LXD
Mar 13 22:16:56 xx lxd.daemon: => LXD exited cleanly
Mar 13 22:16:59 xx lxd.daemon: => Preparing the system (24575)
Mar 13 22:17:00 xx lxd.daemon: => Starting LXCFS
Mar 13 22:17:01 xx lxd.daemon: => Starting LXD
Mar 13 22:17:04 xx lxd.daemon: => LXD is ready
Mar 14 09:31:51 xx lxd.daemon: => Stop reason is: snap refresh
Mar 14 09:31:51 xx lxd.daemon: => Stopping LXD
Mar 14 09:31:52 xx lxd.daemon: => LXD exited cleanly
Mar 14 09:31:55 xx lxd.daemon: => Preparing the system (24576)
Mar 14 09:31:56 xx lxd.daemon: => Re-using existing LXCFS
Mar 14 09:31:56 xx lxd.daemon: => Starting LXD
Mar 14 09:31:59 xx lxd.daemon: => LXD is ready
[root@xxx ~]# snap list
Name Version Rev Tracking Publisher Notes
core20 20230207 1828 latest/stable canonical✓ base
core22 20230210 522 latest/stable canonical✓ base
lxd 5.11-5044355 24576 latest/candidate canonical✓ -
snapd 2.58.2 18357 latest/stable canonical✓ snapd
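The statement that both hosts run exactly the same versions can be checked mechanically rather than by eye. The following is a minimal sketch, assuming the snap list table has been saved as text without the shell prompt line; the helper names parse_snap_list and diff_snaps are made up for illustration and are not part of snapd or LXD.

```python
def parse_snap_list(text):
    """Map snap name -> (version, revision, tracking channel).

    Assumes the whitespace-separated column layout shown in this thread:
    Name Version Rev Tracking Publisher Notes.
    """
    rows = {}
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip the header row and anything too short to be a table row.
        if len(fields) < 4 or fields[0] == "Name":
            continue
        name, version, rev, tracking = fields[:4]
        rows[name] = (version, rev, tracking)
    return rows


def diff_snaps(a, b):
    """Return only the snaps whose (version, rev, tracking) differ."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


# Both tables below are copied from the two hosts in this thread.
BAD_HOST = """
Name Version Rev Tracking Publisher Notes
core20 20230207 1828 latest/stable canonical✓ base
core22 20230210 522 latest/stable canonical✓ base
lxd 5.11-5044355 24576 latest/candidate canonical✓ -
snapd 2.58.2 18357 latest/stable canonical✓ snapd
"""
GOOD_HOST = BAD_HOST  # the outputs posted for both hosts are identical

difference = diff_snaps(parse_snap_list(BAD_HOST), parse_snap_list(GOOD_HOST))
print(difference)  # an empty dict means no version difference
```

An empty result here supports the point that the installed revisions alone do not explain why only some hosts were affected.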
Hm, the only difference is that in the "good case" you have => Starting LXCFS (which means that LXCFS is started from scratch), but in the "bad case" you have => Re-using existing LXCFS.
So, my theory is that before rev 24575 (which is core22 based) you had something core20 based, which means that after the update liblxcfs.so changed its glibc dependency version. I'm not sure which LXD snap revision was the first one based on core22.
cc @stgraber
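The good/bad distinction described above can be pulled out of the lxd.daemon log lines automatically. This is a rough sketch, assuming the log format shown in this thread; the function name classify_refreshes and the "started"/"reused" labels are hypothetical, chosen only for this example.

```python
import re

# Matches e.g. "... lxd.daemon: => Preparing the system (24575)"
PREPARE = re.compile(r"=> Preparing the system \((\d+)\)")


def classify_refreshes(lines):
    """Return one (revision, lxcfs_action, failed_to_open) tuple per refresh.

    lxcfs_action is "started", "reused" or "unknown"; failed_to_open is True
    if a "Failed to open liblxcfs.so" line was logged before the next refresh.
    """
    results = []
    current = None
    for line in lines:
        match = PREPARE.search(line)
        if match:
            if current is not None:
                results.append(tuple(current))
            current = [int(match.group(1)), "unknown", False]
        elif current is None:
            continue  # ignore lines before the first refresh
        elif "=> Starting LXCFS" in line:
            current[1] = "started"
        elif "=> Re-using existing LXCFS" in line:
            current[1] = "reused"
        elif "Failed to open liblxcfs.so" in line:
            current[2] = True
    if current is not None:
        results.append(tuple(current))
    return results


# Lines taken from the affected host's log in the original report.
SAMPLE = """\
Mar 13 20:41:20 xx lxd.daemon: => Preparing the system (24575)
Mar 13 20:41:21 xx lxd.daemon: => Re-using existing LXCFS
Mar 13 20:41:21 xx lxd.daemon: => Starting LXD
Mar 13 20:41:24 xx lxd.daemon: => LXD is ready
Mar 13 20:41:30 xx lxd.daemon: Closed liblxcfs.so
Mar 13 20:41:30 xx lxd.daemon: /lib/x86_64-linux-gnu/lxcfs/liblxcfs.so: cannot open shared object file: No such file or directory - Failed to open liblxcfs.so
Mar 14 05:56:17 xx lxd.daemon: => Preparing the system (24576)
Mar 14 05:56:17 xx lxd.daemon: => Starting LXCFS
Mar 14 05:56:18 xx lxd.daemon: => Starting LXD
"""

refreshes = classify_refreshes(SAMPLE.splitlines())
print(refreshes)  # [(24575, 'reused', True), (24576, 'started', False)]
```

A host where a refresh shows "reused" together with the failed-open flag matches the bad case in this thread; whether that combination always means containers lost /proc is not established here.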
switching to core22 might indeed be something.
Still no difference on that level when comparing the good and the bad. But it probably depends on what the trigger was to restart lxcfs or try to re-use it.
bad
Mar 12 10:43:34 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 12 17:28:34 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 12 22:43:34 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 13 00:43:34 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 13 06:43:34 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 13 12:58:34 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 13 21:53:34 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "snapd"
Mar 14 03:08:34 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "core22", "lxd", "snapd"
Mar 14 06:33:34 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "core22", "snapd"
good
Mar 12 04:51:47 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 12 10:36:47 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 12 17:36:47 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 12 20:31:47 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 13 00:56:47 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 13 09:06:47 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 13 12:46:47 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 13 22:16:47 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "snapd"
Mar 14 03:26:47 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "core22", "lxd", "snapd"
Mar 14 09:31:47 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "core22", "snapd"
I also have a host node using the lxd stable channel, which does not seem to have been offered this core22.
snap list
Name Version Rev Tracking Publisher Notes
core20 20230207 1828 latest/stable canonical✓ base
lxd 5.11-ad0b61e 24483 latest/stable canonical✓ -
snapd 2.58.2 18357 latest/stable canonical✓ snapd
Mar 12 06:13:14 xx snapd: storehelpers.go:748: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 12 15:43:14 xx snapd: storehelpers.go:748: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 12 22:13:14 xx snapd: storehelpers.go:748: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 13 04:38:14 xx snapd: storehelpers.go:748: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 13 10:28:14 xx snapd: storehelpers.go:748: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 13 16:08:15 xx snapd: storehelpers.go:748: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 13 21:13:14 xx snapd: storehelpers.go:748: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 14 03:38:14 xx snapd: storehelpers.go:748: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 14 10:13:14 xx snapd: storehelpers.go:748: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
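To see when core22 first shows up on a host, the snapd "no updates available" lines can be reduced to the points where the list of snap names changes. The sketch below assumes the message format shown in this thread. Treating a snap that is missing from the list as one that had a refresh pending is only my reading of the message, not something confirmed here. The function names are hypothetical.

```python
import re

NO_UPDATES = re.compile(r"snap has no updates available: (.+)$")


def snaps_without_updates(line):
    """Return the set of quoted snap names in one snapd line, or None."""
    match = NO_UPDATES.search(line)
    if match is None:
        return None
    return set(re.findall(r'"([^"]+)"', match.group(1)))


def list_changes(lines):
    """Return (timestamp, added, removed) each time the snap set changes."""
    changes = []
    previous = None
    for line in lines:
        names = snaps_without_updates(line)
        if names is None:
            continue
        if previous is not None and names != previous:
            # The first 15 characters hold the syslog timestamp.
            changes.append(
                (line[:15], sorted(names - previous), sorted(previous - names))
            )
        previous = names
    return changes


# Last four lines from the affected ("bad") host in this thread.
SAMPLE = """\
Mar 13 12:58:34 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 13 21:53:34 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "snapd"
Mar 14 03:08:34 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "core22", "lxd", "snapd"
Mar 14 06:33:34 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "core22", "snapd"
"""

for change in list_changes(SAMPLE.splitlines()):
    print(change)
```

On the stable-channel host above, the list never changes, so list_changes would return an empty list for those lines.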
Yeah, it's known that the switch to core22 comes with a high risk of an lxcfs crash.
The final snap logic, which will be pushed to stable alongside LXD 5.12, will bypass the lxcfs restart.
Sorry, one final question. I switched back to the lxd stable channel. Do I need to do anything more, for instance to get rid of core22?
You don't need to get rid of core22. It's a normal transition. You'll hit a problem like this only once; after that everything should work fine.
Users of the stable channel will get the same problem.
Hi,
We are running:
snapd-2.55.2 lxd-5.11
Today we found some instances to have all their containers without /proc mounted.
When comparing the instances in this state to others that are fine, we see this difference (Failed to open liblxcfs.so) in /var/log/messages:
Mar 13 20:41:17 xx lxd.daemon: => Stop reason is: snap refresh
Mar 13 20:41:17 xx lxd.daemon: => Stopping LXD
Mar 13 20:41:17 xx lxd.daemon: => LXD exited cleanly
Mar 13 20:41:20 xx lxd.daemon: => Preparing the system (24575)
Mar 13 20:41:21 xx lxd.daemon: => Re-using existing LXCFS
Mar 13 20:41:21 xx lxd.daemon: => Starting LXD
Mar 13 20:41:24 xx lxd.daemon: => LXD is ready
Mar 13 20:41:30 xx lxd.daemon: Closed liblxcfs.so
Mar 13 20:41:30 xx lxd.daemon: /lib/x86_64-linux-gnu/lxcfs/liblxcfs.so: cannot open shared object file: No such file or directory - Failed to open liblxcfs.so
Mar 14 05:56:14 xx lxd.daemon: => Stop reason is: snap refresh
Mar 14 05:56:14 xx lxd.daemon: => Stopping LXD
Mar 14 05:56:14 xx lxd.daemon: => LXD exited cleanly
Mar 14 05:56:17 xx lxd.daemon: => Preparing the system (24576)
Mar 14 05:56:17 xx lxd.daemon: => Starting LXCFS
Mar 14 05:56:18 xx lxd.daemon: => Starting LXD
Mar 14 05:56:21 xx lxd.daemon: => LXD is ready
Any idea what could have caused this?
Restarting the containers solves the missing /proc mount.
Gr, J
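Since restarting a container restores /proc, it helps to know which containers on a host are actually affected before restarting anything. The following is a minimal sketch, assuming that a missing /proc can be detected by checking for /proc/self/stat inside the container and that the container image ships a test binary; both are assumptions, as are the function names. It relies on lxc exec passing through the exit status of the command it runs.

```python
import subprocess
from types import SimpleNamespace


def has_proc(name, run=subprocess.run):
    """True if /proc/self/stat exists inside the container `name`.

    Assumption: without a mounted /proc, /proc/self does not exist,
    so `test -e` exits non-zero and lxc exec reports that status.
    """
    result = run(["lxc", "exec", name, "--", "test", "-e", "/proc/self/stat"])
    return result.returncode == 0


def containers_missing_proc(names, run=subprocess.run):
    """Return the containers from `names` that appear to lack /proc."""
    return [name for name in names if not has_proc(name, run)]


# Dry run with a stand-in for subprocess.run, so the sketch can be tried
# without LXD. The container names "c1" and "c2" are made up.
def fake_run(cmd):
    broken = {"c2"}
    return SimpleNamespace(returncode=1 if cmd[2] in broken else 0)


affected = containers_missing_proc(["c1", "c2"], run=fake_run)
print(affected)  # ['c2']
```

On a real host the list of names would come from LXD itself, for example from lxc list in CSV format, and the default run argument would be used instead of the stand-in.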