lxc / lxcfs

FUSE filesystem for LXC
https://linuxcontainers.org/lxcfs

Failed to open liblxcfs.so (snap lxd-5.11) #588

Closed. jalbstmeijer closed this issue 1 year ago

jalbstmeijer commented 1 year ago

Hi,

We are running:

snapd-2.55.2 lxd-5.11

Today we found some instances where none of the containers had /proc mounted.

When comparing the instances in this state to others that are fine, we see this difference (Failed to open liblxcfs.so) in /var/log/messages:

Mar 13 20:41:17 xx lxd.daemon: => Stop reason is: snap refresh
Mar 13 20:41:17 xx lxd.daemon: => Stopping LXD
Mar 13 20:41:17 xx lxd.daemon: => LXD exited cleanly
Mar 13 20:41:20 xx lxd.daemon: => Preparing the system (24575)
Mar 13 20:41:21 xx lxd.daemon: => Re-using existing LXCFS
Mar 13 20:41:21 xx lxd.daemon: => Starting LXD
Mar 13 20:41:24 xx lxd.daemon: => LXD is ready
Mar 13 20:41:30 xx lxd.daemon: Closed liblxcfs.so
Mar 13 20:41:30 xx lxd.daemon: /lib/x86_64-linux-gnu/lxcfs/liblxcfs.so: cannot open shared object file: No such file or directory - Failed to open liblxcfs.so
Mar 14 05:56:14 xx lxd.daemon: => Stop reason is: snap refresh
Mar 14 05:56:14 xx lxd.daemon: => Stopping LXD
Mar 14 05:56:14 xx lxd.daemon: => LXD exited cleanly
Mar 14 05:56:17 xx lxd.daemon: => Preparing the system (24576)
Mar 14 05:56:17 xx lxd.daemon: => Starting LXCFS
Mar 14 05:56:18 xx lxd.daemon: => Starting LXD
Mar 14 05:56:21 xx lxd.daemon: => LXD is ready

Any idea what could have caused this?

Restarting the containers solves the missing /proc mount.

Gr, J

mihalicyn commented 1 year ago

Hi!

Today we found some instances where none of the containers had /proc mounted.

By "instances", do you mean host nodes? I ask because "instance" is a specific term in LXD (https://linuxcontainers.org/lxd/docs/latest/explanation/instances/).

Could you show the snap list output from these nodes?

jalbstmeijer commented 1 year ago

Hi,

By "instances", do you mean host nodes? I ask because "instance" is a specific term in LXD (https://linuxcontainers.org/lxd/docs/latest/explanation/instances/).

You are right, host nodes.

Could you show the snap list output from these nodes?

[root@xxx ~]# snap list
Name    Version       Rev    Tracking          Publisher   Notes
core20  20230207      1828   latest/stable     canonical✓  base
core22  20230210      522    latest/stable     canonical✓  base
lxd     5.11-5044355  24576  latest/candidate  canonical✓  -
snapd   2.58.2        18357  latest/stable     canonical✓  snapd
mihalicyn commented 1 year ago

So you are using the latest/candidate channel for the LXD snap. Could you check which channel is used on the nodes where the problem does not reproduce? My suspicion is that there was a temporary issue with the LXD snap build.
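To compare channels across many hosts, one option is to parse the snap list output instead of reading it by eye. The sketch below is a hypothetical helper, not part of snapd or LXD: parse_snap_list and summarize are illustrative names, and it assumes the whitespace-separated column layout shown in this thread (Name, Version, Rev, Tracking, Publisher, Notes) with no spaces inside any field.

```python
def parse_snap_list(text):
    """Parse `snap list` output into {snap name: {column header: value}}.

    Assumes whitespace-separated columns with no spaces inside a field,
    as in the output quoted in this thread.
    """
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    snaps = {}
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        snaps[row["Name"]] = row
    return snaps


def summarize(snaps):
    """Report the LXD revision and channel, and whether core22 is installed."""
    lxd = snaps.get("lxd", {})
    return {
        "lxd_rev": lxd.get("Rev"),
        "lxd_channel": lxd.get("Tracking"),
        "core22_installed": "core22" in snaps,
    }


# Output copied from one of the affected hosts in this thread.
candidate_host = """
Name    Version       Rev    Tracking          Publisher   Notes
core20  20230207      1828   latest/stable     canonical✓  base
core22  20230210      522    latest/stable     canonical✓  base
lxd     5.11-5044355  24576  latest/candidate  canonical✓  -
snapd   2.58.2        18357  latest/stable     canonical✓  snapd
"""

print(summarize(parse_snap_list(candidate_host)))
# {'lxd_rev': '24576', 'lxd_channel': 'latest/candidate', 'core22_installed': True}
```

Collecting this summary from every node makes it easy to see whether the affected hosts share a channel or a base snap.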

jalbstmeijer commented 1 year ago

I'm using latest/candidate on all of them, but good point, maybe I should not. I will switch back to latest/stable. So the host nodes that don't show the issue have exactly the same current versions. Example of a host node without issues:

Mar 13 22:16:55 xx lxd.daemon: => Stop reason is: snap refresh
Mar 13 22:16:55 xx lxd.daemon: => Stopping LXD
Mar 13 22:16:56 xx lxd.daemon: => LXD exited cleanly
Mar 13 22:16:59 xx lxd.daemon: => Preparing the system (24575)
Mar 13 22:17:00 xx lxd.daemon: => Starting LXCFS
Mar 13 22:17:01 xx lxd.daemon: => Starting LXD
Mar 13 22:17:04 xx lxd.daemon: => LXD is ready
Mar 14 09:31:51 xx lxd.daemon: => Stop reason is: snap refresh
Mar 14 09:31:51 xx lxd.daemon: => Stopping LXD
Mar 14 09:31:52 xx lxd.daemon: => LXD exited cleanly
Mar 14 09:31:55 xx lxd.daemon: => Preparing the system (24576)
Mar 14 09:31:56 xx lxd.daemon: => Re-using existing LXCFS
Mar 14 09:31:56 xx lxd.daemon: => Starting LXD
Mar 14 09:31:59 xx lxd.daemon: => LXD is ready
[root@xxx ~]# snap list
Name    Version       Rev    Tracking          Publisher   Notes
core20  20230207      1828   latest/stable     canonical✓  base
core22  20230210      522    latest/stable     canonical✓  base
lxd     5.11-5044355  24576  latest/candidate  canonical✓  -
snapd   2.58.2        18357  latest/stable     canonical✓  snapd
mihalicyn commented 1 year ago

Hm, the only difference is that during the refresh to revision 24575, the "good case" logs => Starting LXCFS (which means that LXCFS is started from scratch), while the "bad case" logs => Re-using existing LXCFS.
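The good/bad distinction above can be checked mechanically on other hosts. The sketch below is a hypothetical helper, not part of LXD or LXCFS: it assumes lxd.daemon lines in the format quoted in this thread (for example, grepped from /var/log/messages), groups them by the revision in "Preparing the system (REV)", and records whether LXCFS was started or re-used and whether "Failed to open liblxcfs.so" followed. The names classify_refreshes and Refresh are illustrative.

```python
import re
from dataclasses import dataclass

# Marker strings copied from the lxd.daemon log lines quoted in this thread.
PREPARE_RE = re.compile(r"=> Preparing the system \((\d+)\)")
STARTING = "=> Starting LXCFS"
REUSING = "=> Re-using existing LXCFS"
FAILED = "Failed to open liblxcfs.so"


@dataclass
class Refresh:
    revision: str           # snap revision being prepared, e.g. "24575"
    lxcfs: str = "unknown"  # "started", "reused" or "unknown"
    failed: bool = False    # True if liblxcfs.so could not be opened afterwards


def classify_refreshes(lines):
    """Group lxd.daemon log lines by refresh and record how LXCFS was handled."""
    refreshes = []
    for line in lines:
        match = PREPARE_RE.search(line)
        if match:
            refreshes.append(Refresh(revision=match.group(1)))
            continue
        if not refreshes:
            continue  # ignore lines logged before the first refresh
        current = refreshes[-1]
        if STARTING in line:
            current.lxcfs = "started"
        elif REUSING in line:
            current.lxcfs = "reused"
        elif FAILED in line:
            current.failed = True
    return refreshes


# Lines taken from the "bad" host log in this thread.
bad_log = """\
Mar 13 20:41:20 xx lxd.daemon: => Preparing the system (24575)
Mar 13 20:41:21 xx lxd.daemon: => Re-using existing LXCFS
Mar 13 20:41:30 xx lxd.daemon: /lib/x86_64-linux-gnu/lxcfs/liblxcfs.so: cannot open shared object file: No such file or directory - Failed to open liblxcfs.so
Mar 14 05:56:17 xx lxd.daemon: => Preparing the system (24576)
Mar 14 05:56:17 xx lxd.daemon: => Starting LXCFS
""".splitlines()

for r in classify_refreshes(bad_log):
    print(r.revision, r.lxcfs, r.failed)
# 24575 reused True
# 24576 started False
```

A host whose 24575 refresh shows "reused" together with a failure is a candidate for the container restart described at the top of this issue.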

So my theory is that before revision 24575 (which is core22-based) you had a core20-based revision, which means that after the update liblxcfs.so was built against a different glibc version. I'm not sure which LXD snap revision was the first one built on core22.

cc @stgraber

jalbstmeijer commented 1 year ago

Switching to core22 might indeed be the cause.

There is still no difference at that level when comparing the good and the bad hosts. But it probably depends on what triggered LXCFS to be restarted rather than re-used.

bad

Mar 12 10:43:34 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 12 17:28:34 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 12 22:43:34 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 13 00:43:34 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 13 06:43:34 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 13 12:58:34 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 13 21:53:34 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "snapd"
Mar 14 03:08:34 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "core22", "lxd", "snapd"
Mar 14 06:33:34 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "core22", "snapd"

good

Mar 12 04:51:47 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 12 10:36:47 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 12 17:36:47 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 12 20:31:47 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 13 00:56:47 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 13 09:06:47 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 13 12:46:47 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 13 22:16:47 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "snapd"
Mar 14 03:26:47 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "core22", "lxd", "snapd"
Mar 14 09:31:47 xxx snapd: storehelpers.go:721: cannot refresh: snap has no updates available: "core20", "core22", "snapd"

I also have a host node on the LXD stable channel, which does not seem to have been offered core22 yet.

snap list
Name    Version       Rev    Tracking       Publisher   Notes
core20  20230207      1828   latest/stable  canonical✓  base
lxd     5.11-ad0b61e  24483  latest/stable  canonical✓  -
snapd   2.58.2        18357  latest/stable  canonical✓  snapd
Mar 12 06:13:14 xx snapd: storehelpers.go:748: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 12 15:43:14 xx snapd: storehelpers.go:748: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 12 22:13:14 xx snapd: storehelpers.go:748: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 13 04:38:14 xx snapd: storehelpers.go:748: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 13 10:28:14 xx snapd: storehelpers.go:748: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 13 16:08:15 xx snapd: storehelpers.go:748: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 13 21:13:14 xx snapd: storehelpers.go:748: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 14 03:38:14 xx snapd: storehelpers.go:748: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Mar 14 10:13:14 xx snapd: storehelpers.go:748: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
stgraber commented 1 year ago

Yeah, it's a known issue: the switch to core22 carries a high risk of an LXCFS crash.

The final snap logic, which will be pushed to stable alongside LXD 5.12, will bypass the LXCFS restart.

jalbstmeijer commented 1 year ago

Sorry, one final question. I switched back to the LXD stable channel; do I need to do anything more, for instance get rid of core22?

mihalicyn commented 1 year ago

Sorry, one final question. I switched back to the LXD stable channel; do I need to do anything more, for instance get rid of core22?

You don't need to get rid of core22; it's a normal transition. You'll hit a problem like this only once, and after that everything will work flawlessly.

Users of the stable channel will run into the same problem too.