Closed Vultaire closed 8 months ago
That's an lxcfs crash, moving over to lxcfs.
The schema update thing is caused by snapd potentially having done a rollback for you.
LXD can't actually be rolled back because of DB schema changes, you can only ever go forward in releases.
Running snap refresh lxd
should get you onto the latest revision, taking care of that part.
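The rollback situation described here shows up in the daemon journal as a message of the form quoted in the issue description (schema version '43' is more recent than expected '42'). As a minimal sketch, the two version numbers can be pulled out of a log excerpt to confirm that the database is ahead of the running LXD. The function name `find_schema_rollback` and the regular expression are illustrative assumptions based only on that one quoted message; other LXD releases may word the error differently.

```python
import re

# Pattern derived from the single error line quoted in this issue.
# Assumption: the wording is stable; adjust if your logs differ.
SCHEMA_ERROR = re.compile(
    r"schema version '(\d+)' is more recent than expected '(\d+)'"
)


def find_schema_rollback(log_text):
    """Return (on_disk, expected) schema versions if the log shows the
    database is newer than the running LXD expects, else None."""
    match = SCHEMA_ERROR.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Feeding it the output of journalctl -u snap.lxd.daemon.service would be one way to use it; a non-None result suggests the snap was rolled back to an older revision and needs to be refreshed forward again.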
As for the LXCFS crash, we've been adding some debugging logic to the snap to better catch some of those, as well as rolling out a variety of bugfixes with LXD 5.10. The difficulty with fixing LXCFS issues is that a fix only takes effect after a full restart of LXD (and all containers), so it can be a bit tricky to know exactly which version you are running.
Hi @Vultaire!
Have you got any new reproductions of this issue?
As Stéphane said, we have added some extra debug logic to the LXD snap package and fixed a few issues in LXCFS.
Let's close this for now. Feel free to reopen if the issue is still relevant and reproducible.
Issue description
Snap refreshes of LXD seem to intermittently trigger loss of /proc endpoints. The mounts still exist from the perspective of the containers, but attempting to access them results in messages such as:
ls: cannot access '/proc/stat': Transport endpoint is not connected
This seems like it may be somewhat of a corner case, although on a cloud I'm supporting I've seen this happen on 2 different hosts within roughly the last month.
The environment in question is running LXD on snap channel 5.0/stable on Ubuntu Focal.
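The "Transport endpoint is not connected" text is the standard message for the ENOTCONN errno, so the symptom can be detected programmatically from inside a container. The following is a minimal sketch, not part of any LXD or NRPE tooling: `check_endpoint` is a hypothetical helper, and the `stat` parameter exists only so the check can be exercised without a broken mount.

```python
import errno
import os


def check_endpoint(path, stat=os.stat):
    """Classify a path as 'ok', 'disconnected' (ENOTCONN), or another error.

    `stat` is injectable purely for testing; by default os.stat is used,
    which is the same kind of call that ls fails on in the report above.
    """
    try:
        stat(path)
    except OSError as exc:
        if exc.errno == errno.ENOTCONN:
            return "disconnected"
        return f"error: {exc.strerror}"
    return "ok"
```

Calling check_endpoint("/proc/stat") inside an affected container would be expected to return "disconnected", which a monitoring check could turn into an alert.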
Also, one message seems to jump out of the journalctl logs at me:
Error: Error creating database: schema version '43' is more recent than expected '42'
This seems to trigger the LXD service going into a restart loop.
Steps to reproduce
Let snap refresh the LXD snap automatically. (Bug is intermittent and may not readily occur.)
Information to attach
For the most recent occurrence, which I observed today, I have an NRPE log entry indicating a problem while running an NRPE check:
While not completely a smoking gun, I have this highly circumstantial evidence from the following command:
journalctl -u snap.lxd.daemon.service
The previous snap refresh occurred at 17:00:37, with its final log message included above. The next refresh, at 21:35, had an error resulting in a restart loop. The timing seems suspiciously close to when the /proc endpoints stopped responding from within the containers.
Required information
The output of "lxc info":