Closed project0 closed 2 weeks ago
Can you show the output of ls -lh /var/lib/incus/database/global/ on a broken system?
We haven't heard of a database corruption case in over a year (including LXD), so having it be somewhat reproducible is very odd. In any case, recovery should normally be as easy as removing the most recent segment file. There should never be a situation where the entire database is unreadable: dqlite works by building up segments, which makes it trivial to revert, and it takes new baseline snapshots every so often while keeping multiple snapshots around.
So the very few times where corruption happens, removing the most recent snapshot should get you back online and only cause a few minutes of database data loss (if any). If somehow you got all your segments to be corrupted, deleting them will have dqlite go from the latest snapshot. If that snapshot is corrupted, you can delete it and have it start from the previous snapshot.
Having the database be completely dead would require all segment files and all snapshots to be corrupted, at which point you likely have bigger problems with your system :)
@freeekanayaka
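The recovery step described above (set the most recent segment aside, then retry) could be sketched roughly as follows. This is only an illustration, not an official tool: the file-name patterns (`open-N`, `<start>-<end>` with 16-digit indexes, `snapshot-<term>-<index>-<timestamp>`) are assumptions about how the on-disk layout usually looks, and `classify` and `set_aside_newest_segment` are hypothetical helper names. It moves the file to a backup directory rather than deleting it, and it should only be run while the daemon is stopped.

```python
import re
import shutil
from pathlib import Path

# Assumed naming patterns; verify them against the actual directory listing.
CLOSED_SEGMENT = re.compile(r"^\d{16}-\d{16}$")     # e.g. 0000000000000001-0000000000000100
OPEN_SEGMENT = re.compile(r"^open-\d+$")            # e.g. open-1
SNAPSHOT = re.compile(r"^snapshot-\d+-\d+-\d+$")    # e.g. snapshot-1-100-5 (.meta files are ignored)


def classify(db_dir):
    """Return (segments, snapshots) found in db_dir, each sorted oldest first by mtime."""
    segments, snapshots = [], []
    for path in Path(db_dir).iterdir():
        if not path.is_file():
            continue
        if CLOSED_SEGMENT.match(path.name) or OPEN_SEGMENT.match(path.name):
            segments.append(path)
        elif SNAPSHOT.match(path.name):
            snapshots.append(path)

    def by_mtime(p):
        return (p.stat().st_mtime_ns, p.name)

    return sorted(segments, key=by_mtime), sorted(snapshots, key=by_mtime)


def set_aside_newest_segment(db_dir, backup_dir):
    """Move the most recently modified segment into backup_dir.

    Returns the new path of the moved file, or None if no segment was found.
    """
    segments, _ = classify(db_dir)
    if not segments:
        return None
    newest = segments[-1]
    Path(backup_dir).mkdir(parents=True, exist_ok=True)
    target = Path(backup_dir) / newest.name
    shutil.move(str(newest), str(target))
    return target
```

For example, `set_aside_newest_segment("/var/lib/incus/database/global", "/root/incus-db-backup")` would move one segment out of the way (the backup path is arbitrary). Keeping the file instead of deleting it means the step can be undone if it does not help.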
It just happened again (sorry for the late response, I only use the machine occasionally). I am not sure what is happening in my case :-(
I tried deleting the most recent segment files. That worked for a short time, until I tried to create a new VM and it crashed while selecting the image (via incus-UI). A second attempt, deleting more segment files, seemed to work, but it is far from a solution because I now have an inconsistent state (the VM no longer exists, but it is still configured and cannot be deleted).
Here are the broken database files in their current state: db.zip
I think I found the problem. Apparently incus on Arch Linux is dynamically linked against cowsql, and that package has not been updated to a recent version :-( (which is why the error still points to src/vfs.c:802).
I guess we can close this. It is still unfortunate that the bug fix depends on the OS package distribution. :-(
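One way to confirm this kind of dynamic-linking situation is to look at which shared libraries the daemon binary resolves to. The sketch below parses `ldd`-style output and picks out the database-related libraries. The helper name `linked_libraries`, the keyword list, and the sample output are all hypothetical and only for illustration.

```python
import re

# Matches ldd lines of the form "<soname> => <resolved path> (0x...)".
LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)")


def linked_libraries(ldd_output, keywords=("cowsql", "dqlite", "raft")):
    """Return {soname: resolved_path} for libraries whose name contains a keyword."""
    found = {}
    for line in ldd_output.splitlines():
        match = LDD_LINE.match(line)
        if match and any(k in match.group(1) for k in keywords):
            found[match.group(1)] = match.group(2)
    return found


# Hypothetical sample output, not taken from a real system.
SAMPLE = (
    "\tlinux-vdso.so.1 (0x00007ffc00000000)\n"
    "\tlibcowsql.so.0 => /usr/lib/libcowsql.so.0 (0x00007f0000000000)\n"
    "\tlibc.so.6 => /usr/lib/libc.so.6 (0x00007f0000100000)\n"
)

print(linked_libraries(SAMPLE))
```

On a real system the input would be the output of `ldd` run against the daemon binary (assumed here to be `incusd`). The distribution's package manager can then report which package owns the resolved path, and at which version.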
Required information
Issue description
Similar issue to #665, see debug log. This is now the second time it has happened to me.
The only fix I could find is wiping the full sqlite database...
Steps to reproduce
I am not sure how to reproduce it; it seems to happen randomly after creating or modifying any profile or VM/container.
Information to attach
dmesg: No kernel error message
incus info NAME --show-log
incus config show NAME --expanded
incus monitor --pretty (while reproducing the issue): Debug Log