ubuntu / zsys

ZSys daemon and client for zfs systems
GNU General Public License v3.0

Unclear what to do after reverting state #185

Open josephtate opened 3 years ago

josephtate commented 3 years ago

**Describe the bug**
Perhaps this is a documentation issue, but it's unclear what the admin needs to do after booting from an old snapshot via grub to keep their system working smoothly.

I tried to install a real-time kernel to do some Ubuntu Studio work, but it was unable to load my zfs pools, so I reverted. Now I have two sets of zfs snapshots and, worse still, several zfs and zsys services don't work, zsys boot-prepare segfaults, and I no longer have confidence in the system.

**To Reproduce**
Steps to reproduce the behavior:

  1. Install Ubuntu + ZFS root
  2. Download the ubuntu studio installer and install the real time kernel
  3. Reboot, see zfs failure
  4. Reboot and rollback via the grub menu to the previous snapshot
  5. What do I do next?

**Expected behavior**
I was expecting some sort of zsys "make permanent" command that would roll the system's zfs state back to the current clone and delete the original: something that would run zfs promote, for example, and delete the other branch.
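
For illustration only, the manual ZFS equivalent of what is being asked for might look like the dry-run sketch below. It only prints commands and never touches a pool. `zfs promote` and `zfs destroy -r` are real OpenZFS subcommands, but the helper name `plan_promote`, the argument order, and the idea that this sequence is safe on a zsys-managed system are assumptions, not something zsys documents. `rpool/ROOT/ubuntu_OLDID` is a placeholder for the pre-revert dataset, whose real name is not given in this report.

```shell
#!/bin/sh
# Dry-run sketch: print (do not run) the commands that would make the
# currently booted clone the primary dataset and drop the old branch.
# plan_promote is a hypothetical helper, not part of zsys or OpenZFS.
plan_promote() {
    old="$1"   # dataset to discard once every clone has been promoted
    shift
    # zfs promote is not recursive, so each clone is listed explicitly.
    for clone in "$@"; do
        printf 'zfs promote %s\n' "$clone"
    done
    # After promotion the clones own the snapshots they were created from,
    # so the old dataset and its children can be destroyed.
    printf 'zfs destroy -r %s\n' "$old"
}

# Placeholder old dataset first, then the clones seen in the mount output.
plan_promote rpool/ROOT/ubuntu_OLDID \
    rpool/ROOT/ubuntu_r20rzf \
    rpool/ROOT/ubuntu_r20rzf/var/lib
```

Checking `zfs get -r origin rpool/ROOT` first would show which datasets are clones at all; the printed commands should be reviewed by hand before anything is run.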

For ubuntu users, please run and copy the following:

  1. ubuntu-bug zsys --save=/tmp/report
  2. Copy paste below /tmp/report content:

I was unable to generate the report as directed:

    $ sudo ubuntu-bug zsys --save=/tmp/report

    *** Collecting problem information

    The collected information can be sent to the developers to improve the application. This might take a few minutes. .......

    *** Problem in zsys

    The problem cannot be reported:

    This is not an official KDE package. Please remove any third party package and try again.

    Press any key to continue...

    No pending crash reports. Try --help for more information.


**Screenshots**
If applicable, add screenshots to help explain your problem.

**Installed versions:**
 - OS:

    $ cat /etc/os-release
    NAME="KDE neon Plasma LTS"
    VERSION="5.18"
    ID=neon
    ID_LIKE="ubuntu debian"
    PRETTY_NAME="KDE neon Plasma LTS Edition 5.18"
    VARIANT="Plasma LTS Edition"
    VERSION_ID="20.04"
    HOME_URL="https://neon.kde.org/"
    SUPPORT_URL="https://neon.kde.org/"
    BUG_REPORT_URL="https://bugs.kde.org/"
    LOGO=start-here-kde-neon
    PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
    VERSION_CODENAME=focal
    UBUNTU_CODENAME=focal

- Zsysd running version: zsysctl 0.4.8

**Additional context**
Add any other context about the problem here.
Mount shows the following datasets loaded: 

    rpool/ROOT/ubuntu_r20rzf on / type zfs (rw,relatime,xattr,posixacl)
    rpool/USERDATA/username_aqwu6c on /home/jtate type zfs (rw,relatime,xattr,posixacl)
    rpool/USERDATA/root_03fo29tr on /root type zfs (rw,relatime,xattr,posixacl)
    bpool/BOOT/ubuntu_r20rzf on /boot type zfs (rw,nodev,relatime,xattr,posixacl)
    rpool/ROOT/ubuntu_r20rzf/var/games on /var/games type zfs (rw,relatime,xattr,posixacl)
    rpool/ROOT/ubuntu_r20rzf/var/www on /var/www type zfs (rw,relatime,xattr,posixacl)
    rpool/ROOT/ubuntu_r20rzf/var/log on /var/log type zfs (rw,relatime,xattr,posixacl)
    rpool/ROOT/ubuntu_r20rzf/var/lib on /var/lib type zfs (rw,relatime,xattr,posixacl)
    rpool/ROOT/ubuntu_r20rzf/usr/local on /usr/local type zfs (rw,relatime,xattr,posixacl)
    rpool/ROOT/ubuntu_r20rzf/var/snap on /var/snap type zfs (rw,relatime,xattr,posixacl)
    rpool/ROOT/ubuntu_r20rzf/var/spool on /var/spool type zfs (rw,relatime,xattr,posixacl)
    rpool/ROOT/ubuntu_r20rzf/srv on /srv type zfs (rw,relatime,xattr,posixacl)
    rpool/ROOT/ubuntu_r20rzf/var/mail on /var/mail type zfs (rw,relatime,xattr,posixacl)
    rpool/ROOT/ubuntu_r20rzf/var/lib/dpkg on /var/lib/dpkg type zfs (rw,relatime,xattr,posixacl)
    rpool/ROOT/ubuntu_r20rzf/var/lib/NetworkManager on /var/lib/NetworkManager type zfs (rw,relatime,xattr,posixacl)
    rpool/ROOT/ubuntu_r20rzf/var/lib/AccountsService on /var/lib/AccountsService type zfs (rw,relatime,xattr,posixacl)
    rpool/ROOT/ubuntu_r20rzf/var/lib/apt on /var/lib/apt type zfs (rw,relatime,xattr,posixacl)
    rpool/ROOT/ubuntu_r20rzf/var/lib/0b3174a11e50edb014a03ca2efa4fddfa481f781a2ff233c785668d42c3dac72 on /var/lib/docker/zfs/graph/0b3174a11e50edb014a03ca2efa4fddfa481f781a2ff233c785668d42c3dac72 type zfs (rw,relatime,xattr,posixacl)

But I had to zfs mount most of those.
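
To see at a glance which datasets are actually mounted, `mount` output like the listing above can be reduced to dataset and mountpoint pairs. This is a minimal sketch, assuming the Linux `mount` line format `SOURCE on TARGET type FSTYPE (OPTIONS)` and mountpoints without spaces; `zfs_mounts` is a hypothetical helper name.

```shell
#!/bin/sh
# Filter `mount`-style lines down to "dataset<TAB>mountpoint" for ZFS only.
# Assumes fields: SOURCE on TARGET type FSTYPE (OPTIONS), no spaces in paths.
zfs_mounts() {
    awk '$2 == "on" && $4 == "type" && $5 == "zfs" { print $1 "\t" $3 }'
}

# Example input: two lines from the report plus one made-up non-ZFS line.
zfs_mounts <<'EOF'
rpool/ROOT/ubuntu_r20rzf on / type zfs (rw,relatime,xattr,posixacl)
bpool/BOOT/ubuntu_r20rzf on /boot type zfs (rw,nodev,relatime,xattr,posixacl)
tmpfs on /run type tmpfs (rw,nosuid,nodev)
EOF
```

On a live system this would be used as `mount | zfs_mounts`. Comparing the result with `zfs list -H -o name,mounted,mountpoint` should show which datasets still need a manual `zfs mount`.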

    $ systemctl status zsys*
    ● zsys-gc.timer - Clean up old snapshots to free space
         Loaded: loaded (/lib/systemd/system/zsys-gc.timer; enabled; vendor preset: enabled)
         Active: active (waiting) since Thu 2021-01-07 00:07:36 EST; 4 days ago
         Trigger: Tue 2021-01-12 23:09:46 EST; 23h left
         Triggers: ● zsys-gc.service

    Jan 07 00:07:36 denali.int.dragonstrider.com systemd[1]: Started Clean up old snapshots to free space.

    ● zsysd.socket - Socker activation for zsys daemon
         Loaded: loaded (/lib/systemd/system/zsysd.socket; enabled; vendor preset: enabled)
         Active: failed (Result: service-start-limit-hit) since Thu 2021-01-07 00:10:22 EST; 4 days ago
         Triggers: ● zsysd.service
         Listen: /run/zsysd.sock (Stream)

    Jan 07 00:07:36 denali.int.dragonstrider.com systemd[1]: Listening on Socker activation for zsys daemon.
    Jan 07 00:10:22 denali.int.dragonstrider.com systemd[1]: zsysd.socket: Failed with result 'service-start-limit-hit'.

    ● zsysd.service - ZSYS daemon service
         Loaded: loaded (/lib/systemd/system/zsysd.service; static; vendor preset: enabled)
         Active: failed (Result: exit-code) since Thu 2021-01-07 00:10:22 EST; 4 days ago
         TriggeredBy: ● zsysd.socket
         Main PID: 13566 (code=exited, status=2)

    Jan 07 00:10:22 denali.int.dragonstrider.com zsysd[13566]: github.com/ubuntu/zsys/vendor/github.com/spf13/cobra.(*Command).Execute(...)
    Jan 07 00:10:22 denali.int.dragonstrider.com zsysd[13566]: github.com/ubuntu/zsys/vendor/github.com/spf13/cobra/command.go:864
    Jan 07 00:10:22 denali.int.dragonstrider.com zsysd[13566]: main.main()
    Jan 07 00:10:22 denali.int.dragonstrider.com zsysd[13566]: github.com/ubuntu/zsys/cmd/zsysd/main.go:36 +0xdb
    Jan 07 00:10:22 denali.int.dragonstrider.com systemd[1]: zsysd.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
    Jan 07 00:10:22 denali.int.dragonstrider.com systemd[1]: zsysd.service: Failed with result 'exit-code'.
    Jan 07 00:10:22 denali.int.dragonstrider.com systemd[1]: Failed to start ZSYS daemon service.
    Jan 07 00:10:22 denali.int.dragonstrider.com systemd[1]: zsysd.service: Start request repeated too quickly.
    Jan 07 00:10:22 denali.int.dragonstrider.com systemd[1]: zsysd.service: Failed with result 'exit-code'.
    Jan 07 00:10:22 denali.int.dragonstrider.com systemd[1]: Failed to start ZSYS daemon service.

    ● zsys-gc.service - Clean up old snapshots to free space
         Loaded: loaded (/lib/systemd/system/zsys-gc.service; static; vendor preset: enabled)
         Active: failed (Result: exit-code) since Mon 2021-01-11 23:09:46 EST; 26min ago
         TriggeredBy: ● zsys-gc.timer
         Main PID: 2164204 (code=exited, status=1/FAILURE)

    Jan 11 23:09:46 denali.int.dragonstrider.com systemd[1]: Starting Clean up old snapshots to free space...
    Jan 11 23:09:46 denali.int.dragonstrider.com zsysctl[2164204]: level=error msg="couldn't connect to zsys daemon: connection error: desc = \"transport: Error while dialing dial unix /run/zsysd.sock: connect: connection refused\""
    Jan 11 23:09:46 denali.int.dragonstrider.com systemd[1]: zsys-gc.service: Main process exited, code=exited, status=1/FAILURE
    Jan 11 23:09:46 denali.int.dragonstrider.com systemd[1]: zsys-gc.service: Failed with result 'exit-code'.
    Jan 11 23:09:46 denali.int.dragonstrider.com systemd[1]: Failed to start Clean up old snapshots to free space.
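
The status output above shows a chain: zsysd.service crashes, zsysd.socket hits its start limit, and zsys-gc.service then cannot connect. A small filter can pull just the failed units and their `Result:` values out of such output. This is a sketch, assuming the usual `systemctl status` layout where each unit starts with a `●` line followed by an `Active:` line; `failed_units` is a hypothetical helper name.

```shell
#!/bin/sh
# Print "unit<TAB>result" for every unit whose Active: line says "failed".
# Assumes unit headers begin with the "●" glyph used by this systemd version.
failed_units() {
    awk '
        $1 == "●" { unit = $2 }              # remember the current unit name
        $1 == "Active:" && $2 == "failed" {
            result = $4                      # e.g. "exit-code)"
            sub(/\)$/, "", result)           # drop the closing parenthesis
            print unit "\t" result
        }
    '
}

# Example input: header and Active: lines taken from the report.
failed_units <<'EOF'
● zsys-gc.timer - Clean up old snapshots to free space
     Active: active (waiting) since Thu 2021-01-07 00:07:36 EST; 4 days ago
● zsysd.socket - Socker activation for zsys daemon
     Active: failed (Result: service-start-limit-hit) since Thu 2021-01-07 00:10:22 EST; 4 days ago
● zsysd.service - ZSYS daemon service
     Active: failed (Result: exit-code) since Thu 2021-01-07 00:10:22 EST; 4 days ago
EOF
```

A `service-start-limit-hit` state is normally cleared with `systemctl reset-failed` on the affected units; whether that alone would have been enough here is not established by this report, since the daemon itself was crashing.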

josephtate commented 3 years ago

Well, I think I have fixed my system.

Not all these steps are necessary, but I thought starting from a clean slate would be faster than preserving zsys history or docker images (for example). Hopefully this can help someone.

My rpool/USERDATA/root_ dataset still has the old id, but that doesn't seem to matter.

I rebooted, but the zfs-mount service was still failing to come up. zfs mount -a was giving me errors about / not being empty.

But I still had problems in systemd: the zsys-commit service was not starting. The workaround in #112 helped me get that running too.