ERROR couldn't connect to zsys daemon: timed out waiting for server handshake

farcaller commented 3 years ago

Describe the bug

zsysctl commands fail, e.g.

# zsysctl show
ERROR couldn't connect to zsys daemon: timed out waiting for server handshake

Interestingly enough, after the server is "primed" by e.g. grpcurl, zsysctl seems to work:

# zsysctl list
ERROR couldn't connect to zsys daemon: timed out waiting for server handshake
# time grpcurl -proto zsys.proto -unix -v -plaintext -connect-timeout 1000 -H 'requesterid: 0' -H 'loglevel: 3' /run/zsysd.sock zsys.Zsys.Version

Resolved method descriptor:
rpc Version ( .zsys.Empty ) returns ( stream .zsys.VersionResponse );

Request metadata to send:
loglevel: 3
requesterid: 0

Response headers received:
content-type: application/grpc
requestid: 0:431b12a0

Response contents:
{
  "log": "."
}

Response contents:
{
  "log": "."
}

Response contents:
{
  "log": "."
}

Response contents:
{
  "version": "0.4.8"
}

Response trailers received:
(empty)
Sent 0 requests and received 4 responses

real    0m2.388s
user    0m0.029s
sys     0m0.007s
# zsysctl list
ID                        ZSys  Last Used
--                        ----  ---------
rpool/ROOT/ubuntu_j6h7lo  true  current
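
The transcript above suggests that a second attempt succeeds once the daemon has answered a first request. A minimal sketch of a retry wrapper built on that observation follows. The function name `retry_cmd` and its arguments are hypothetical and not part of zsys. It assumes the timeout is transient, which the thread suggests but does not confirm.

```shell
#!/bin/sh
# retry_cmd ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper (not part of zsys): runs COMMAND until it succeeds
# or ATTEMPTS runs have failed, sleeping DELAY_SECONDS between runs.
retry_cmd() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $i/$attempts failed: $*" >&2
        i=$((i + 1))
        # Only sleep if another attempt is still to come.
        [ "$i" -le "$attempts" ] && sleep "$delay"
    done
    return 1
}

# Usage, assuming zsysctl is installed:
#   retry_cmd 5 2 zsysctl list
```

This only papers over the timeout; it does not address why the first handshake is slow.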

To Reproduce

Having a large number of zfs datasets (e.g. created by containerd) seems to help trigger it:

# zfs list|wc -l
1168

Expected behavior

zsysctl should work, even if slowly

For ubuntu users, please run and copy the following:

the log isn't trivially short, pasted in here


Installed versions:

- OS: Ubuntu 20.04.2 LTS
- Zsysd running version: 0.4.8


taisph commented 3 years ago

I'm getting frequent timeouts too, especially during apt package changes.

I'm using ZFS for Docker which results in a large number of volumes as well.

# zfs list|wc -l
6118

TheGrave commented 2 years ago

Same errors for me. On top of this zsys-gc fails constantly:

~$ sudo systemctl status zsys-gc
● zsys-gc.service - Clean up old snapshots to free space
     Loaded: loaded (/lib/systemd/system/zsys-gc.service; static; vendor preset: enabled)
     Active: failed (Result: exit-code) since Wed 2022-06-08 11:03:55 CEST; 10h ago
TriggeredBy: ● zsys-gc.timer
    Process: 1374531 ExecStart=/sbin/zsysctl service gc (code=exited, status=1/FAILURE)
   Main PID: 1374531 (code=exited, status=1/FAILURE)

Jun 08 11:03:35 zfs-backup-host systemd[1]: Starting Clean up old snapshots to free space...
Jun 08 11:03:55 zfs-backup-host zsysctl[1374531]: level=error msg="couldn't connect to zsys daemon: timed out waiting for server h>
Jun 08 11:03:55 zfs-backup-host systemd[1]: zsys-gc.service: Main process exited, code=exited, status=1/FAILURE
Jun 08 11:03:55 zfs-backup-host systemd[1]: zsys-gc.service: Failed with result 'exit-code'.
Jun 08 11:03:55 zfs-backup-host systemd[1]: Failed to start Clean up old snapshots to free space.

I'm sure it's the same for you guys, you probably haven't noticed yet.

Got a weird feeling it might be related to the large number of snapshots I have:

$ zfs list -t snapshot | wc -l
12398

Most of these are not on rpool/bpool but an external drive so not sure if it's related. System ones are only:

$ zfs list -t snapshot | grep -v backup | wc -l
391

As far as I understand, zsys shouldn't be touching snapshots of non-system datasets, but maybe the service crashes while waiting for some output?
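
To see which pool holds most of the snapshots, the count can be broken down per pool. This is a rough sketch: `count_by_pool` is a hypothetical helper name, and it assumes snapshot names of the usual `pool/dataset@snapshot` form. Note that plain `zfs list` prints a header row, so the `wc -l` figures in this thread are one higher than the real count; `-H` suppresses the header.

```shell
#!/bin/sh
# count_by_pool: reads snapshot names on stdin, one per line, and prints
# "pool count" pairs sorted by pool name. Hypothetical helper.
count_by_pool() {
    # The pool is everything before the first "/" or "@".
    awk -F'[/@]' '{ c[$1]++ } END { for (p in c) print p, c[p] }' | sort
}

# Usage, assuming the zfs CLI is available:
#   zfs list -H -t snapshot -o name | count_by_pool
```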

64knl commented 11 months ago

I have the same issue, also large number of snaps:

zfs list -t snapshot | wc -l
16863

TheGrave commented 11 months ago

The workaround I use is:

sudo ./zfs-prune-snapshots -R -v 1M

This wipes all snaps older than 1 month. Daemon works fine after this cleanup.
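
For anyone who wants to see what such a cleanup would touch before deleting anything, here is a sketch that only lists snapshots older than a cutoff, using plain `zfs list`. It is not how zfs-prune-snapshots works internally; `older_than` is a hypothetical helper, and the 30-day cutoff is just an approximation of "1 month".

```shell
#!/bin/sh
# older_than CUTOFF: reads "name<TAB>creation" lines on stdin, where
# creation is in seconds since the epoch, and prints the names of
# snapshots created before CUTOFF. Nothing is destroyed.
older_than() {
    awk -F'\t' -v cutoff="$1" '($2 + 0) < (cutoff + 0) { print $1 }'
}

# Usage (dry run, assuming the zfs CLI is available):
#   cutoff=$(( $(date +%s) - 30 * 24 * 3600 ))
#   zfs list -H -p -t snapshot -o name,creation | older_than "$cutoff"
```

The `-p` flag makes `zfs list` print the creation time as a plain number, which keeps the comparison simple.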

Lockszmith-GH commented 11 months ago

> The workaround I use is:
>
> sudo ./zfs-prune-snapshots -R -v 1M
>
> This wipes all snaps older than 1 month. Daemon works fine after this cleanup.

Is this what you are using?

TheGrave commented 11 months ago

Yep

ReSearchITEng commented 5 months ago

Thanks @TheGrave for sharing zfs-prune-snapshots. Personally I first delete using docker commands:

docker system prune -a -f --volumes

and afterwards using zfs commands: clean zfs snapshots.md

xnox commented 5 months ago

@awhitcroft please locate somebody to subscribe and respond to zsys things

ubuntu / zsys

ERROR couldn't connect to zsys daemon: timed out waiting for server handshake #193