Open: jibel opened this issue 4 years ago.
I have what appears to be the same or a very similar issue. I have a dataset containing my Nix store mounted at /nix/, and Nix bind-mounts /nix/store onto itself to make it read-only (so there are indeed multiple mounts, but to my knowledge no namespaces or containers are involved). When I access a snapshot of the Nix store, I get the same "Too many levels of symbolic links" ELOOP error. What I can add to the original bug report is that the snapshot is in fact automounted, but at the wrong path: it appears in /.zfs/snapshot instead of /nix/.zfs/snapshot! So if you then go looking for the same-named snapshot of the root dataset, you actually get the wrong data:
## /nix/ is a dataset that is mounted at /nix and apparently the nix package manager does something like `mount --bind -o ro /nix/store /nix/store` to make that directory read only
# mount | grep nix
ssdpool/main/nix on /nix type zfs (rw,relatime,xattr,posixacl)
ssdpool/main/nix on /nix/store type zfs (ro,relatime,xattr,posixacl)
## Attempting to look at a snapshot of /nix/ fails:
# ls /nix/.zfs/snapshot/zcloudbackup-000000-2020-02-17-1001/
ls: cannot access '/nix/.zfs/snapshot/zcloudbackup-000000-2020-02-17-1001/': Too many levels of symbolic links
## But actually it has been mounted, just at a completely wrong path
# mount | grep \\.zfs
ssdpool/main/nix@zcloudbackup-000000-2020-02-17-1001 on /.zfs/snapshot/zcloudbackup-000000-2020-02-17-1001 type zfs (ro,relatime,xattr,posixacl)
## And sure enough looking at that path shows a snapshot of /nix/, not a snapshot of /!
# ls /.zfs/snapshot/zcloudbackup-000000-2020-02-17-1001/
store var
## It should look like this:
# umount /.zfs/snapshot/zcloudbackup-000000-2020-02-17-1001
# ls /.zfs/snapshot/zcloudbackup-000000-2020-02-17-1001
bin boot dev etc home mnt nix proc root run srv sys tmp usr var
I have not, however, been able to reproduce this issue by following jibel's reproduction steps in the original report, or come up with similarly clean reproduction steps using a new dataset. It's even possible that the bind mount is a red herring. But I think the fact that the snapshot gets mounted at the wrong path might be a useful clue for identifying the bug.
Type | Version/Name |
---|---|
Distribution Name | NixOS |
Distribution Version | 20.03.git.a21c2fa (Markhor) |
Linux Kernel | 5.5.0 |
Architecture | x86_64 |
ZFS Version | 0.8.3-1 |
SPL Version | 0.8.3-1 |
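One rough way to spot the wrong-path symptom from the NixOS transcript is to compare each mounted snapshot against the mountpoint of its dataset. The sketch below is a hypothetical helper (`find_misplaced_snapshot_mounts` is not part of ZFS). It reads `mount` output on stdin and assumes that the first zfs entry listed for a dataset is its real mountpoint. Later entries, such as the `/nix/store` bind mount, are treated as extra mounts. A more robust variant would ask `zfs get -H -o value mountpoint <dataset>` instead.

```shell
# Hypothetical helper: report snapshot mounts that are not located at
# <dataset mountpoint>/.zfs/snapshot/<snapshot>.
# Reads `mount`-style lines ("SOURCE on MOUNTPOINT type FSTYPE (OPTIONS)") on
# stdin and assumes the first zfs entry for a dataset is its real mountpoint
# (later entries for the same dataset, e.g. bind mounts, are ignored).
find_misplaced_snapshot_mounts() {
  awk '
    $2 == "on" && $4 == "type" && $5 == "zfs" {
      if (index($1, "@") > 0) {
        # Snapshot mount: remember dataset and snapshot name per mountpoint.
        split($1, parts, "@")
        snap_ds[$3] = parts[1]
        snap_name[$3] = parts[2]
      } else if (!($1 in ds_mount)) {
        # First mount seen for this dataset.
        ds_mount[$1] = $3
      }
    }
    END {
      for (mp in snap_ds) {
        ds = snap_ds[mp]
        if (!(ds in ds_mount)) continue
        # A dataset mounted at / has its snapshots under /.zfs/snapshot.
        base = (ds_mount[ds] == "/") ? "" : ds_mount[ds]
        want = base "/.zfs/snapshot/" snap_name[mp]
        if (mp != want)
          print ds "@" snap_name[mp] ": mounted at " mp ", expected " want
      }
    }
  '
}
```

Usage would be `mount | find_misplaced_snapshot_mounts`. For the transcript above, it should report that `ssdpool/main/nix@zcloudbackup-000000-2020-02-17-1001` is mounted under `/.zfs/snapshot` rather than `/nix/.zfs/snapshot`.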
We're seeing this issue as well, with bind mounts happening inside containers (ZFS 0.8.4-1).
Type | Version/Name |
---|---|
Distribution Name | openSUSE |
Distribution Version | Tumbleweed |
Linux Kernel | 5.9.14-1-default |
Architecture | x86_64 |
ZFS Version | 2.0.0-1 |
SPL Version | 2.0.0-1 |
I'm seeing the same issue in an openSUSE Tumbleweed VM.
To me it looks like a mounted snapshot cannot be accessed via the `.zfs/snapshot` directory. The reverse also holds: once `.zfs/snapshot` has been accessed, you cannot mount a snapshot. See the following steps for more details:
1. `Too many levels of symbolic links` when listing a mounted snapshot: mount a snapshot with the `mount` command, then list `/tank/.zfs/snapshot`.
Here is the output:
tumbleweed:~ # mount -t zfs tank@second /mnt/tank/second
tumbleweed:~ # ls -al /tank/.zfs/snapshot/
ls: cannot access '/tank/.zfs/snapshot/second': Too many levels of symbolic links
total 1
drwxrwxrwx 2 root root 2 Dec 22 12:03 .
drwxrwxrwx 1 root root 0 Dec 22 12:04 ..
drwxr-xr-x 2 root root 4 Dec 22 11:24 initial
d????????? ? ? ? ? ? second
drwxr-xr-x 2 root root 2 Dec 22 11:27 third
After `umount /mnt/tank/second`, the listing `ls -al /tank/.zfs/snapshot` works as expected.
2. A snapshot cannot be mounted with `mount -t zfs ...` if `.zfs/snapshot` has been accessed before: access `/tank/.zfs/snapshot` first, then try to mount the snapshot.
And here is the output:
tumbleweed:~ # cd /tank/.zfs/snapshot/
tumbleweed:/tank/.zfs/snapshot # ls -al
total 2
drwxrwxrwx 2 root root 2 Dec 22 11:52 .
drwxrwxrwx 1 root root 0 Dec 22 11:52 ..
drwxr-xr-x 2 root root 4 Dec 22 11:24 initial
drwxr-xr-x 2 root root 3 Dec 22 11:27 second
drwxr-xr-x 2 root root 2 Dec 22 11:27 third
tumbleweed:/tank/.zfs/snapshot # mount -t zfs tank@second /mnt/tank/second
filesystem 'tank@second' is already mounted
Are we just doing something wrong here or is this a bug?
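In the second case, the `already mounted` error suggests that accessing `.zfs/snapshot` had already automounted `tank@second`. The hypothetical helper below (not a ZFS command) prints where a given snapshot is currently mounted, based on `mount` output. That lets you check before running `mount -t zfs`. It only looks at the mount table and assumes the usual `SOURCE on MOUNTPOINT type FSTYPE (OPTIONS)` line format.

```shell
# Hypothetical helper: print the mountpoint(s) of a snapshot such as tank@second.
# Reads `mount`-style lines ("SOURCE on MOUNTPOINT type FSTYPE (OPTIONS)") on stdin.
snapshot_mountpoints() {
  awk -v snap="$1" '$1 == snap && $2 == "on" && $4 == "type" && $5 == "zfs" { print $3 }'
}
```

Usage would be `mount | snapshot_mountpoints tank@second`. If it prints a path under `/tank/.zfs/snapshot`, that would be consistent with the `already mounted` error above.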
Still happening on Debian 11 using zfSnap:
root@batman:/var/lib/mysql/.zfs/snapshot# ls
ls: cannot access '2022-08-09_06.00.01--7d': Too many levels of symbolic links
ls: cannot access '2022-08-11_19.00.01--7d': Too many levels of symbolic links
ls: cannot access '2022-08-10_18.00.01--7d': Too many levels of symbolic links
ls: cannot access '2022-08-12_23.00.02--7d': Too many levels of symbolic links
^C
root@batman:/var/lib/mysql/.zfs/snapshot# uname -a
Linux batman 5.10.0-16-amd64 #1 SMP Debian 5.10.127-2 (2022-07-23) x86_64 GNU/Linux
root@batman:/var/lib/mysql/.zfs/snapshot# zfs version
zfs-2.1.5-1~bpo11+1
zfs-kmod-2.1.5-1~bpo11+1
root@batman:/var/lib/mysql/.zfs/snapshot# crontab -l
...
0 * * * * /usr/sbin/zfSnap -a 7d -r zcave
root@batman:/var/lib/mysql/.zfs/snapshot#
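To get just the names of the affected snapshots, rather than one `ls` error per entry, something like the sketch below could be used. It is a hypothetical helper (not part of zfSnap or ZFS) that prints every entry of a directory that `test -e` cannot resolve. That should catch the `Too many levels of symbolic links` (ELOOP) failure seen here. It would also catch any other stat failure, and it skips names starting with a dot, so treat its output as a starting point.

```shell
# Hypothetical helper: list entries of a directory (default: current directory)
# that cannot be stat'ed, e.g. snapshot directories failing with ELOOP.
list_unreachable_entries() {
  dir=${1:-.}
  for entry in "$dir"/*; do
    # An unmatched glob stays literal; skip it so an empty directory prints nothing.
    [ "$entry" = "$dir/*" ] && continue
    # test -e follows the path with stat(); it is false when stat fails.
    if [ ! -e "$entry" ]; then
      printf '%s\n' "${entry##*/}"
    fi
  done
}
```

For example, `list_unreachable_entries /var/lib/mysql/.zfs/snapshot` would be expected to print names like `2022-08-09_06.00.01--7d` for the entries that `ls` complains about.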
Just a thought: the snapshots are separate (hidden) mountpoints that get mounted on demand, so maybe you need a recursive bind mount? I'm also not sure whether the on-demand mounting works through a bind mount if the snapshot isn't already mounted. That seems like more of a Linux issue than a ZFS issue. What I don't expect to work is for ZFS to perform the snapshot mounts inside the bind mount instead of the actual mount; I'm pretty sure there are checks preventing that.
I also noticed that it doesn't happen on the root mountpoint:
root@batman:~# cd /.zfs/snapshot
root@batman:/.zfs/snapshot# ls
2022-08-02_00.01.01--1m 2022-08-10_12.00.01--7d
2022-08-03_00.01.01--1m 2022-08-10_13.00.01--7d
2022-08-04_00.01.01--1m 2022-08-10_14.00.01--7d
2022-08-05_00.01.01--1m 2022-08-10_15.00.01--7d
2022-08-06_00.01.01--1m 2022-08-10_16.00.01--7d
2022-08-07_00.01.01--1m 2022-08-10_17.00.01--7d
2022-08-08_00.01.01--1m 2022-08-10_18.00.01--7d
2022-08-09_00.01.01--1m 2022-08-10_19.00.01--7d
2022-08-10_00.01.01--1m 2022-08-10_20.00.01--7d
2022-08-10_01.00.01--7d 2022-08-10_21.00.01--7d
2022-08-10_02.00.01--7d 2022-08-10_22.00.01--7d
2022-08-10_03.00.01--7d 2022-08-10_23.00.01--7d
2022-08-10_04.00.01--7d 2022-08-11_00.00.01--7d
2022-08-10_05.00.01--7d 2022-08-11_00.01.01--1m
2022-08-10_06.00.01--7d 2022-08-11_01.00.01--7d
2022-08-10_07.00.01--7d 2022-08-11_02.00.01--7d
2022-08-10_08.00.01--7d 2022-08-11_03.00.01--7d
2022-08-10_09.00.01--7d 2022-08-11_04.00.01--7d
2022-08-10_10.00.01--7d 2022-08-11_05.00.01--7d
2022-08-10_11.00.01--7d 2022-08-11_06.00.01--7d
root@batman:/.zfs/snapshot#
root@batman:/.zfs/snapshot# cd /home/.zfs/snapshot
root@batman:/home/.zfs/snapshot# ls
ls: cannot access '2022-08-14_22.00.01--7d': Too many levels of symbolic links
ls: cannot access '2022-08-16_21.00.01--7d': Too many levels of symbolic links
ls: cannot access '2022-08-15_23.00.01--7d': Too many levels of symbolic links
^C
root@batman:/home/.zfs/snapshot#
root@batman:~# zfs list
NAME USED AVAIL REFER MOUNTPOINT
zcave 713G 352G 27.5G /
zcave/home 588G 352G 462G /home
...
Huh… how did that `sudo mount -o zfsutil -t zfs tpool/a /tmp/mnt2` succeed even back on 0.8.3?
Even in the earlier code, it should have hit the same "owning" code path that results in EBUSY. (Unfortunate, as I'd like to mount the same snapshot in many places instead of using bind mounts, but that seems not to be allowed… yet it was unintentionally "allowed" in the past due to a bug??)
System information
Describe the problem you're observing
When a dataset is mounted several times, the content of its snapshots can no longer be accessed through the dataset's .zfs/snapshot/ path.
Trying to perform a filesystem operation in the snapshot results in the following error: `Too many levels of symbolic links`.
Our use case is to generate a grub menu with the history of all the snapshots and allow a user to revert to any version from the menu.
This issue is similar to https://github.com/zfsonlinux/zfs/issues/9479
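The reported trigger is a dataset mounted more than once, so it can help to list which datasets currently have several zfs entries in the mount table. In the NixOS transcript, the `/nix/store` bind mount shows up with the dataset name as its source. A sketch that groups `mount` lines by source therefore catches it. `list_multiply_mounted_datasets` is a hypothetical helper, not a ZFS command. Snapshot mounts (sources containing `@`) are deliberately skipped.

```shell
# Hypothetical helper: print each zfs dataset that appears more than once in
# `mount` output on stdin, followed by all of its mountpoints.
# Snapshot mounts (source contains "@") are ignored.
list_multiply_mounted_datasets() {
  awk '
    $2 == "on" && $4 == "type" && $5 == "zfs" && index($1, "@") == 0 {
      count[$1]++
      # Append this mountpoint to the space-separated list for the dataset.
      mps[$1] = (count[$1] == 1) ? $3 : mps[$1] " " $3
    }
    END {
      for (ds in count)
        if (count[ds] > 1) print ds ": " mps[ds]
    }
  '
}
```

Run as `mount | list_multiply_mounted_datasets`. With the NixOS mount table above, it would be expected to print `ssdpool/main/nix: /nix /nix/store`.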
Describe how to reproduce the problem
Create a test pool with a dataset as follows:
Create a snapshot of the dataset and list the content of the snapshot.
The content can be listed successfully.
Bind mount the dataset on another mount point and list the content of the snapshot.
Accessing the content of the snapshot fails.