Closed seletskiy closed 10 years ago
It's a bit of a shot in the dark, but try this:
# grep zroot/2013-10-15T065955229209 /proc/*/mounts
This searches for other instances of the mount in other namespaces. I've had this with LXC and since you have such a new kernel I assume you're running an OS that might use it as well.
We had something similar to this; small chance, but check
zfs holds $snapshotname
to see if it has any holds, and if so, use zfs release to remove the hold.
@DeHackEd: Oh, man, thank you! That's the issue! It's grabbed by ntpd... Yep, we use LXC as well.
Hmmm, it's quite interesting. For whatever reason, after unmounting any ZFS filesystem it is still present in /proc/<pid>/mounts:
# zfs create zroot/test
# zfs list | grep zroot/test
zroot/test 30K 108G 30K /test
# systemctl status ntpd
ntpd.service - Network Time Service
Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled)
Active: active (running) since Thu 2013-10-24 12:07:42 UTC; 4s ago
Process: 17371 ExecStart=/usr/bin/ntpd -g -u ntp:ntp (code=exited, status=0/SUCCESS)
Main PID: 17372 (ntpd)
CGroup: name=systemd:/system/ntpd.service
└─17372 /usr/bin/ntpd -g -u ntp:ntp
# grep zroot/test /proc/17372/mounts
zroot/test /test zfs rw,relatime,xattr 0 0
# zfs umount zroot/test
# grep zroot/test /proc/17372/mounts
zroot/test /test zfs rw,relatime,xattr 0 0
# grep zroot/test /proc/*/mounts
/proc/17372/mounts:zroot/test /test zfs rw,relatime,xattr 0 0 <-- it's still only here, WTF?
# zfs destroy zroot/test
cannot destroy 'zroot/test': dataset is busy
# systemctl stop ntpd
# zfs destroy zroot/test && echo $?
0
It doesn't make any sense.
There's a Linux feature called mount namespaces. You can create one by running unshare -m /bin/sh
; in that shell, all actions performed by mount and umount are independent of the main host. Inside this namespace you can unmount /home even if you're logged in elsewhere. But likewise, if /home gets unmounted on the main system, it's still mounted within the namespace, since they're independent.
It's a sandboxing technique. LXC uses this, but it's not strictly an LXC feature. Unfortunately the ZFS tools can't tell this has happened and report the filesystem as unmounted, because /etc/mtab (and even /proc/mounts) says all is well. I guess ntpd is doing this. It's a systemd feature, so maybe that's at play as well.
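That independence can be seen directly with a short sketch (assumes util-linux unshare and a kernel that allows unprivileged user namespaces; /mnt is just an illustrative mountpoint):

```shell
# A tmpfs mounted inside a private mount namespace never appears in the
# host's mount table, and vice versa.
unshare --user --map-root-user --mount sh -c '
    mount -t tmpfs tmpfs /mnt
    grep " /mnt " /proc/self/mounts   # the tmpfs is visible in here...
' || echo "unprivileged namespaces unavailable in this environment"
# ...but the host namespace never saw it:
grep " /mnt tmpfs" /proc/self/mounts || echo "no tmpfs on /mnt in the host table"
```

This is exactly the situation above, inverted: ntpd's private namespace kept a copy of the zroot/test mount that the host-side umount could not touch.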
Further investigation revealed that this looks like a bug in systemd. It's reproducible even on a loop mount with ext4. It appears when running ntpd because of PrivateTmp=true
in the corresponding systemd unit.
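For reference, the sandbox can be relaxed per-unit. A minimal sketch of a systemd drop-in (the path is what `systemctl edit ntpd` would create; PrivateTmp=true is security hardening, so disabling it is a trade-off, not a fix):

```ini
# /etc/systemd/system/ntpd.service.d/override.conf
# Assumes the distro unit ships PrivateTmp=true, which puts ntpd in a
# private mount namespace that pins a copy of every mount.
[Service]
PrivateTmp=no
```

A `systemctl daemon-reload` plus a restart of ntpd applies it.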
So I'm closing the issue; I think it's not a ZFS bug.
@DeHackEd Thanks for the assistance. I've filed a bug if you're interested: https://bugs.freedesktop.org/show_bug.cgi?id=70856
Just encountered this issue again, but now there is no process that holds the mount.
# zfs rename zroot/virtubuntu-sphinxed zroot/test
cannot rename 'zroot/virtubuntu-sphinxed': dataset is busy
# grep zroot/virtubuntu-sphinxed /proc/*/mounts
... nothing ...
@seletskiy This may be a duplicate of #1792. A fix for this, 7ec09286b761ee1fb85178ff55daaf8f74d935be, was merged into master a few days ago. Could you try the latest code?
@seletskiy Can you still reproduce this in master? We believe it was fixed, unless I hear otherwise I'll close this one out in a few days.
@behlendorf: Looks like the issue is fixed. Thanks a lot!
@behlendorf Is master (or what will become 0.6.3) effectively what can be found in the Ubuntu Daily Builds? Or would I have to build from source to get this fix?
@RLovelett, the Trusty daily builds are tracking master and are current. The packages in ppa:zfs-native/daily for Precise are stale, but they do have this particular fix.
Whatever fix was done here isn't enough for my case.
I ran timedatectl set-ntp true
and then started running into this error on Debian Jessie with Docker 17.06.1-ce. Now when I try to remove a dataset (when docker+zfs cleans up old containers) it errors. Following it with grep led me to systemd. Setting set-ntp to false works, but my clock is drifting, so I would prefer not to do that. I'm not sure what the repercussions of changing PrivateTmp=true
to something else are, or even which file to change. Any suggestions? I read the systemd bug report and they closed it as "not a bug." I think the fix might need to be in Docker somewhere, but I'm not sure.
Steps to reproduce:
1. On a Debian Jessie host with docker 17.06.1-ce installed, run timedatectl set-ntp true
2. Create a docker-compose.yml:
version: '3'
volumes:
  data:
services:
  test:
    image: alpine
    command: tail -f /dev/null
    restart: always
    volumes:
3. Run docker-compose up -d
and everything should work fine.
4. Now modify the docker-compose.yml so compose will want to restart it:
version: '3'
services:
  test:
    image: alpine
    command: tail -f /dev/null
    restart: always
5. Run `docker-compose up -d` again and it will fail the same way the original report showed:
$ docker-compose up -d
Recreating quick_test_1 ...
Recreating quick_test_1 ... error
ERROR: for quick_test_1 driver "zfs" failed to remove root filesystem for 5f310a1d949f02084468a58142b8f00a70c7dce612076e188c3ba47d28dca737: exit status 1: "/sbin/zfs zfs destroy -r storage/docker/01689fe6f409b0486a136023b150f4436d7d1a83bb9102d76970aa4e11cd82e0" => cannot destroy 'storage/docker/01689fe6f409b0486a136023b150f4436d7d1a83bb9102d76970aa4e11cd82e0': dataset is busy
ERROR: for test driver "zfs" failed to remove root filesystem for 5f310a1d949f02084468a58142b8f00a70c7dce612076e188c3ba47d28dca737: exit status 1: "/sbin/zfs zfs destroy -r storage/docker/01689fe6f409b0486a136023b150f4436d7d1a83bb9102d76970aa4e11cd82e0" => cannot destroy 'storage/docker/01689fe6f409b0486a136023b150f4436d7d1a83bb9102d76970aa4e11cd82e0': dataset is busy
/proc/919/mounts:storage/docker/01689fe6f409b0486a136023b150f4436d7d1a83bb9102d76970aa4e11cd82e0 /var/lib/docker/zfs/graph/01689fe6f409b0486a136023b150f4436d7d1a83bb9102d76970aa4e11cd82e0 zfs rw,relatime,xattr,noacl 0 0
  PID TTY          TIME CMD
  919 ?        00:00:00 systemd-timesyn
Let me know if I should open this task for the docker team instead. For now I'll disable ntp.
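Since the holder here is systemd-timesyncd (PID 919 above), an alternative to disabling NTP is relaxing that unit's sandboxing. This is only a sketch: the stock systemd-timesyncd.service sets PrivateTmp=true (which gives it a private mount namespace), and turning that off reduces the service's isolation.

```ini
# /etc/systemd/system/systemd-timesyncd.service.d/override.conf
# (the file `systemctl edit systemd-timesyncd` would create)
[Service]
PrivateTmp=no
```

After systemctl daemon-reload and a restart of the unit, the daemon no longer keeps a private copy of the mount table. Simply restarting systemd-timesyncd before destroying datasets also releases its stale copy of the old mounts.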
It seems any process may keep a dataset busy if the mounted dataset is accessed somehow.
I have similar issue with docker and nginx on Fedora 27.
Driver zfs failed to remove root filesystem 90b71c4c1173f32913ea0cad5dfa280aae35881e493cc078ff5227e4df4ee016: exit status 1:
"/usr/sbin/zfs zfs destroy -r datastore/docker/8a478425d72dc41b95ef470a8b9227ea342dfc82152c14f2d592d65793bf2030"
=> cannot destroy 'datastore/docker/8a478425d72dc41b95ef470a8b9227ea342dfc82152c14f2d592d65793bf2030':
dataset is busy
grep -n 8a478425d72dc41b95ef470a /proc/*/mounts
shows nginx processes on hosts that act as reverse proxies for the web app containers. nginx only uses exposed ports, no container volumes at all.
Stopping nginx makes it possible to remove the busy datasets.
BTW. timedatectl set-ntp true|false
did not cause any issue on my system.
What I do to fix this is re-mount the problematic dataset; no reboot needed. And yes, re-mount with zfs mount
.
Note: we run several old versions of ZFS and are unable to upgrade all of our machines; posting this here for others.
For me the problem was that the parent dataset had canmount=on
. So doing:
zfs set canmount=off pool-0/parent
allowed me to run successfully:
zfs destroy pool-0/parent/child
I've seen similar issues, but all of them were closed.
I'm experiencing this kind of problem right now.
It's quite annoying, because I need to constantly reboot the host to get it working.
I can provide any debug information needed.