openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

cannot destroy dataset: dataset is busy #1810

Closed. seletskiy closed this issue 10 years ago.

seletskiy commented 10 years ago

I've seen similar issues, but all of them were closed.

I'm experiencing this kind of problem right now:

# zfs destroy zroot/2013-10-15T065955229209
cannot destroy 'zroot/2013-10-15T065955229209': dataset is busy

# zfs umount zroot/2013-10-15T065955229209
cannot unmount 'zroot/2013-10-15T065955229209': not currently mounted

# zfs list | grep zroot/2013-10-15T065955229209
zroot/2013-10-15T065955229209                2.86G  25.0G  11.0G  /var/lib/heaver/instances/2013-10-15T065955229209

# umount /var/lib/heaver/instances/2013-10-15T065955229209
umount: /var/lib/heaver/instances/2013-10-15T065955229209: not mounted

# pacman -Qi zfs | grep Version
Version        : 0.6.1_3.9.9-1

# uname -a
Linux hub.host.s 3.9.9-1-apparmor #1 SMP PREEMPT Thu Jul 11 17:45:29 NOVT 2013 x86_64 GNU/Linux

It's quite annoying, because I need to constantly reboot the host to get it working again.

I can provide any debug information needed.

DeHackEd commented 10 years ago

It's a bit of a shot in the dark, but try this:

# grep zroot/2013-10-15T065955229209 /proc/*/mounts

This searches for other instances of the mount in other namespaces. I've had this happen with LXC, and since you have such a new kernel I assume you're running an OS that might use namespaces as well.
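
If that grep does find the mount in another process's namespace, one possible follow-up (a sketch, not something verified here; the PID is a placeholder for whatever the grep turns up, and it assumes util-linux's nsenter is available) is to unmount it inside that namespace instead of rebooting:

# grep -l zroot/2013-10-15T065955229209 /proc/*/mounts
/proc/12345/mounts   <-- placeholder PID
# nsenter --target 12345 --mount umount /var/lib/heaver/instances/2013-10-15T065955229209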

lundman commented 10 years ago

We had something similar to this; small chance, but check

zfs holds $snapshotname

to see if it has any holds, and if so, use zfs release to remove the hold.
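
For reference, a minimal sketch of that check (the snapshot name and hold tag are hypothetical, just to show the command shapes):

zfs holds zroot/somedataset@somesnap
zfs release keepme zroot/somedataset@somesnap   <-- "keepme" stands in for whatever tag the holds output lists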

seletskiy commented 10 years ago

@DeHackEd: Oh, man, thank you! That's the issue! It's grabbed by ntpd... Yep, we use LXC as well.

seletskiy commented 10 years ago

Hmmm, it's quite interesting. For whatever reason, after unmounting any ZFS filesystem it is still present in /proc/<pid>/mounts:

# zfs create zroot/test

# zfs list | grep zroot/test
zroot/test                                     30K   108G    30K  /test

# systemctl status ntpd
ntpd.service - Network Time Service
   Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled)
   Active: active (running) since Thu 2013-10-24 12:07:42 UTC; 4s ago
  Process: 17371 ExecStart=/usr/bin/ntpd -g -u ntp:ntp (code=exited, status=0/SUCCESS)
 Main PID: 17372 (ntpd)
   CGroup: name=systemd:/system/ntpd.service
           └─17372 /usr/bin/ntpd -g -u ntp:ntp

# grep zroot/test /proc/17372/mounts
zroot/test /test zfs rw,relatime,xattr 0 0

# zfs umount zroot/test

# grep zroot/test /proc/17372/mounts
zroot/test /test zfs rw,relatime,xattr 0 0

# grep zroot/test /proc/*/mounts
/proc/17372/mounts:zroot/test /test zfs rw,relatime,xattr 0 0   <-- it's still only here, WTF?

# zfs destroy zroot/test
cannot destroy 'zroot/test': dataset is busy

# systemctl stop ntpd

# zfs destroy zroot/test && echo $?
0

It doesn't make any sense.

DeHackEd commented 10 years ago

There's a Linux feature called mount namespaces. You can make one by running unshare -m /bin/sh; in this shell, all actions performed by mount and umount are independent of the main host. Inside this container you can unmount /home even if you're logged in elsewhere. But likewise, if /home gets unmounted on the main system, it's still mounted within the namespace, since they're independent.

It's a sandboxing technique. LXC uses this, but it's not strictly an LXC feature. Unfortunately the ZFS tools can't tell this has happened and say it's unmounted, because /etc/mtab (and even /proc/mounts) says all is well. I guess ntpd is doing this. It's a systemd feature, so maybe that's at play as well.
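
To make the independence concrete, a quick sketch (the /test mountpoint is just an example, and it assumes the new namespace uses private mount propagation so the umount doesn't propagate back):

# unshare -m /bin/sh             <-- new shell in its own mount namespace
# umount /test                   <-- only affects this namespace
# exit
# grep /test /proc/self/mounts   <-- back in the original namespace: still mounted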

seletskiy commented 10 years ago

Further investigation revealed that this looks like a bug in systemd. It's reproducible even with a loop mount and ext4. It appears when running ntpd because of PrivateTmp=true in the corresponding systemd unit.

So I'm closing the issue; I don't think it's a ZFS bug.

seletskiy commented 10 years ago

@DeHackEd Thanks for assistance. I've filed a bug if you're interested: https://bugs.freedesktop.org/show_bug.cgi?id=70856

seletskiy commented 10 years ago

Just encountered this issue again, but now there is no process that holds the mount.

# zfs rename zroot/virtubuntu-sphinxed zroot/test  
cannot rename 'zroot/virtubuntu-sphinxed': dataset is busy

# grep zroot/virtubuntu-sphinxed /proc/*/mounts
... nothing ...
behlendorf commented 10 years ago

@seletskiy This may be a duplicate of #1792. A fix for this, 7ec09286b761ee1fb85178ff55daaf8f74d935be, was merged into master a few days ago; could you try the latest code?

behlendorf commented 10 years ago

@seletskiy Can you still reproduce this in master? We believe it was fixed, unless I hear otherwise I'll close this one out in a few days.

seletskiy commented 10 years ago

@behlendorf: Looks like the issue is fixed. Thanks a lot!

RLovelett commented 10 years ago

@behlendorf is master (or what will become 0.6.3) effectively what can be found in Ubuntu Daily Builds?

Or would I have to build from source to get this fix?

dajhorn commented 10 years ago

@RLovelett, the Trusty daily builds are tracking master and are current. The packages in ppa:zfs-native/daily for Precise are stale, but they do have this particular fix.

BlinkyStitt commented 7 years ago

Whatever fix was done here isn't enough for my case.

I ran timedatectl set-ntp true and then started running into this error on Debian Jessie with docker 17.06.1-ce. Now when I try to remove a dataset (when docker+zfs cleans up old containers) it fails. Following it with grep led me to systemd. Setting set-ntp back to false works, but my clock is drifting, so I would prefer not to do that. I'm not sure what the repercussions of changing PrivateTmp=true to something else are, or even which file to change to do that. Any suggestions? I read the systemd bug report and they closed it as "not a bug." I think the fix might need to be in docker somewhere, but I'm not sure about this.

Steps to reproduce:

  1. On a Debian Jessie host with docker 17.06.1-ce installed, run timedatectl set-ntp true

  2. Create a docker-compose.yml:

     version: '3'

     volumes:
       data:

     services:
       test:
         image: alpine
         command: tail -f /dev/null
         restart: always
         volumes:

  3. Run docker-compose up -d and everything should work fine.

  4. Now modify the docker-compose.yml so compose will want to restart it:

     version: '3'

     services:
       test:
         image: alpine
         command: tail -f /dev/null
         restart: always

  5. Run `docker-compose up -d` again and it will fail the same way the original report showed:

$ docker-compose up -d
Recreating quick_test_1 ...
Recreating quick_test_1 ... error

ERROR: for quick_test_1 driver "zfs" failed to remove root filesystem for 5f310a1d949f02084468a58142b8f00a70c7dce612076e188c3ba47d28dca737: exit status 1: "/sbin/zfs zfs destroy -r storage/docker/01689fe6f409b0486a136023b150f4436d7d1a83bb9102d76970aa4e11cd82e0" => cannot destroy 'storage/docker/01689fe6f409b0486a136023b150f4436d7d1a83bb9102d76970aa4e11cd82e0': dataset is busy

ERROR: for test driver "zfs" failed to remove root filesystem for 5f310a1d949f02084468a58142b8f00a70c7dce612076e188c3ba47d28dca737: exit status 1: "/sbin/zfs zfs destroy -r storage/docker/01689fe6f409b0486a136023b150f4436d7d1a83bb9102d76970aa4e11cd82e0" => cannot destroy 'storage/docker/01689fe6f409b0486a136023b150f4436d7d1a83bb9102d76970aa4e11cd82e0': dataset is busy

grep storage/docker/01689fe6f409b0486a136023b150f4436d7d1a83bb9102d76970aa4e11cd82e0 /proc/*/mounts

/proc/919/mounts:storage/docker/01689fe6f409b0486a136023b150f4436d7d1a83bb9102d76970aa4e11cd82e0 /var/lib/docker/zfs/graph/01689fe6f409b0486a136023b150f4436d7d1a83bb9102d76970aa4e11cd82e0 zfs rw,relatime,xattr,noacl 0 0

ps -p 919

  PID TTY          TIME CMD
  919 ?        00:00:00 systemd-timesyn



Let me know if I should open this task for the docker team instead. For now I'll disable ntp.
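
For the "which file to change" part: the usual approach is a drop-in override rather than editing the shipped unit file. A minimal sketch, assuming the unit involved is systemd-timesyncd.service (matching the systemd-timesyn process above) and accepting that this weakens its sandboxing; I haven't verified that it avoids the busy-dataset error, it only shows where the setting lives:

# systemctl edit systemd-timesyncd.service
[Service]          <-- contents to put in the drop-in that opens
PrivateTmp=false
# systemctl restart systemd-timesyncd.service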
yohgaki commented 6 years ago

It seems any process may keep a dataset busy if the mounted dataset is accessed somehow.

I have a similar issue with docker and nginx on Fedora 27.

Driver zfs failed to remove root filesystem 90b71c4c1173f32913ea0cad5dfa280aae35881e493cc078ff5227e4df4ee016: exit status 1: 
"/usr/sbin/zfs zfs destroy -r datastore/docker/8a478425d72dc41b95ef470a8b9227ea342dfc82152c14f2d592d65793bf2030" 
=> cannot destroy 'datastore/docker/8a478425d72dc41b95ef470a8b9227ea342dfc82152c14f2d592d65793bf2030': 
dataset is busy

grep -n 8a478425d72dc41b95ef470a /proc/*/mounts shows nginx processes on the host, which acts as a reverse proxy for the web app containers. nginx only uses exposed ports, no container volumes at all.

Stopping nginx allows the busy datasets to be destroyed.

BTW, timedatectl set-ntp true|false did not cause any issue on my system.
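
In case it helps others triage, a small sketch for listing which processes still see a dataset in their mount namespace (the dataset name is just the one from my output; substitute your own):

# grep -l datastore/docker/8a478425d72dc41b95ef470a /proc/*/mounts | cut -d/ -f3   <-- PIDs still holding the mount
# cat /proc/<pid>/comm                                                             <-- name of each holder, e.g. nginx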

mgbii commented 5 years ago

What I do to fix this is to re-mount the problematic dataset. No reboot needed. And yes, re-mount with zfs mount.

Note: we use several old versions of ZFS and are unable to upgrade all of our machines; posting this here for others.
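
A sketch of that workaround as a full sequence (the dataset name is a placeholder):

# zfs mount zroot/stuck     <-- re-mount the dataset that refuses to be destroyed
# zfs umount zroot/stuck
# zfs destroy zroot/stuck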

heppu commented 1 year ago

For me the problem was that the parent dataset had canmount=on. So doing:

zfs set canmount=off pool-0/parent

allowed me to run successfully:

zfs destroy pool-0/parent/child