openzfsonosx / zfs

OpenZFS on OS X
https://openzfsonosx.org/

Zero free space and write failure despite there being room #437

Open · lucid-dreams opened this issue 8 years ago

lucid-dreams commented 8 years ago

According to zpool I've got enough free space:

quicksilver% zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
data   931G   911G  20.2G         -    26%    97%  1.00x  ONLINE  -

But zfs disagrees:

quicksilver% zfs list
NAME                USED  AVAIL  REFER  MOUNTPOINT
data                911G      0   137K  /Volumes/data
data/Media         2.02M      0  2.02M  /Volumes/Media
data/xCollections  53.9G      0  53.9G  /Users/danny/xCollections

  [and so on.. all with 0 AVAIL]

Most write actions fail with a disk-full error, including creating or changing files, mkdir and some chmod (e.g. I can't delete ACLs; somehow that seems to require space). Delete actions work fine, but free space does not increase.
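(The same zero shows up if you query the dataset properties directly rather than going through zfs list, e.g.:)

quicksilver% zfs get -r used,available data
# every dataset reports available 0, consistent with the list above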

Now, this is a pool that I salvaged from a failing HD using a freezer and ddrescue. Before that, OS X sometimes crashed when listing big dirs, usually with I/O or USB errors in syslog, which is why I mirrored it to a new device. Ddrescue was able to recover 100%, although for a few MB the read speed was very low, suggesting bad blocks, and every few hours the drive hung and wasn't recognized despite power cycling until after some quality time in the freezer.

The new device is a Core Storage volume (using parts of 3 physical disks) and is therefore a few hundred bytes bigger than the source HD (because diskutil wants it divisible by 256), but this doesn't seem to be a problem. I was a bit surprised that after rebooting the new pool was auto-imported without any problems: the virtual device also contains the GPT and EFI partition of the source disk, and Core Storage usually provides only a /dev/diskX containing just the filesystem, but now there are also /dev/diskXs1 and /dev/diskXs2, and ZFS does not even seem to notice that this isn't the old HD any more.

Scrubbing found 4 small files with checksum errors, which were not important, so I deleted them. zpool status -v still reports them, but now without a name (why?). Everything was fine, including writing, until after some hours Spotlight somehow managed to fill the disk up (a few GB). After disabling it for these filesystems, Spotlight cleaned up its indexes and zpool showed free space again, but writing still fails and zfs shows no AVAIL space. Rebooting, exporting or scrubbing does nothing.
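(If someone wants to reproduce the "disabling" step: one way to switch indexing off per mount point is mdutil, e.g. for the mountpoints above:)

quicksilver% sudo mdutil -i off /Volumes/data
quicksilver% sudo mdutil -i off /Volumes/Media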

For two of my datasets I found the following (the rest looks normal):

quicksilver% zfs userspace data/xCollections
TYPE        NAME    USED  QUOTA
POSIX User  danny  78.1G   none
POSIX User  root   16.0E   none

Neither du nor zfs get logicalused agrees with those 16 exabytes, so there may be a problem with ZFS space accounting in some edge case.
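(That 16.0E is suspicious in itself: 2^64 bytes is exactly 16 EiB, so it looks like a small negative byte count being displayed as an unsigned 64-bit number, i.e. an accounting underflow rather than real usage. The wrap-around is easy to illustrate with bash's builtin printf on a 64-bit system; this is just arithmetic, nothing ZFS-specific:)

printf '%u\n' -4096
18446744073709547520     # = 2^64 - 4096, which zfs userspace rounds to 16.0E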

Unfortunately this is now mixed up with the HD recovery, a few KB of bad blocks and exotic Core Storage usage, so causality is rather unclear. I also upgraded to OS X 10.11.2 and O3X 1.4.5 before ddrescue was done. But even if it is related to bad-block damage, I still feel that ZFS ought to be able to survive that (usually); I've had ext2, HFS+, FAT32 and even reiserfs cope with much larger bad-block damage.

(Rant: Maybe this is an example of why it is still sensible to have an optional offline fsck. Apart from the psychological satisfaction of fsck, it just does not seem prudent to assume that a live operation such as a ZFS scrub can do as much structural metadata cross-checking and fixing as an offline algorithm could. As it stands, I can't even clear the zpool status -v file checksum errors. Until the end of my (or the pool's) days I shall have to put up with this testament of metadata inconsistency.)

rottegift commented 8 years ago

You've put way too much into this problem report.

I'll address your first problem about space here, then your second (checksums) in another comment.

ZFS reserves roughly 3% of pool space as "slop" space, so you really are out of space (zpool-level space and zfs-level space are different things).

In ZoL it's a tunable; you would have to rebuild O3X to change it. I STRONGLY advise against shrinking the slop reserve substantially (i.e. raising spa_slop_shift), though. On top of the problems you've already had by using up all of your non-slop space, you could wind up with a pool that is wholly unimportable.

       spa_slop_shift (int)
                   Normally, we don't allow the last 3.2% (1/(2^spa_slop_shift)) of space in the
                   pool to be consumed.  This ensures that we don't run the pool completely out
                   of space, due to unaccounted changes (e.g. to the MOS).  It also limits the
                   worst-case time to allocate space.  If we have less than this amount of free
                   space, most ZPL operations (e.g. write, create) will return ENOSPC.

                   Default value: 5
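To put your own numbers against that (plain shell arithmetic, assuming the default spa_slop_shift of 5):

quicksilver% echo $(( 931 / 32 ))     # slop reserve ≈ SIZE / 2^spa_slop_shift, in GiB
29
# zpool reports only 20.2G FREE, i.e. you are already well inside the ~29G
# reserve, which is why zfs shows 0 AVAIL and most writes return ENOSPC.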
rottegift commented 8 years ago

> Scrubbing found 4 small files with checksum errors, which were not important, so I deleted them. zpool status -v still reports them, but now without a name (why?)

Please supply the output of "zpool status -v" on the pool in question.

Please also supply the output of "zfs list -r -o space -t all" for the pool in question.

(Alternatively, your now-unnamed files may still be referenced by a snapshot; if you can see that that is the case, then you don't need to supply that output and you can just close the issue.)

lucid-dreams commented 8 years ago

I thought as much, but then why doesn't deleting files free some space (no snapshots, btw)?

Maybe spa_slop_shift was introduced, or the reserve increased, between O3X 1.3.0 (the version I used before) and 1.4.5, so that 1.4.5 considers overfilled what 1.3.0 found acceptable; that might explain it. In that case I would have to delete files until zpool FREE is at least zpool SIZE*0.032?
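(Quick arithmetic on that, if the 1/32 reserve is right:)

quicksilver% echo $(( 931 / 32 - 20 ))    # GiB I'd still have to free before AVAIL goes above 0
9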

So then it would be an update problem, unrelated to bad blocks.

NAME               AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
data                   0   911G         0    137K              0       911G
data/Media             0  2.02M         0   2.02M              0          0
data/xCollections      0  53.9G         0   53.9G              0          0
data/xDocuments        0   860M         0    860M              0          0
data/xMovies           0   115G         0    115G              0          0
data/xMusic            0   279G         0    279G              0          0
data/xMedia            0   462G         0    462G              0          0
lucid-dreams commented 8 years ago

Regarding the checksum errors: 2 of the 4 vanished during the last few hours. I got the notion that they'd stay from the URL in the zpool output:

  pool: data
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub canceled on Sun Nov 29 17:56:16 2015
config:

    NAME                                          STATE     READ WRITE CKSUM
    data                                          ONLINE       0     0     0
      media-1D569105-8495-4FFF-A8EB-1DD7CF19AAEA  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        data/xMedia:<0x3172>
        data/xDocuments:<0x4340>

Well, I'm trying to delete enough files now; the problem is that ACLs prevent most of them from being deleted, and I can't change ACLs currently. The ACLs are accidental; is there a way to turn them off pool- or dataset-wide?

lucid-dreams commented 8 years ago

Works fine now with 20 GB more deleted. There is still the question of how I ended up there, because it should not have been possible to get so far over the 3% limit. And there are those 16 exabytes root is supposedly using.

The above errors are still unclearable, but they do not seem to have any ill effects, so it's probably safe to ignore them. Thanks for your help.

rottegift commented 8 years ago

The errors are in metadata in your zfs datasets. That's worse than bad.

The crucial issue here is that your pool (data) has no redundancy at all. The slightest glitch (on the bus, on the disk, in software, and so forth) can lead to uncorrectable corruption. If you're using USB 2.0 I would bet that those glitches will keep happening.

That you have not yet lost the ability to import the pool is pretty lucky; in your shoes I would immediately import it read-only and move all the data to a new pool which has actual redundancy.

You can run "sudo zdb -mcv data" and see if zdb dies. If zdb exits abnormally (you don't have to run it to completion, but you should at least let it run a little while), that means there is a timebomb on your pool where corrupted metadata WILL eventually cause your machine to panic.

zdb is read-only, but you can get read disturbances on the bus that your pool is in no state to recover from, so zdb can (in principle, though probably not very likely) make your no-redundancy/definite-damage pool worse.

chmod defeats ACLs, so you can do a find ... -exec or chmod -R.
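A minimal sketch of both forms, using the xCollections mountpoint from your zfs list as the example (OS X's chmod takes -N to remove the ACL); only do this once there is free space again:

quicksilver% sudo chmod -R -N /Users/danny/xCollections
# or, if you want to select targets first:
quicksilver% sudo find /Users/danny/xCollections -exec chmod -N {} +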

However, again, you have corrupt metadata already; writing more metadata can wreck your ability to extract data from the pool.