openzfsonosx / zfs

OpenZFS on OS X
https://openzfsonosx.org/
Other
824 stars 72 forks source link

fstorture + extend trigger "zfs: accessing past end of object" #32

Closed lundman closed 11 years ago

lundman commented 11 years ago
 ./fstorture /BOOM/one /BOOM/two 1
Capabilities of /BOOM/one (on /BOOM): softlinks?Y hardlinks?Y ACLs?N
Capabilities of /BOOM/two (on /BOOM): softlinks?Y hardlinks?Y ACLs?N
Test started   [fstorture 2.1-pfh]: Mon Jun 17 09:44:03 JST 2013
panic(cpu 1 caller 0xffffff7f82bde579): "zfs: accessing past end of object 15/1d (size=28160 access=24136+21111)"@spl-err.c:48
Backtrace (CPU 1), Frame : Return Address
0xffffff80471cb4c0 : 0xffffff800141d626 mach_kernel : _panic + 0xc6
0xffffff80471cb530 : 0xffffff7f82bde579 net.lundman.spl : _vcmn_err + 0x59
0xffffff80471cb760 : 0xffffff7f82c8f4fd net.lundman.zfs : _zfs_panic_recover + 0x18d
0xffffff80471cb910 : 0xffffff7f82c01e23 net.lundman.zfs : _dmu_buf_hold_array_by_dnode + 0x173
0xffffff80471cb9d0 : 0xffffff7f82c03539 net.lundman.zfs : _dmu_write_uio_dnode + 0x79
0xffffff80471cba60 : 0xffffff7f82c0349c net.lundman.zfs : _dmu_write_uio_dbuf + 0x6c
0xffffff80471cbab0 : 0xffffff7f82cae4ce net.lundman.zfs : _zfs_write + 0xbae
0xffffff80471cbd30 : 0xffffff7f82cba368 net.lundman.zfs : _zfs_vnop_write + 0x58
0xffffff80471cbd70 : 0xffffff8001511c32 mach_kernel : _VNOP_WRITE + 0x52
0xffffff80471cbdd0 : 0xffffff8001507e99 mach_kernel : _utf8_normalizestr + 0x6e9
0xffffff80471cbe40 : 0xffffff8001776f69 mach_kernel : _write_nocancel + 0x1b9
0xffffff80471cbef0 : 0xffffff8001776e77 mach_kernel : _write_nocancel + 0xc7
0xffffff80471cbf50 : 0xffffff80017e16aa mach_kernel : _unix_syscall64 + 0x20a

Cleanest run that produces said panic appears to be:

vnop_create: '00-000-08Td.rtf'
getvnode zp 0xffffff800bb98200 with vpp 0xffffff80471cb3c0 zfsvfs 0xffffff8036f26000 vfs 0xffffff800738b3d0
Assigned zp 0xffffff800bb98200 with vp 0xffffff800baf93e0
+vnop_getattr zp 0xffffff800bb98200 vp 0xffffff800baf93e0
+vnop_getattr zp 0xffffff800bb98200 vp 0xffffff800baf93e0
+setattr: zp 0xffffff800bb98200, vp 0xffffff800baf93e0
zfs_extend: 27812
-setattr: zp 0xffffff800bb98200 size 27812
vnop_setattr: called on vp 0xffffff800baf93e0 with mask 0010, err=0

  #define VNODE_ATTR_va_data_size         (1LL<< 4)       /* 00000010 */

getxattr vp 0xffffff800baf93e0 : ENOTSUP
+vnop_lookup '._00-000-08Td.rtf'
+setattr: zp 0xffffff800bb98200, vp 0xffffff800baf93e0
-setattr: zp 0xffffff800bb98200 size 27812
vnop_setattr: called on vp 0xffffff800baf93e0 with mask 0200, err=0

   #define VNODE_ATTR_va_mode              (1LL<< 9)       /* 00000200 */

+vnop_getattr zp 0xffffff800bb98200 vp 0xffffff800baf93e0
zfs_write: resid/n 21111

Which appears to be to create file, extend it to 27812, then write 21111 bytes. (size=28160 access=24136+21111). Unsure who/what seeks to 24136.

lundman commented 11 years ago

Well, who knew you can make files with both lseek()+write() holey, and using ftruncate(+offset) to "extend".

Simple test code;

#include <stdio.h>
#include <fcntl.h>

int main(int argc, char **argv)
{
    int fd;
    char buffer[64*1024];
    int i;

    unlink("/BOOM/test");
    fd = open("/BOOM/test", O_RDWR|O_TRUNC|O_CREAT, 0600);
    if (fd < 0) exit(0);

    strcpy(buffer, "Hello world\n");

    for (i = 1; i < 10; i++) {

        ftruncate(fd, (off_t)i*(rand()%10000));

        if (write(fd, buffer, sizeof(buffer)) != sizeof(buffer))
            perror("write");
    }

    close(fd);

    exit(0);
}

Will generally panic. Note the unlink() before open() appears to be required, if open truncates the file (O_TRUNC) the bug does not appear to happen. So it requires the creation of the file to trigger.

lundman commented 11 years ago

Ok, buffer growth on OSX is like 512 10240 20480 30208 ...

and because that is not a power-of-two size, the dn_datablkshift variable is set to 0, and the code containing the panic is triggered.

Compared to Linux ZOL, the buffer grows 512 10240 65536 131072

You can see only the first growth is a non-power-of-two.

This boils down to that OSX has all calls to zfs_grow_blocksize() from zfs_extend(). This blocksize grower algorithm is poor indeed.

But on Linux, only the first growth (10240) is triggered from zfs_extend(). All other come from zfs_write()'s

        if (rl->r_len == UINT64_MAX) {
            uint64_t new_blksz;

            if (zp->z_blksz > max_blksz) {
                ASSERT(!ISP2(zp->z_blksz));
                new_blksz = MIN(end_size, SPA_MAXBLOCKSIZE);
            } else {
                new_blksz = MIN(end_size, max_blksz);
            }
        }

For some reason, Linux's first call will have rl->r_len == UINT64_MAX but on OSX this never triggers, as rl->r_len == 65536

Now, once the OSX problem has been tracked down, I would guess that with a carefully crafted recordsize (avoiding power of 2) and using extend, all ZFS versions can be made to panic, due to zfs_extend's poor blocksize growth logic.

lundman commented 11 years ago

In the end, zfs_rlock_write will signal that a blocksize growth is required by setting r_len to UINT64_MAX, this was missing on OSX due to ZOL/ZSB usage. Once ported over, the problem goes away.

It is interesting to note that zfs_extend growth, which is not in power of 2, is undesirable and can possibly still trigger the soft panic. Perhaps zfs_extend blocksize growing should be encouraged to be in power of 2.

fsx and fstorture now work without panic. (although, mmap is still at times, incorrect)

https://github.com/zfs-osx/zfs/commit/312697f32576697081bd3c6e1b5ae6784883e487