openzfs / zfs


ZFS crash while send/recv #670

Closed GregorKopka closed 12 years ago

GregorKopka commented 12 years ago

This came up during a zfs send/recv to a USB disk. zpool list still works, but one task is hanging and unkillable:

16738 pts/0 D 0:00 zfs recv -F -u -v usb-backup/system/etc/dhcp

I tried to export the backup pool, which led to further tasks being blocked (the second one is unkillable):

13922 ? D< 0:00 [txg_quiesce]
17215 pts/1 D 0:00 /bin/umount -t zfs /usb-backup/system/var/lib

The ZFS version is a git checkout from this Saturday. Kernel:

Linux version 3.0.6-gentoo (root@backend) (gcc version 4.5.3 (Gentoo 4.5.3-r1 p1.0, pie-0.4.5) ) #3 SMP Sat Apr 14 18:02:30 CEST 2012

Is there a bug or is the hardware failing?

Regards,

Gregor

------------[ cut here ]------------
WARNING: at fs/inode.c:902 unlock_new_inode+0x2c/0x46()
Hardware name: GA-880GMA-USB3
Modules linked in: aufs xt_tcpudp xt_state zfs(P) zcommon(P) znvpair(P) zavl(P) zunicode(P) spl zlib_deflate ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables cpufreq_userspace powernow_k8 freq_table mperf ipv6 pppoe pppox ppp_generic slhc sha256_generic aes_x86_64 aes_generic cbc i2c_piix4 floppy r8169 i2c_core mii snd_hda_codec_hdmi processor snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd snd_page_alloc pcspkr button thermal_sys libiscsi scsi_transport_iscsi tg3 libphy e1000 fuse nfs lockd sunrpc jfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 dm_snapshot dm_crypt dm_mirror dm_region_hash dm_log dm_mod scsi_wait_scan hid_sunplus hid_sony hid_samsung hid_pl hid_petalynx hid_monterey hid_microsoft hid_logitech hid_gyration hid_ezkey hid_cypress hid_chicony hid_cherry hid_belkin hid_apple hid_a4tech sl811_hcd usbhid ohci_hcd ssb uhci_hcd usb_storage ehci_hcd usbcore aic94xx libsas lpfc qla2xxx megaraid_sas megaraid_mbox megaraid_mm megaraid aacraid sx8 DAC960 cciss 3w_9xxx 3w_xxxx mptsas scsi_transport_sas mptfc scsi_transport_fc scsi_tgt mptspi mptscsih mptbase atp870u dc395x qla1280 imm parport dmx3191d sym53c8xx gdth advansys initio BusLogic arcmsr aic7xxx aic79xx scsi_transport_spi sg pdc_adma sata_inic162x sata_mv ata_piix ahci libahci sata_qstor sata_vsc sata_uli sata_sis sata_sx4 sata_nv sata_via sata_svw sata_sil24 sata_sil sata_promise pata_sl82c105 pata_cs5530 pata_cs5520 pata_via pata_jmicron pata_marvell pata_sis pata_netcell pata_sc1200 pata_pdc202xx_old pata_triflex pata_atiixp pata_opti pata_amd pata_ali pata_it8213 pata_pcmcia pcmcia firmware_class pcmcia_core pata_ns87415 pata_ns87410 pata_serverworks pata_platform pata_artop pata_it821x pata_optidma pata_hpt3x2n pata_hpt3x3 pata_hpt37x pata_hpt366 pata_cmd64x pata_efar pata_rz1000 pata_sil680 pata_radisys pata_pdc2027x pata_mpiix libata
Pid: 16732, comm: zfs Tainted: P 3.0.6-gentoo #3
Call Trace:
 [] warn_slowpath_common+0x80/0x98
 [] warn_slowpath_null+0x15/0x17
 [] unlock_new_inode+0x2c/0x46
 [] zfs_inode_update+0x636/0x653 [zfs]
 [] ? mutex_unlock+0x9/0xb
 [] zfs_mknode+0xa6d/0xb2d [zfs]
 [] zfs_mkdir+0x383/0x48b [zfs]
 [] zpl_vap_init+0x1d8/0x543 [zfs]
 [] vfs_mkdir+0x5d/0xb5
 [] sys_mkdirat+0x91/0xe2
 [] ? sys_futex+0x12a/0x139
 [] sys_mkdir+0x13/0x15
 [] system_call_fastpath+0x16/0x1b
---[ end trace 1ccd3a76adf3a28a ]---
general protection fault: 0000 [#1] SMP
CPU 0
Modules linked in: aufs xt_tcpudp xt_state zfs(P) zcommon(P) znvpair(P) zavl(P) zunicode(P) spl zlib_deflate ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables cpufreq_userspace powernow_k8 freq_table mperf ipv6 pppoe pppox ppp_generic slhc sha256_generic aes_x86_64 aes_generic cbc i2c_piix4 floppy r8169 i2c_core mii snd_hda_codec_hdmi processor snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd snd_page_alloc pcspkr button thermal_sys libiscsi scsi_transport_iscsi tg3 libphy e1000 fuse nfs lockd sunrpc jfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 dm_snapshot dm_crypt dm_mirror dm_region_hash dm_log dm_mod scsi_wait_scan hid_sunplus hid_sony hid_samsung hid_pl hid_petalynx hid_monterey hid_microsoft hid_logitech hid_gyration hid_ezkey hid_cypress hid_chicony hid_cherry hid_belkin hid_apple hid_a4tech sl811_hcd usbhid ohci_hcd ssb uhci_hcd usb_storage ehci_hcd usbcore aic94xx libsas lpfc qla2xxx megaraid_sas megaraid_mbox megaraid_mm megaraid aacraid sx8 DAC960 cciss 3w_9xxx 3w_xxxx mptsas scsi_transport_sas mptfc scsi_transport_fc scsi_tgt mptspi mptscsih mptbase atp870u dc395x qla1280 imm parport dmx3191d sym53c8xx gdth advansys initio BusLogic arcmsr aic7xxx aic79xx scsi_transport_spi sg pdc_adma sata_inic162x sata_mv ata_piix ahci libahci sata_qstor sata_vsc sata_uli sata_sis sata_sx4 sata_nv sata_via sata_svw sata_sil24 sata_sil sata_promise pata_sl82c105 pata_cs5530 pata_cs5520 pata_via pata_jmicron pata_marvell pata_sis pata_netcell pata_sc1200 pata_pdc202xx_old pata_triflex pata_atiixp pata_opti pata_amd pata_ali pata_it8213 pata_pcmcia pcmcia firmware_class pcmcia_core pata_ns87415 pata_ns87410 pata_serverworks pata_platform pata_artop pata_it821x pata_optidma pata_hpt3x2n pata_hpt3x3 pata_hpt37x pata_hpt366 pata_cmd64x pata_efar pata_rz1000 pata_sil680 pata_radisys pata_pdc2027x pata_mpiix libata

Pid: 16732, comm: zfs Tainted: P W 3.0.6-gentoo #3 Gigabyte Technology Co., Ltd. GA-880GMA-USB3/GA-880GMA-USB3
RIP: 0010:[] [] zfs_inode_destroy+0x86/0xf1 [zfs]
RSP: 0018:ffff880127c63958 EFLAGS: 00010286
RAX: ffff880337abc588 RBX: ffff880337abc5b0 RCX: dead000000100100
RDX: dead000000200200 RSI: ffff880127c63988 RDI: ffff8802dc91a3a8
RBP: ffff880127c63978 R08: ffff880127c638f8 R09: ffff880265090430
R10: dead000000200200 R11: dead000000100100 R12: ffff8802dc91a000
R13: ffff880337abc3c0 R14: ffff8802dc91a3a8 R15: ffff880265090430
FS: 00007fe9155fbb40(0000) GS:ffff88041fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe914506670 CR3: 00000002ed345000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process zfs (pid: 16732, threadinfo ffff880127c62000, task ffff8803f3a6f320)
Stack:
 ffff880337abc5b0 ffffffffa079cba0 ffffffffa079cba0 ffff8800ac866db0
 ffff880127c63988 ffffffffa0790c72 ffff880127c639a8 ffffffff810c0027
 ffff880337abc5b0 ffff880337abc5b0 ffff880127c639c8 ffffffff810c0154
Call Trace:
 [] zpl_vap_init+0x4d5/0x543 [zfs]
 [] destroy_inode+0x39/0x53
 [] evict+0x113/0x118
 [] iput+0x13e/0x146
 [] zfs_inode_update+0x63e/0x653 [zfs]
 [] ? mutex_unlock+0x9/0xb
 [] zfs_mknode+0xa6d/0xb2d [zfs]
 [] zfs_mkdir+0x383/0x48b [zfs]
 [] zpl_vap_init+0x1d8/0x543 [zfs]
 [] vfs_mkdir+0x5d/0xb5
 [] sys_mkdirat+0x91/0xe2
 [] ? sys_futex+0x12a/0x139
 [] sys_mkdir+0x13/0x15
 [] system_call_fastpath+0x16/0x1b
Code: 00 00 4c 89 e8 49 03 84 24 88 03 00 00 49 bb 00 01 10 00 00 00 ad de 49 ba 00 02 20 00 00 00 ad de 4c 89 f7 48 8b 08 48 8b 50 08 89 51 08 48 89 0a 4c 89 18 4c 89 50 08 49 ff 8c 24 a0 03 00
RIP [] zfs_inode_destroy+0x86/0xf1 [zfs]
 RSP
---[ end trace 1ccd3a76adf3a28b ]---

ryao commented 12 years ago

USB to SATA controllers usually have poor reliability in my experience. Are you seeing any messages about writes to the USB device failing in dmesg?

GregorKopka commented 12 years ago

I haven't seen any messages related to USB (apart from connecting the device).

The system I cloned for this setup (identical hardware specification, but still zfs-fuse based and not yet converted to zfsonlinux in kernel mode; a different pool, but the same structure) experienced some interesting segfaults and other problems today, such as SSH failing with claims of undefined symbols somewhere in nss_ldap. All of them could be fixed by a reboot, so I think the problem might not be related to ZFS at all but rooted somewhere else in the system (or kernel).

So I'll go bug hunting in other places first and close this issue.

ryao commented 12 years ago

@GregorKopka Were you sending a snapshot between ZFSOnLinux and ZFS-FUSE?

GregorKopka commented 12 years ago

Now that you mention it: I upgraded the pool on the system I moved to zfsonlinux.

The systems receive snapshot streams from the other systems (they are rsynced in as gzipped files and then recv'ed into the pool since the uplinks are unstable at times) every night.
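The receive half of that nightly job could look roughly like the sketch below. It is only an illustration under assumptions: the function names are made up, the stream file path is hypothetical, and the `zfs recv` flags are simply the ones from the hanging command in the original report. The rsync step is left out.

```python
import gzip
import shutil
import subprocess


def build_recv_command(dataset):
    """Return the argv for receiving into `dataset` (flags as in the report above)."""
    return ["zfs", "recv", "-F", "-u", "-v", dataset]


def receive_gzipped_stream(stream_path, command):
    """Decompress `stream_path` and feed it to `command` on stdin.

    Returns the exit status of the command.
    """
    proc = subprocess.Popen(command, stdin=subprocess.PIPE)
    try:
        with gzip.open(stream_path, "rb") as stream:
            shutil.copyfileobj(stream, proc.stdin)
        proc.stdin.close()
    except BrokenPipeError:
        # The receiver exited early; its exit status reports the failure.
        pass
    return proc.wait()
```

A call such as `receive_gzipped_stream("/incoming/dhcp.zfs.gz", build_recv_command("usb-backup/system/etc/dhcp"))` (path hypothetical) returns a non-zero status when the receive fails, which the nightly job can log.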

I should really stop working at weekends... m(

According to the logfiles the streams got imported without problems... Do you think that getting v28 streams might mess with zfs-fuse?

ryao commented 12 years ago

If I recall, ZFS-FUSE is at pool version 23. Pool version 24 introduced "System attributes", which would probably confuse pool version 23.
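To picture the mismatch, here is a minimal sketch. The table and function names are made up for illustration, and only the version 24 entry comes from this thread; it does not query a real pool or stream.

```python
# Hypothetical lookup table: pool version -> feature introduced in that version.
# Only the entry mentioned in this thread is filled in.
FEATURES_BY_POOL_VERSION = {
    24: "System attributes",
}


def features_unknown_to(receiver_version, sender_version):
    """List features added after `receiver_version`, up to `sender_version`."""
    return [
        name
        for version, name in sorted(FEATURES_BY_POOL_VERSION.items())
        if receiver_version < version <= sender_version
    ]


# A v23 receiver getting data from a v28 sender:
print(features_unknown_to(receiver_version=23, sender_version=28))
# prints ['System attributes']
```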

GregorKopka commented 12 years ago

That still doesn't explain why the system itself went kaputt on the fuse machine: fuse was still running nicely, while other stuff (the rootfs is not stored on the pool) like ssh broke. Well, I'll compile a new image for the systems and then see how it goes.

Thanks for the help, and thanks for your support of zfs on linux!