Open mooinglemur opened 10 years ago
I'm going to add here that I haven't seen this happen on boxes that exclusively have drives of 2TB or smaller. It has happened on machines with 3TB and 4TB drives. I don't know if it's another one of these 32-bit LBA bugs, but I thought it was noteworthy.
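For anyone wondering why the 2TB boundary would matter for a 32-bit LBA bug, here is a rough sketch of the arithmetic. It assumes 512-byte logical sectors and decimal drive sizes, and the helper names are made up for illustration. Nothing in this thread confirms that EnhanceIO actually truncates sector numbers; this only shows that the boundary lines up with the observation.

```python
SECTOR_BYTES = 512  # assumption: 512-byte logical sectors


def max_addressable_bytes(lba_bits: int, sector_bytes: int = SECTOR_BYTES) -> int:
    """Largest device size reachable with an lba_bits-wide sector number."""
    return (1 << lba_bits) * sector_bytes


def last_sector_wraps(size_tb: int, lba_bits: int = 32) -> bool:
    """True if the drive's last sector number does not fit in lba_bits bits."""
    last_sector = size_tb * 10**12 // SECTOR_BYTES - 1
    # Keep only the low lba_bits bits, as a too-narrow variable would.
    truncated = last_sector & ((1 << lba_bits) - 1)
    return truncated != last_sector


print(max_addressable_bytes(32))  # 2199023255552 bytes, i.e. 2 TiB
for size_tb in (2, 3, 4):
    print(size_tb, "TB drive wraps:", last_sector_wraps(size_tb))
```

A 2TB drive stays under the 2 TiB limit, while 3TB and 4TB drives go past it, which matches the pattern described above.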
I've seen the same behaviour:
Feb 6 20:05:57 wirt kernel: [907547.614898] ------------[ cut here ]------------
Feb 6 20:05:57 wirt kernel: [907547.615168] kernel BUG at /var/lib/dkms/enhanceio/0.1/build/eio_main.c:1518!
Feb 6 20:05:57 wirt kernel: [907547.615580] invalid opcode: 0000 [#1] SMP
Feb 6 20:05:57 wirt kernel: [907547.615834] Modules linked in: btrfs(F) zlib_deflate(F) ufs(F) qnx4(F) hfsplus(F) hfs(F) minix(F) ntfs(F) msdos(F) jfs(F) xfs(F) libcrc32c(F) reiserfs(F) ext2(F) vesafb(F) pci_stub vboxpci(OF) vboxnetadp(OF) vboxnetflt(OF) vboxdrv(OF) enhanceio_lru(OF) enhanceio_fifo(OF) enhanceio(OF) gpio_ich x86_pkg_temp_thermal coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel mxm_wmi aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd microcode joydev mei_me lpc_ich mei ioatdma wmi lp mac_hid parport raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic raid6_pq raid1 raid0 igb isci multipath usbhid i2c_algo_bit ahci libsas usb_storage dca hid linear libahci ptp scsi_transport_sas pps_core
Feb 6 20:05:57 wirt kernel: [907547.653858] CPU: 0 PID: 20999 Comm: kworker/u49:0 Tainted: GF O 3.11.0-15-generic #23-Ubuntu
Feb 6 20:05:57 wirt kernel: [907547.676584] Hardware name: Thomas-Krenn.AG X9DR3-F/X9DR3-F, BIOS 3.0a 07/31/2013
Feb 6 20:05:57 wirt kernel: [907547.700132] Workqueue: eio_callback eio_post_io_callback [enhanceio]
Feb 6 20:05:57 wirt kernel: [907547.710692] task: ffff8808511faee0 ti: ffff880b448ac000 task.ti: ffff880b448ac000
Feb 6 20:05:57 wirt kernel: [907547.731170] RIP: 0010:[<ffffffffa03c54bc>] [<ffffffffa03c54bc>] eio_md_write+0x24c/0x250 [enhanceio]
Feb 6 20:05:57 wirt kernel: [907547.752336] RSP: 0018:ffff880b448add90 EFLAGS: 00010206
Feb 6 20:05:57 wirt kernel: [907547.762933] RAX: 0000000000000001 RBX: 000000000001facf RCX: 0000000000000001
Feb 6 20:05:57 wirt kernel: [907547.784307] RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000282
Feb 6 20:05:57 wirt kernel: [907547.806288] RBP: ffff880b448addd0 R08: ffff88085fc14580 R09: 0000000000000001
Feb 6 20:05:57 wirt kernel: [907547.829061] R10: 0000000000000000 R11: 0000000000014580 R12: ffff88033cbb8a80
Feb 6 20:05:57 wirt kernel: [907547.852220] R13: 0000000000000001 R14: 0000000000000000 R15: ffffc9002eeaca38
Feb 6 20:05:57 wirt kernel: [907547.875239] FS: 0000000000000000(0000) GS:ffff88085fc00000(0000) knlGS:0000000000000000
Feb 6 20:05:57 wirt kernel: [907547.899255] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 6 20:05:57 wirt kernel: [907547.910856] CR2: 00007f35579804e0 CR3: 0000000001c0e000 CR4: 00000000001427f0
Feb 6 20:05:57 wirt kernel: [907547.933950] Stack:
Feb 6 20:05:57 wirt kernel: [907547.944918] ffff88085190c000 0000000000000203 ffff88033cbb8700 ffff880cbbe16e70
Feb 6 20:05:57 wirt kernel: [907547.967166] ffff88085190c000 ffff880289f91960 0000000000000000 000000000001facf
Feb 6 20:05:57 wirt kernel: [907547.989756] ffff880b448ade20 ffffffffa03c5fc9 ffff880800000062 0000000001facf11
Feb 6 20:05:57 wirt kernel: [907548.012107] Call Trace:
Feb 6 20:05:57 wirt kernel: [907548.023179] [<ffffffffa03c5fc9>] eio_post_io_callback+0x739/0x950 [enhanceio]
Feb 6 20:05:57 wirt kernel: [907548.045135] [<ffffffff8107cfec>] process_one_work+0x17c/0x430
Feb 6 20:05:57 wirt kernel: [907548.056189] [<ffffffff8107dfac>] worker_thread+0x11c/0x3c0
Feb 6 20:05:57 wirt kernel: [907548.066831] [<ffffffff8107de90>] ? manage_workers.isra.25+0x2a0/0x2a0
Feb 6 20:05:57 wirt kernel: [907548.077500] [<ffffffff81084740>] kthread+0xc0/0xd0
Feb 6 20:05:57 wirt kernel: [907548.088052] [<ffffffff81084680>] ? kthread_create_on_node+0x120/0x120
Feb 6 20:05:57 wirt kernel: [907548.098605] [<ffffffff816f716c>] ret_from_fork+0x7c/0xb0
Feb 6 20:05:57 wirt kernel: [907548.108893] [<ffffffff81084680>] ? kthread_create_on_node+0x120/0x120
Feb 6 20:05:57 wirt kernel: [907548.119173] Code: 48 89 72 38 48 89 42 30 49 89 57 40 49 c7 46 38 00 00 00 00 e9 05 ff ff ff 49 89 4d 38 4d 89 6c 24 18 e9 2a fe ff ff 0f 0b 0f 0b <0f> 0b 0f 0b 0f 1f 44 00 00 55 48 89 e5 41 57 49 89 ff 41 56 41
Feb 6 20:05:57 wirt kernel: [907548.151012] RIP [<ffffffffa03c54bc>] eio_md_write+0x24c/0x250 [enhanceio]
Feb 6 20:05:57 wirt kernel: [907548.161026] RSP <ffff880b448add90>
Feb 6 20:06:01 wirt kernel: [907548.183458] ---[ end trace 838f3d5ef81c5266 ]---
Any comments on this from the devs? Is this project even alive?
I know it is a bit too late to ask for details, but I was curious whether you still have the information below. I will try to reproduce the same issue on my local setup.
Could you please share the following details...
Thank you.
I have seen the same issue on the ticket I just opened. I guess I'll have to look at an alternative.
I have hit the same issue as noted here. I had set up a Linux host as an iSCSI target, with about 10 virtual machines running on it under VMware vSphere. Of the 10 VMs, 5 were running vdbench, dt, and sqliosim. I have 3 SSDs in RAID 0 acting as a write-back (WB) cache for a RAID 5 set of 5 HDDs.
~# eio_cli info
Cache Name : raidvol1_cache
Source Device : /dev/md126
SSD Device : /dev/md127
Policy : lru
Mode : Write Back
Block Size : 4096
Associativity : 256
State : normal
kernel: [2573885.309343] kernel BUG at /root/enhanceio/EnhanceIO/Driver/enhanceio/eio_main.c:1518!
kernel: [2573885.309445] invalid opcode: 0000 [#1] SMP
kernel: [2573885.309505] Modules linked in: ib_srpt ib_cm ib_sa ib_mad ib_core tcm_qla2xxx qla2xxx tcm_loop tcm_fc libfc scsi_transport_fc scsi_tgt iscsi_target_mod target_core_pscsi target_core_file target_core_iblock target_core_mod configfs nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache enhanceio_rand(OF) enhanceio_lru(OF) enhanceio_fifo(OF) enhanceio(OF) snd_hda_codec_hdmi eeepc_wmi asus_wmi sparse_keymap kvm_amd kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd snd_hda_codec_realtek nouveau serio_raw snd_hda_intel edac_core k10temp edac_mce_amd fam15h_power snd_hda_codec snd_hwdep mxm_wmi video ttm joydev drm_kms_helper snd_pcm snd_page_alloc drm snd_timer sp5100_tco i2c_piix4 snd i2c_algo_bit soundcore 8021q garp stp mrp llc mac_hid wmi lp parport raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx hid_generic usbhid hid xor raid6_pq psmouse raid1 r8169 mii ixgbe raid0 dca mpt2sas ptp multipath ahci pps_core raid_class libahci linear scsi_transport_sas mdio
kernel: [2573885.310644] CPU: 2 PID: 976 Comm: kworker/u12:2 Tainted: GF O 3.13.0-19-generic #40-Ubuntu
kernel: [2573885.310750] Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A99FX PRO R2.0, BIOS 2005 09/16/2013
kernel: [2573885.310865] Workqueue: eio_callback eio_post_io_callback [enhanceio]
kernel: [2573885.310931] task: ffff8807b6465fc0 ti: ffff8807ba7ce000 task.ti: ffff8807ba7ce000
kernel: [2573885.311031] RIP: 0010:[
kernel: [2573885.313286] RIP [
We have a number of machines using enhanceio, and we've been trying to nail down some lockups that happen under load. The load has increased substantially, and now we're seeing frequent lockups. I changed the kernel to panic on oops, so that these machines will reboot themselves, so it's not a crisis. :)
I was finally able to capture a lockup that happened early in a boot. Since this oops seems to render the machine locked up and useless, panicking seems like the right thing. Let me know if I can provide more info.
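For others who want the same reboot-on-oops behaviour: the post above does not say exactly how it was configured, so the following is only a sketch under the assumption that the standard Linux sysctls `kernel.panic_on_oops` and `kernel.panic` were used. It reads the current values and reports what the machine would do on an oops. The function names and the `root` parameter are made up for illustration.

```python
from pathlib import Path


def read_sysctl_int(name: str, root: str = "/proc/sys") -> int:
    """Read an integer sysctl such as 'kernel.panic' from the /proc/sys tree."""
    path = Path(root) / name.replace(".", "/")
    return int(path.read_text().strip())


def oops_behaviour(root: str = "/proc/sys") -> str:
    """Describe what happens on a kernel oops with the current settings."""
    panic_on_oops = read_sysctl_int("kernel.panic_on_oops", root)
    panic_timeout = read_sysctl_int("kernel.panic", root)
    if panic_on_oops == 0:
        return "oops does not panic; the machine may stay up in a wedged state"
    if panic_timeout == 0:
        return "oops panics, but the machine halts instead of rebooting"
    if panic_timeout < 0:
        return "oops panics and the machine reboots immediately"
    return f"oops panics and the machine reboots after {panic_timeout} s"


# Only query the live system when the sysctl tree is present (i.e. on Linux).
if Path("/proc/sys/kernel/panic_on_oops").exists():
    print(oops_behaviour())
```

For a self-rebooting box, `kernel.panic_on_oops` needs to be 1 and `kernel.panic` needs to be non-zero; with `kernel.panic` at 0 the machine panics but then sits there until someone power-cycles it.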