sonic-net / SONiC

Landing page for Software for Open Networking in the Cloud (SONiC) - https://sonic-net.github.io/SONiC/

Device Crash Due to Illegal Address Access When Switching Modes in optoe Driver #1729

Open Alicia-Zhu opened 3 months ago

Alicia-Zhu commented 3 months ago

root@sonic:/sys/bus/i2c/devices/i2c-32/32-0050# cat dev_class
2
root@sonic:/sys/bus/i2c/devices/i2c-32/32-0050# echo 3 > dev_class
[73752.477250] stack segment: 0000 [#1] PREEMPT SMP PTI
[73752.482806] CPU: 2 PID: 228565 Comm: bash Tainted: G OE 6.1.0-11-2-amd64 #1 Debian 6.1.38-4
[73752.493697] Hardware name: Default string Default string/Default string, BIOS 5.11(3BARB029) 01/14/2022
[73752.504197] RIP: 0010:osq_lock+0x5d/0x130
[73752.508679] Code: 00 00 00 48 c7 03 00 00 00 00 89 43 14 41 87 04 24 85 c0 0f 84 b9 00 00 00 83 e8 01 48 98 48 03 2c c5 c0 fa bc 85 48 89 6b 08 <48> 89 5d 00 44 8b 6b 10 45 85 ed 0f 85 97 00 00 00 65 48 8b 14 25
[73752.529671] RSP: 0018:ffffb9fec8f07c98 EFLAGS: 00010206
[73752.535511] RAX: ffffffffffff94fa RBX: ffff94fc77cb1b40 RCX: 0000000000000003
[73752.543486] RDX: 00000000dead0000 RSI: ffffffff85b446c6 RDI: ffffffff85b1dc0d
[73752.551460] RBP: 735f617461677ab2 R08: 0000000000000001 R09: 000000000000000a
[73752.559434] R10: 000000000000000a R11: 0fffffffffffffff R12: ffff94fb08aa28ac
[73752.567408] R13: 0000000000000002 R14: ffff94fb08aa28ac R15: ffff94fb0262c0e0
[73752.575380] FS: 00007f625e156740(0000) GS:ffff94fc77c80000(0000) knlGS:0000000000000000
[73752.584424] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[73752.590847] CR2: 00007ffd5a3d10f0 CR3: 00000001061fc001 CR4: 00000000003706e0
[73752.598824] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[73752.606800] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[73752.614775] Call Trace:
[73752.617505]
[73752.619846] ? __die_body.cold+0x1a/0x1f
[73752.624222] ? die+0x2a/0x50
[73752.627440] ? do_trap+0xc5/0x110
[73752.631143] ? do_error_trap+0x6a/0x90
[73752.635333] ? exc_stack_segment+0x33/0x50
[73752.639910] ? asm_exc_stack_segment+0x22/0x30
[73752.644878] ? osq_lock+0x5d/0x130
[73752.648678] ? osq_lock+0x29/0x130
[73752.652477] __mutex_lock.constprop.0+0x1cd/0x700
[73752.657738] ? aa_file_perm+0x11f/0x4e0
[73752.662026] device_del+0x37/0x410
[73752.665827] device_unregister+0x13/0x60
[73752.670211] set_dev_class+0x7f/0x130 [optoe]
[73752.675083] kernfs_fop_write_iter+0x11e/0x1f0
[73752.680049] vfs_write+0x244/0x400
[73752.683853] ksys_write+0x6b/0xf0
[73752.687558] do_syscall_64+0x5b/0xc0
[73752.691559] ? syscall_exit_to_user_mode+0x17/0x40
[73752.696918] ? do_syscall_64+0x67/0xc0
[73752.701112] ? syscall_exit_to_user_mode+0x17/0x40
[73752.706472] ? do_syscall_64+0x67/0xc0
[73752.710666] ? do_syscall_64+0x67/0xc0
[73752.714861] ? fpregs_assert_state_consistent+0x22/0x50
[73752.720710] ? exit_to_user_mode_prepare+0x40/0x1d0
[73752.726171] entry_SYSCALL_64_after_hwframe+0x69/0xd3
[73752.731827] RIP: 0033:0x7f625e251240
[73752.735828] Code: 40 00 48 8b 15 c1 9b 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 80 3d a1 23 0e 00 00 74 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89
[73752.756827] RSP: 002b:00007ffd5a3d1b78 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[73752.765296] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f625e251240
[73752.773274] RDX: 0000000000000002 RSI: 000055d220062ed0 RDI: 0000000000000001
[73752.781252] RBP: 000055d220062ed0 R08: 0000000000000007 R09: 0000000000000073
[73752.789232] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000002
[73752.797208] R13: 00007f625e32c760 R14: 0000000000000002 R15: 00007f625e3279e0
[73752.805187]
[73752.807624] Modules linked in: xt_TCPMSS(E) team_mode_loadbalance(E) team(E) xt_hl(E) xt_tcpudp(E) ip6_tables(E) xt_conntrack(E) ebt_vlan(E) nft_compat(E) nf_tables(E) plat_sff(OE) plat_psu(OE) plat_fan(OE) plat_switch(OE) plat_dfd(OE) wb_tps53622(OE) wb_ina3221(OE) wb_csu550(OE) wb_isl68137(OE) wb_pmbus_core(OE) wb_mac_bsc(OE) at24(E) optoe(E) wb_lm75(OE) regmap_i2c(E) wb_i2c_dev_device(OE) wb_i2c_mux_pca954x_device(OE) wb_i2c_mux_pca954x(OE) wb_i2c_mux_pca9641(OE) wb_i2c_ocores_device(OE) wb_i2c_ocores(OE) wb_i2c_dev(OE) wb_pcie_dev_device(OE) wb_pcie_dev(OE) wb_fpga_pcie(OE) wb_io_dev_device(OE) wb_io_dev(OE) wb_lpc_drv_device(OE) wb_lpc_drv(OE) platform_common(OE) wb_i2c_gpio_device(OE) wb_gpio_device(OE) i2c_mux(E) wb_i2c_gpio(OE) wb_i2c_algo_bit(OE) i2c_dev(E) wb_gpio_d1500(OE) wb_i2c_i801(OE) 8021q(E) garp(E) mrp(E) bridge(E) stp(E) llc(E) nf_conntrack_netlink(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) libcrc32c(E) xfrm_user(E) linux_ngbde(OE) linux_knet_cb(OE)
[73752.807701] linux_bcm_knet(OE) psample(E) linux_user_bde(OE) linux_kernel_bde(OE) nvme_fabrics(E) cfg80211(E) rfkill(E) binfmt_misc(E) intel_rapl_msr(E) intel_rapl_common(E) bonding(E) intel_uncore_frequency(E) intel_uncore_frequency_common(E) tls(E) sb_edac(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) ipmi_ssif(E) ghash_clmulni_intel(E) sha512_ssse3(E) sha512_generic(E) aesni_intel(E) crypto_simd(E) cryptd(E) rapl(E) intel_cstate(E) intel_uncore(E) evdev(E) iTCO_wdt(E) pcspkr(E) intel_pmc_bxt(E) mxm_wmi(E) iTCO_vendor_support(E) watchdog(E) mei_me(E) acpi_ipmi(E) mei(E) ipmi_si(E) ipmi_devintf(E) ioatdma(E) ipmi_msghandler(E) acpi_pad(E) button(E) cdc_subset(E) sg(E) nfnetlink(E) fuse(E) efi_pstore(E) drm(E) dm_mod(E) configfs(E) efivarfs(E) ip_tables(E) x_tables(E) autofs4(E) loop(E) ext4(E) crc16(E) mbcache(E) jbd2(E) crc32c_generic(E) zstd(E) zstd_compress(E) nvme(E) nvme_core(E) nls_utf8(E) nls_cp437(E) nls_ascii(E) vfat(E) fat(E)
[73752.904834] overlay(E) squashfs(E) cdc_eem(E) usbnet(E) mii(E) sd_mod(E) t10_pi(E) crc64_rocksoft(E) crc64(E) crc_t10dif(E) crct10dif_generic(E) ahci(E) ixgbe(E) libahci(E) xfrm_algo(E) mdio_devres(E) crct10dif_pclmul(E) xhci_pci(E) ehci_pci(E) of_mdio(E) gpio_ich(E) crct10dif_common(E) fixed_phy(E) libata(E) crc32_pclmul(E) xhci_hcd(E) ehci_hcd(E) fwnode_mdio(E) crc32c_intel(E) igb(E) i2c_algo_bit(E) scsi_mod(E) libphy(E) dca(E) lpc_ich(E) i2c_smbus(E) scsi_common(E) usbcore(E) ptp(E) usb_common(E) pps_core(E) mdio(E) wmi(E) [last unloaded: i2c_i801(E)]
[73753.057239] ---[ end trace 0000000000000000 ]---
[73753.069236] ttyS ttyS1: 1 input overrun(s)
[73753.217275] RIP: 0010:osq_lock+0x5d/0x130
[73753.221762] Code: 00 00 00 48 c7 03 00 00 00 00 89 43 14 41 87 04 24 85 c0 0f 84 b9 00 00 00 83 e8 01 48 98 48 03 2c c5 c0 fa bc 85 48 89 6b 08 <48> 89 5d 00 44 8b 6b 10 45 85 ed 0f 85 97 00 00 00 65 48 8b 14 25
[73753.242761] RSP: 0018:ffffb9fec8f07c98 EFLAGS: 00010206
[73753.248607] RAX: ffffffffffff94fa RBX: ffff94fc77cb1b40 RCX: 0000000000000003
[73753.256590] RDX: 00000000dead0000 RSI: ffffffff85b446c6 RDI: ffffffff85b1dc0d
[73753.264574] RBP: 735f617461677ab2 R08: 0000000000000001 R09: 000000000000000a
[73753.272553] R10: 000000000000000a R11: 0fffffffffffffff R12: ffff94fb08aa28ac
[73753.280535] R13: 0000000000000002 R14: ffff94fb08aa28ac R15: ffff94fb0262c0e0
[73753.288517] FS: 00007f625e156740(0000) GS:ffff94fc77c80000(0000) knlGS:0000000000000000
[73753.297569] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[73753.303997] CR2: 00007ffd5a3d10f0 CR3: 00000001061fc001 CR4: 00000000003706e0
[73753.311981] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[73753.319963] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[73753.327946] Kernel panic - not syncing: Fatal exception
[73753.333789] Kernel Offset: 0x3800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[73753.539079] Rebooting in 10 seconds..
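
For anyone reading the trace: frames prefixed with `?` are ones the kernel unwinder found on the stack but could not verify as part of the call chain, so the confirmed path is easier to see with those filtered out. The following is only a rough sketch for doing that; the `reliable_frames` helper and the `FRAME` pattern are made-up names for illustration, and the input is an excerpt of the call trace from this log with timestamps dropped.

```python
import re

# Excerpt of the call trace from the log in this issue (timestamps omitted).
TRACE = """\
? osq_lock+0x5d/0x130
? osq_lock+0x29/0x130
__mutex_lock.constprop.0+0x1cd/0x700
? aa_file_perm+0x11f/0x4e0
device_del+0x37/0x410
device_unregister+0x13/0x60
set_dev_class+0x7f/0x130 [optoe]
kernfs_fop_write_iter+0x11e/0x1f0
vfs_write+0x244/0x400
ksys_write+0x6b/0xf0
do_syscall_64+0x5b/0xc0
"""

# symbol+0xoffset/0xsize, optionally followed by the owning module in brackets.
FRAME = re.compile(
    r"^(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<mod>\w+)\])?$"
)

def reliable_frames(trace):
    """Return (symbol, module) pairs for frames not marked '?' (hypothetical helper)."""
    frames = []
    for line in trace.splitlines():
        line = line.strip()
        if not line or line.startswith("?"):
            continue  # skip blank lines and unverified frames
        m = FRAME.match(line)
        if m:
            frames.append((m.group("sym"), m.group("mod")))
    return frames

frames = reliable_frames(TRACE)
for sym, mod in frames:
    print(f"{sym} [{mod}]" if mod else sym)
```

Read bottom to top, the remaining frames show the sysfs write reaching `set_dev_class` in optoe, which calls `device_unregister` and then `device_del`, which tries to take a mutex. The RIP line in the log places the faulting instruction at `osq_lock+0x5d`, inside that mutex acquisition.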

Alicia-Zhu commented 2 months ago

Please track the progress of this issue in the sonic-buildimage issue with the same content.