Open leitao opened 5 years ago
I enabled the function tracer to capture what the system was doing before this crash. This is the CPU trace buffer output.
This is not reproducible in mainline, but we're not sure what the fix was.
Famous last words, possibly not the same root cause, but sigfuz can still trigger TM bad things:
[T30311] Unexpected TM Bad Thing exception at c00000000000e670 (msr 0x8000000300201031) tm_scratch=8000000100009033
[T30311] Oops: Unrecoverable exception, sig: 6 [#1]
[T30311] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[T30311] Modules linked in: snd_pcm_oss snd_mixer_oss snd_seq_oss snd_seq_dummy snd_seq_midi_event snd_seq snd_pcm snd_timer snd_seq_device snd soundcore aes_generic tea seqiv authencesn ccm wp512 twofish_generic michael_mic echainiv ecb essiv sha1_generic khazad pcbc authenc blowfish_generic deflate drbg cmac tgr192 md4 des_generic ctr salsa20_generic twofish_common cbc jitterentropy_rng anubis serpent_generic sha512_generic cast6_generic cast_common blowfish_common ghash_generic md5 md5_ppc sha1_powerpc kvm oprofile af_key bridge act_bpf cls_bpf ah4 esp4 xfrm4_tunnel nf_log_ipv4 ipt_REJECT iptable_mangle iptable_filter nf_reject_ipv4 nf_log_arp iptable_nat ip_tables ipcomp xt_state nfnetlink_log xt_conntrack nf_nat_irc nf_nat_ftp xt_tcpudp xt_mark xt_TCPMSS nf_conntrack_irc xt_nat nf_conntrack_netlink nfnetlink xt_policy xt_NFLOG nf_nat_sip nf_conntrack_ftp xt_MASQUERADE nf_nat xt_addrtype nf_conntrack_sip nf_conntrack nf_defrag_ipv4 xt_LOG rpcrdma psnap p8022 stp llc
[T30311] ip6t_ipv6header nf_defrag_ipv6 ip6table_filter ip6table_mangle nf_log_ipv6 nf_log_common ip6t_REJECT nf_reject_ipv6 ip6_tables x_tables xfrm_user xfrm_ipcomp xfrm_algo vfat hfs binfmt_misc jfs cifs gcm reiserfs btrfs crc32c_vpmsum autofs4 fuse nilfs2 nfsd udf overlay cramfs squashfs hfsplus vhost_net vhost jsm hvcs hvcserver appledisplay usbmon usb_storage powernv_op_panel powernv_rng pseries_rng rng_core i2c_matroxfb matroxfb_maven rpadlpar_io rpaphp mlx4_ib iw_cxgb4 ib_mthca iw_cxgb3 ib_iser ib_srp ib_ipoib rdma_ucm ib_umad ib_uverbs rdma_cm iw_cm ib_cm ib_core vmx_crypto gf128mul virtio_crypto crypto_engine nbd broadcom bcm_phy_lib ppp_async bsd_comp ppp_synctty ppp_deflate pppoe pppox ppp_generic dummy cxgb myri10ge ibmveth bnx2 bnx2x acenic i40e ixgb ixgbe s2io mlx4_en mlx4_core pcnet32 be2net netxen_nic 3c59x tun slhc bonding leds_powernv dm_zero dm_snapshot raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor multipath raid10 dm_mirror
[T30311] dm_queue_length dm_round_robin dm_region_hash faulty dm_bufio dm_crypt dm_service_time dm_multipath dm_log pcspkr input_leds led_class evdev st sym53c8xx scsi_dh_alua scsi_dh_rdac ibmvfc qla4xxx cxgb3i cxgb3 mdio cxgb4i cxgb4 libcxgbi libcxgb be2iscsi bnx2i cnic uio qla2xxx cxlflash cxl ocxl mpt3sas scsi_transport_sas iscsi_boot_sysfs scsi_transport_spi lpfc raid_class libiscsi_tcp libiscsi scsi_transport_iscsi test_printf test_bpf crc_itu_t raid6_pq test_strscpy libdes libarc4 libaes test_firmware zstd_compress zstd_decompress test_bitmap test_static_keys crc_t10dif crct10dif_generic crct10dif_common test_user_copy test_static_key_base crc_ccitt
[T30311] CPU: 4 PID: 30311 Comm: sigfuz Not tainted 5.4.0-rc2-gcc9x-gc145ff82749b #1
[T30311] NIP: c00000000000e670 LR: c0000000004c7954 CTR: 0000000000000000
[T30311] REGS: c00000000ff83d70 TRAP: 0700 Not tainted (5.4.0-rc2-gcc9x-gc145ff82749b)
[T30311] MSR: 8000000300201031 <SF,ME,IR,DR,LE,TM[SE]> CR: 44002488 XER: 00090785
[T30311] CFAR: c00000000000e5b4 IRQMASK: 0
[T30311] PACATMSCRATCH: 8000000200001033
[T30311] GPR00: c0000000004c7954 c0000000d6a377e0 c000000001811c00 c000000000fc2cf0
[T30311] GPR04: 0000000000000563 0000000000000000 0000000000000408 ffffffffffffffff
[T30311] GPR08: c000000001835a18 00007ffff7d20000 0000000000000000 c0000000c7bebc80
[T30311] GPR12: 0000000000000000 c00000000fffb680 c0000000bd241058 0000000000000044
[T30311] GPR16: 0000000000000008 c000000001044cf8 0000000000000050 c0000000c7bebc80
[T30311] GPR20: 000fffffffffffff 000fffffffffffff c0000000c7bebd00 c0000000d6a37b40
[T30311] GPR24: 00000000000011c0 0000000008360000 000000000000139c 0000000000000000
[T30311] GPR28: c0000000efb20f08 c0000000efb20f68 c0000000c7bebc80 c0000000bd6dd268
[T30311] NIP [c00000000000e670] fast_exception_return+0x120/0x1e4
[T30311] LR [c0000000004c7954] elf_core_dump+0x1244/0x1690
[T30311] Call Trace:
[T30311] [c0000000d6a377e0] [c0000000004c7954] elf_core_dump+0x1244/0x1690 (unreliable)
[T30311] [c0000000d6a37a00] [c0000000004d0878] do_coredump+0x988/0x1490
[T30311] [c0000000d6a37c10] [c000000000161270] get_signal+0x1f0/0xcb0
[T30311] [c0000000d6a37d30] [c000000000027660] do_notify_resume+0x140/0x410
[T30311] [c0000000d6a37e20] [c00000000000e4c4] ret_from_except_lite+0x70/0x74
[T30311] Instruction dump:
[T30311] 7c4ff120 e8410170 7c5a03a6 38400000 f8410060 e8010070 e8410080 e8610088
[T30311] 60000000 60000000 e8810090 e8210078 <4c000024> 48000000 e8610178 88ed0989
[T30311] ---[ end trace 814d362bfe3181f5 ]---
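For anyone reading the register values in the oops above, here is a rough sketch for decoding the MSR-style words (`msr`, `tm_scratch`, `PACATMSCRATCH`). The bit positions are my reading of the powerpc MSR definitions in the kernel's `asm/reg.h`, and the table is deliberately partial; `decode_msr` and `MSR_BITS` are names made up for this illustration, not kernel APIs. My understanding is that for a 0x700 program interrupt the reported `msr` comes from SRR1, so the leftover `0x200000` bit would be the "TM Bad Thing" reason bit rather than a real MSR bit; treat that as an assumption.

```python
# Partial table of MSR bit positions (assumed from Linux powerpc asm/reg.h).
MSR_BITS = {
    "SF": 63, "HV": 60, "TS_T": 34, "TS_S": 33, "TM": 32,
    "VEC": 25, "VSX": 23, "EE": 15, "PR": 14, "FP": 13, "ME": 12,
    "FE0": 11, "SE": 10, "BE": 9, "FE1": 8, "IR": 5, "DR": 4,
    "PMM": 2, "RI": 1, "LE": 0,
}

# Assumed SRR1 program-interrupt reason bit for "TM Bad Thing".
SRR1_PROGTM = 0x00200000


def decode_msr(value):
    """Return (names of known bits set, mask of bits not in MSR_BITS)."""
    names = [name for name, bit in MSR_BITS.items() if value & (1 << bit)]
    known_mask = sum(1 << bit for bit in MSR_BITS.values())
    return names, value & ~known_mask


if __name__ == "__main__":
    # Values copied from the oops above.
    for label, value in [
        ("msr", 0x8000000300201031),
        ("tm_scratch", 0x8000000100009033),
        ("PACATMSCRATCH", 0x8000000200001033),
    ]:
        names, leftover = decode_msr(value)
        print(f"{label}: {','.join(names)} leftover={leftover:#x}")
```

With this table, the `msr` value decodes to SF, TS_S, TM, ME, IR, DR and LE, which lines up with the `<SF,ME,IR,DR,LE,TM[SE]>` string the kernel printed (suspended transaction, TM enabled), with `0x200000` left over.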
The sigfuz testcase[1] is causing the following TM Bad Thing: