linuxppc / issues

Issues repository for linuxppc

sigfuz causes a TM Bad Thing #210

Open leitao opened 5 years ago

leitao commented 5 years ago

The sigfuz test case[1] is triggering the following TM Bad Thing:

cpu 0x3: Vector: 700 (Program Check) at [c00000003ff8bd70]
    pc: c00000000000e9ec: fast_exception_return+0x100/0x1bc
    lr: c00000000001d620: copy_vsx_from_user+0x40/0xb0
    sp: c000000426d8fb00
   msr: 8000000302a03031
  current = 0xc000000425ed8c00
  paca    = 0xc00000003ffcd480   irqmask: 0x03   irq_happened: 0x01
    pid   = 23414, comm = sigfuz
Linux version 4.20.0-rc4-00011-g3c78563f7850 (breno@unstable) (gcc version 7.3.0 (Debian 7.3.0-13)) #896 SMP Wed Dec 5 08:09:45 EST 2018
WARNING: exception is not recoverable, can't continue
enter ? for help
[c000000426d8fc00] c00000000001d620 copy_vsx_from_user+0x40/0xb0
[c000000426d8fd40] c0000000000324e8 sys_rt_sigreturn+0x228/0x880
[c000000426d8fe20] c00000000000bde4 system_call+0x5c/0x70
--- Exception: 10a5ba32332e5834  at 0000000000000000
SP (7fffd68f03b0) is in userspace
3:mon> e
cpu 0x3: Vector: 700 (Program Check) at [c00000003ff8bd70]
    pc: c00000000000e9ec: fast_exception_return+0x100/0x1bc
    lr: c00000000001d620: copy_vsx_from_user+0x40/0xb0
    sp: c000000426d8fb00
   msr: 8000000302a03031
  current = 0xc000000425ed8c00
  paca    = 0xc00000003ffcd480   irqmask: 0x03   irq_happened: 0x01
    pid   = 23414, comm = sigfuz
Linux version 4.20.0-rc4-00011-g3c78563f7850 (breno@unstable) (gcc version 7.3.0 (Debian 7.3.0-13)) #896 SMP Wed Dec 5 08:09:45 EST 2018
3:mon> bt
   type            address
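xmon prints the `msr` above (`8000000302a03031`) only as a raw value. The C sketch below decodes a subset of its bits so the TM state can be read directly. The bit positions follow the `MSR_*_LG` definitions in the Linux kernel's `arch/powerpc/include/asm/reg.h` (bit 0 = least significant). The helper names (`msr_bit`, `msr_flags`, `msr_tm_state`, `msr_tm_available`) are hypothetical, and only the bits that appear in this thread are named.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* MSR bit positions (bit 0 = least significant), following the
 * MSR_*_LG definitions in Linux's arch/powerpc/include/asm/reg.h. */
enum {
    MSR_LE_LG  = 0,  /* little-endian mode */
    MSR_RI_LG  = 1,  /* recoverable interrupt */
    MSR_DR_LG  = 4,  /* data relocation */
    MSR_IR_LG  = 5,  /* instruction relocation */
    MSR_ME_LG  = 12, /* machine check enable */
    MSR_FP_LG  = 13, /* floating point available */
    MSR_PR_LG  = 14, /* problem (user) state */
    MSR_EE_LG  = 15, /* external interrupt enable */
    MSR_VSX_LG = 23, /* VSX available */
    MSR_VEC_LG = 25, /* Altivec available */
    MSR_TM_LG  = 32, /* transactional memory available */
    MSR_TS_LG  = 33, /* 2-bit TM state field, bits 33..34 */
    MSR_SF_LG  = 63, /* 64-bit mode */
};

static int msr_bit(uint64_t msr, int bit)
{
    return (int)((msr >> bit) & 1);
}

/* 1 if the TM facility is enabled (MSR[TM]). */
static int msr_tm_available(uint64_t msr)
{
    return msr_bit(msr, MSR_TM_LG);
}

/* Decode MSR[TS]: 'N' non-transactional, 'S' suspended,
 * 'T' transactional, '?' reserved encoding. */
static char msr_tm_state(uint64_t msr)
{
    switch ((msr >> MSR_TS_LG) & 3) {
    case 0:  return 'N';
    case 1:  return 'S';
    case 2:  return 'T';
    default: return '?';
    }
}

/* Write the names of the set bits, comma separated, in the same order as
 * the kernel's "MSR:" line in the oops (e.g. "SF,ME,IR,DR,LE"). */
static const char *msr_flags(uint64_t msr, char *buf, size_t len)
{
    static const struct { int bit; const char *name; } names[] = {
        { MSR_SF_LG, "SF" }, { MSR_VEC_LG, "VEC" }, { MSR_VSX_LG, "VSX" },
        { MSR_EE_LG, "EE" }, { MSR_PR_LG, "PR" },   { MSR_FP_LG, "FP" },
        { MSR_ME_LG, "ME" }, { MSR_IR_LG, "IR" },   { MSR_DR_LG, "DR" },
        { MSR_RI_LG, "RI" }, { MSR_LE_LG, "LE" },
    };
    size_t used = 0;

    assert(len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (!msr_bit(msr, names[i].bit))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? "," : "", names[i].name);
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0'; /* buffer too small: drop the partial name */
            break;
        }
        used += (size_t)n;
    }
    return buf;
}
```

For `8000000302a03031` this sketch reports SF,VEC,VSX,FP,ME,IR,DR,LE with MSR[TM] set and MSR[TS] = suspended. MSR[RI] is clear, which is consistent with xmon's "exception is not recoverable" warning. Applied to `8000000300201031` from the later oops, it agrees with the kernel's own `<SF,ME,IR,DR,LE,TM[SE]>` decoding.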
leitao commented 5 years ago

I enabled the function tracer to capture what the system was doing before this crash. This is the CPU trace buffer output:

https://paste.debian.net/1059209/

mpe commented 4 years ago

This is no longer reproducible in mainline, but we're not sure which change fixed it.

mpe commented 4 years ago

Famous last words: possibly not the same root cause, but sigfuz can still trigger TM Bad Things:

[T30311] Unexpected TM Bad Thing exception at c00000000000e670 (msr 0x8000000300201031) tm_scratch=8000000100009033
[T30311] Oops: Unrecoverable exception, sig: 6 [#1]
[T30311] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[T30311] Modules linked in: snd_pcm_oss snd_mixer_oss snd_seq_oss snd_seq_dummy snd_seq_midi_event snd_seq snd_pcm snd_timer snd_seq_device snd soundcore aes_generic tea seqiv authencesn ccm wp512 twofish_generic michael_mic echainiv ecb essiv sha1_generic khazad pcbc authenc blowfish_generic deflate drbg cmac tgr192 md4 des_generic ctr salsa20_generic twofish_common cbc jitterentropy_rng anubis serpent_generic sha512_generic cast6_generic cast_common blowfish_common ghash_generic md5 md5_ppc sha1_powerpc kvm oprofile af_key bridge act_bpf cls_bpf ah4 esp4 xfrm4_tunnel nf_log_ipv4 ipt_REJECT iptable_mangle iptable_filter nf_reject_ipv4 nf_log_arp iptable_nat ip_tables ipcomp xt_state nfnetlink_log xt_conntrack nf_nat_irc nf_nat_ftp xt_tcpudp xt_mark xt_TCPMSS nf_conntrack_irc xt_nat nf_conntrack_netlink nfnetlink xt_policy xt_NFLOG nf_nat_sip nf_conntrack_ftp xt_MASQUERADE nf_nat xt_addrtype nf_conntrack_sip nf_conntrack nf_defrag_ipv4 xt_LOG rpcrdma psnap p8022 stp llc
[T30311]  ip6t_ipv6header nf_defrag_ipv6 ip6table_filter ip6table_mangle nf_log_ipv6 nf_log_common ip6t_REJECT nf_reject_ipv6 ip6_tables x_tables xfrm_user xfrm_ipcomp xfrm_algo vfat hfs binfmt_misc jfs cifs gcm reiserfs btrfs crc32c_vpmsum autofs4 fuse nilfs2 nfsd udf overlay cramfs squashfs hfsplus vhost_net vhost jsm hvcs hvcserver appledisplay usbmon usb_storage powernv_op_panel powernv_rng pseries_rng rng_core i2c_matroxfb matroxfb_maven rpadlpar_io rpaphp mlx4_ib iw_cxgb4 ib_mthca iw_cxgb3 ib_iser ib_srp ib_ipoib rdma_ucm ib_umad ib_uverbs rdma_cm iw_cm ib_cm ib_core vmx_crypto gf128mul virtio_crypto crypto_engine nbd broadcom bcm_phy_lib ppp_async bsd_comp ppp_synctty ppp_deflate pppoe pppox ppp_generic dummy cxgb myri10ge ibmveth bnx2 bnx2x acenic i40e ixgb ixgbe s2io mlx4_en mlx4_core pcnet32 be2net netxen_nic 3c59x tun slhc bonding leds_powernv dm_zero dm_snapshot raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor multipath raid10 dm_mirror
[T30311]  dm_queue_length dm_round_robin dm_region_hash faulty dm_bufio dm_crypt dm_service_time dm_multipath dm_log pcspkr input_leds led_class evdev st sym53c8xx scsi_dh_alua scsi_dh_rdac ibmvfc qla4xxx cxgb3i cxgb3 mdio cxgb4i cxgb4 libcxgbi libcxgb be2iscsi bnx2i cnic uio qla2xxx cxlflash cxl ocxl mpt3sas scsi_transport_sas iscsi_boot_sysfs scsi_transport_spi lpfc raid_class libiscsi_tcp libiscsi scsi_transport_iscsi test_printf test_bpf crc_itu_t raid6_pq test_strscpy libdes libarc4 libaes test_firmware zstd_compress zstd_decompress test_bitmap test_static_keys crc_t10dif crct10dif_generic crct10dif_common test_user_copy test_static_key_base crc_ccitt
[T30311] CPU: 4 PID: 30311 Comm: sigfuz Not tainted 5.4.0-rc2-gcc9x-gc145ff82749b #1
[T30311] NIP:  c00000000000e670 LR: c0000000004c7954 CTR: 0000000000000000
[T30311] REGS: c00000000ff83d70 TRAP: 0700   Not tainted  (5.4.0-rc2-gcc9x-gc145ff82749b)
[T30311] MSR:  8000000300201031 <SF,ME,IR,DR,LE,TM[SE]>  CR: 44002488  XER: 00090785
[T30311] CFAR: c00000000000e5b4 IRQMASK: 0 
[T30311] PACATMSCRATCH: 8000000200001033 
[T30311] GPR00: c0000000004c7954 c0000000d6a377e0 c000000001811c00 c000000000fc2cf0 
[T30311] GPR04: 0000000000000563 0000000000000000 0000000000000408 ffffffffffffffff 
[T30311] GPR08: c000000001835a18 00007ffff7d20000 0000000000000000 c0000000c7bebc80 
[T30311] GPR12: 0000000000000000 c00000000fffb680 c0000000bd241058 0000000000000044 
[T30311] GPR16: 0000000000000008 c000000001044cf8 0000000000000050 c0000000c7bebc80 
[T30311] GPR20: 000fffffffffffff 000fffffffffffff c0000000c7bebd00 c0000000d6a37b40 
[T30311] GPR24: 00000000000011c0 0000000008360000 000000000000139c 0000000000000000 
[T30311] GPR28: c0000000efb20f08 c0000000efb20f68 c0000000c7bebc80 c0000000bd6dd268 
[T30311] NIP [c00000000000e670] fast_exception_return+0x120/0x1e4
[T30311] LR [c0000000004c7954] elf_core_dump+0x1244/0x1690
[T30311] Call Trace:
[T30311] [c0000000d6a377e0] [c0000000004c7954] elf_core_dump+0x1244/0x1690 (unreliable)
[T30311] [c0000000d6a37a00] [c0000000004d0878] do_coredump+0x988/0x1490
[T30311] [c0000000d6a37c10] [c000000000161270] get_signal+0x1f0/0xcb0
[T30311] [c0000000d6a37d30] [c000000000027660] do_notify_resume+0x140/0x410
[T30311] [c0000000d6a37e20] [c00000000000e4c4] ret_from_except_lite+0x70/0x74
[T30311] Instruction dump:
[T30311] 7c4ff120 e8410170 7c5a03a6 38400000 f8410060 e8010070 e8410080 e8610088 
[T30311] 60000000 60000000 e8810090 e8210078 <4c000024> 48000000 e8610178 88ed0989 
[T30311] ---[ end trace 814d362bfe3181f5 ]---
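In the instruction dump, the word in angle brackets (`<4c000024>`) is the instruction at NIP. The C sketch below, using hypothetical helper names, splits a 32-bit Power instruction word into its primary opcode and XL-form extended opcode. `0x4c000024` decodes as primary opcode 19 with extended opcode 18, i.e. `rfid`, which is consistent with the program check being raised on the interrupt-return path in `fast_exception_return`. Only `rfid` and `hrfid` are recognised; this is not a general disassembler.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Primary opcode: the most significant 6 bits of the instruction word. */
static unsigned primary_opcode(uint32_t insn)
{
    return insn >> 26;
}

/* Extended opcode field of an XL-form instruction: 10 bits starting at
 * bit 1 (bit 0 = least significant). */
static unsigned xl_xo(uint32_t insn)
{
    return (insn >> 1) & 0x3ff;
}

/* Recognise the two return-from-interrupt instructions relevant here;
 * anything else is reported as "unknown". */
static const char *rfi_name(uint32_t insn)
{
    if (primary_opcode(insn) != 19)
        return "unknown";
    switch (xl_xo(insn)) {
    case 18:  return "rfid";
    case 274: return "hrfid";
    default:  return "unknown";
    }
}
```

The word that follows in the dump, `48000000`, has primary opcode 18 (a branch), so this sketch reports it as "unknown".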