intel / cc-oci-runtime

OCI (Open Containers Initiative) compatible runtime for Intel® Architecture
GNU General Public License v2.0
416 stars 59 forks source link

CI tests: kernel panic in the DUT randomly #581

Closed chavafg closed 7 years ago

chavafg commented 7 years ago

Recently I have seen that sometimes the machines that run the CI tests are getting hanged. I have experienced the same issue on ClearLinux and Ubuntu machines. I have not seen this behavior in the Fedora machine.

When trying to connect the monitor to the hanged machine, I cannot see any output. On the test execution log I can see that the last tests that were being executed:

✓ HostName is passed to the container ✓ Verify connectivity between 2 containers ✓ Port a container ✓ Restart a container ✓ Run shell echo ✓ stdout using run ✓ stderr using run ✓ stdin from pipe ✓ check that the replicas' names are different

The last test belong to the swarm test suite.

I am trying to reproduce the error and will document further as I get more information.

GabyCT commented 7 years ago

Which docker version are you using?

GabyCT commented 7 years ago

If you are using 1.12.5 or above you will face issues with docker swarm leave --force that is not finishing and that is causing the hanging, https://github.com/01org/cc-oci-runtime/pull/579 should solve that

chavafg commented 7 years ago

Still on version 1.12.1. 579 solves swarm hangs, right? But I have seen that the whole machine is getting hanged.

GabyCT commented 7 years ago

Mmm..that is odd with 1.12.1

jcvenegas commented 7 years ago

This is all the machine not just docker, right ? Do you know if the happen with runc as well ? or just Clear Containers ?

If the machine freezes, what is the kernel version in Ubuntu and Clear Linux ?

chavafg commented 7 years ago

@jcvenegas that is right, haven't tested with runc, but after trying to reproduce, now the machine got hanged in the proxy tests, so maybe it is more related with clear containers:

============================================================================
make[2]: Leaving directory '/home/cc-tester/go/src/github.com/01org/cc-oci-runtime'
make[1]: Leaving directory '/home/cc-tester/go/src/github.com/01org/cc-oci-runtime'
go test -v -race -timeout 2s ./proxy
packet_write_wait: Connection to ... port 22: Broken pipe

On Ubuntu 16.04 I have this kernel: 4.4.0-59-generic On clear 12800 I have this one: 4.9.4-294.native

chavafg commented 7 years ago

This is the kernel panic I found on the ubuntu machine:

[ 1123.534364] kvm [21530]: vcpu0 unhandled rdmsr: Ox1c9
[ 1123.542107] kvm [21530]: vcpuO unhandled rdmsr: 0x606
[ 1123.542308] kvm [21530]: vcpu0 unhandled rdmsr: 0x34
[ 1124.374266] kvm [21777]: vcpu0 unhandled rdmsr: Ox1c9
[ 1124.381416] kvm [21777]: vcpuO unhandled rdmsr: 0x606
[ 1124.381628] kvm [21777]: vcpu0 unhandled rdmsr: 0x34
[ 1125.383469] kvm [22024]: vcpu0 unhandled rdmsr: Ox1c9
[ 1128.387727] kvm [23005]: vcpu0 unhandled rdnsr: Ox1c9
[ 1128.395499] kvm [23005]: vcpu0 unhandled rdmsr: 0x606
[ 1128.395702] kvm [23005]: vcpuO unhandled rdmsr: Ox34
[ 1129.407891] kvm [23289]: vcpu0 unhandled rdnsr: Ox1c9
[ 1129.415320] kvm [23289]: vcpu0 unhandled rdmsr: 0x606
[ 1129.415533] kvm [23289]: vcpu0 unhandled rdmsr: 0x34
[ 1252.533514] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[ 1252.533722] IP: [<ffffffffc09f49ca>] br_multicast_group_expired+0x2a/0xb0 [bridge]
[ 1252.533938] PGD 0
[ 1252.534148] Oops: 0000 [#1] SMP
[ 1252.534362] Modules linked in: vhost_net vhost macvtap macvlan ip_vs_rr xt_ipvs xt_REDIRECT nf_nat_redirect ip_vs vlxan ip6_udp_tunnel udp_tunnel ip_table_mangle xt_mark xt_nat xt_tcpudp veth
ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat
nf_conntrack br_netfilter bridge stp llc aufs nls_iso8859_1 arc4 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel iwlmvw kvm mac80211 irqbypass ir_lirc_codec
ir_xmp_decoder lirc_dev ir_mce_kdb_decoder ir_sharp_decoder ir_sanyo_decoder ir_sony_decoder ir_jvc_decoder snd_hda_codec_realtek ir_rc6_decoder snd_hda_codec_hdmi iw1wifi
snd_hda_codec_generic serio_raw ir_nec_Decoder ir_rc5_decoder btusb btrtl btbcm btintel bluetooth joydev lpc_ich cfg80211 input_leds snd_hda_intel rc_rc6_mce snd_hda_codec
nuvoton_cir snd_roc_rt5640 snd_hda_core nei_me snd_hwdep shdchp mei snd_soc_ssm4567 snd_soc_r16231 rc_core snd_soc_core snd_Compress ac97_bus elan_i2c smd_pcm_dmaengine snd_pcm
snd_timer acpi_mad mac_hid spi_pxa2xx_platform snd dw_dmac soundcore dw_dmac_core i2c_designware_platform 8250_dw snd_soc_sst_a_xor async_tx xxor raid6_pq libcrc32c raid1 raid0 multipath linear 
hid_generic usbhid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i915 aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd i2c_algo_bit drm_kms_helper psmouse ahci syscopyarea 
libahci sysfillrect sysimgblt e1000e fb_sys_fops ptp drm pps_core sdhci_acpi sdhci video fjes i2c_hid hid
[ 1252.537896] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.4.0-59-generic #80-Ubuntu
[ 1252.536439] Hardware name:              /M11C5i7RYB, BIOS RYBDWi35.86A.0350.2015.0812.1722 08/12/2015
[ 1252.539198] task: ffff1380444761c00 ti: ffff880444766030 task.ti: Ifff88044476c606
[ 1252.539573] RIP: 0010:[<fffffffc09f49ca>] [<ffffffffc09f49ca>] br_multicast_group_expired+0x2a/0xb0 [bridge]
[ 1252.540171] RSP:0018:ffff880456d03e60 EFLAGS: 00010246
[ 1252.540769] RAX: 0000000000000000 RBX: ffff88130352b72b0 RCX: ffff880456d03eee
[ 1252.541379] RDX: 0000000000000001 RSI: fffffffffc09f49a0 RDI: ffff8800352b72b0
[ 1252.541998] RBP: ffff880456d03e78 R08: 00000000000000022 R09: 0000000000000002
[ 1252.542622] R10: 0000000000000020 R11: 00000000000000004 R12: ffff8800352b6840
[ 1252.543251] R13: ffff88043df62840 R14: fffffffffc09f49a0 R15: ffff88043df62840
[ 1252.543806] FS: 00000000000000000(0000) GS:ffff880456d00000(0000) kn1GS:0000000000000000
[ 1252.544537] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000800500330010 DS: 0000 ES: 0000 CR0: 000000080050033
[ 1252.545191] CR2: 0000000000000048 CR3: 00000000002e0a000 CR4: 00000000003406e0
[ 1252.545861] Stack:
[ 1252.546525] ffff880456d0ddc0 ffff88043df62880 0000000000000100 ffff880456d03eb0
[ 1252.547216] ffffffff810ecda5 ffff880456d0ddc0 ffff88043df62880 ffffffffc09f49a0
[ 1252.547914] 0000000000000001 0000000000000000 ffff880456d03f18 00000000810cd75a
[ 1252.548620] Call Trace:
[ 1252.549322] <IRQ>
[ 1252.549331] [<ffffffff810ecda5>] call_timer_fn+0x35/0x120
[ 1252.550752] [<ffffffffc09f49a0>] ? br_mdb_free+0x30/0x30[bridge]
[ 1252.551477] [<ffffffff810ed75a>] run_timer_softirq+0x23a/0xef0
[ 1252.552206] [<ffffffff81085db1>] __do_softirq+0x/101/0x290
[ 1252.552939] [<ffffffff810860b3>] irq_exit+0xa3/0xb0
[ 1252.553680] [<ffffffff8183afa2>] smp_apic_timer_interrupt+0x42/0x50
[ 1252.554426] [<ffffffff81839262>] apic_timer_interrupt+00x82/0x90
[ 1252.555174] <EOI>
[ 1252.555181] [<ffffffff816cae31>] ? cpuidle_enter_state+0x111/0x2b0
[ 1252.556692] [<ffffffff816cb007>] cpuidle_enter+0x17/0x20
[ 1252.557461] [<ffffffff810c4582>] call_cpuidle+0x32/0x60
[ 1252.558238] [<ffffffff816cafe3>] ? cpuidle_select+0x13/0x20
[ 1252.559018] [<ffffffff810c4840>] cpu_startup_entry+x290/0x350
[ 1252.559801] [<ffffffff81051784>] start_secondary+0x154/0x190
[ 1252.559801] Code: 00 0f if 44 00 00 55 48 89 e5 41 55 41 54 53 4c 8b 67 20 49 89 fd 49 8d 9c 24 70 0a 00 00 48 89 df e8 7b 38 e4 c0 49 8b 44 24 18 <48> 8b 40 48 a8 01 74 07 49 83 7d 48 00 74 11 48 89 df c6 07 00
[ 1252.561514] RIP [<ffffffffc09f49ca>] br_multicast_group_expired+0x2a/0xb0 [bridge]
[ 1252.562376]  RSP <ffff880456d03e60>
[ 1252.563238] CR2: 0000000000000048
[ 1252.564104] ---[ end trace 2f5141f12b306f92 ]---
[ 1252.564975] Kernel panic - not syncing: Fatal exception in interrupt
[ 1252.565863] Kernel Offset: disabled
[ 1252.566747] ---[ end Kernel panic - not syncing: Fatal exception in interrupt

Will try to check if it is a similar issue on clear

chavafg commented 7 years ago

This should be easily reproducible by running the ./tests/helpers/test-campaign.sh in a loop. e.g.

for i in {1..100}; do 
    ./tests/helpers/test-campaign.sh
    sudo systemctl restart docker 
done
dlespiau commented 7 years ago

A oops in the bridging code handling multicast to implement igmp snooping. IGMP snooping is a feature that looks at join/leave IGMP messages to forward multicast packets to the correct port(s) instead of defaulting to broadcasting.

As a workaround You may want to try disabling CONFIG_BRIDGE_IGMP_SNOOPING at kernel compile time (that's the host creating the bridge) or with sys/devices/virtual/net/brN/bridge/multicast_snooping (needs modification of the docker code?).

The real fix would be to understand where exactly the oops happen by decoding br_multicast_group_expired+0x2a (you can use gdb to load kernel with debug symbol and use "list *br_multicast_group_expired+0x2a"), report it, hunt for patches that could fix this upstream. ...

dlespiau commented 7 years ago

Removing this one from the 2.1 stabilisation project as it doesn't seem to be a runtime problem.

jcvenegas commented 7 years ago

I could not reproduce the panic , But I tested disabling /sys/devices/virtual/net/{docker,docker_gwbridge}/bridge/multicast_snooping and I dont see bat test regressions. Agree to use this workaround by now, @chavafg you could add it to the DUT before start the test

I've compiled a kernel with debug info to get more data about the oops, but need to reproduce it first.

jcvenegas commented 7 years ago

I've tried to reproduce the panic in Clear Linux without success. But the way I tracked the function that breaks.

1.- If you disable /sys/devices/virtual/net/{docker,docker_gwbridge}/bridge/multicast_snooping it still the the function is called , so either ignores disabling IGMP or I am not disabling the right bridge

2.- IGMP is used with Qemu/kvm (even using a simple docker run), if I try to track these functions using runc seems that never are called.

@dlespiau @mcastelino any idea ?

williammartin commented 7 years ago

Just thought I'd add a little extra information here, hopefully doesn't muddy the waters. The Cloud Foundry Garden container team uses runc and orthogonally in the general case we create a bridge per container. In some of our low level tests, we have seen this panic occur.

[ 6175.237271] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[ 6175.245770] IP: [<ffffffffc03d3588>] br_multicast_group_expired+0x28/0xa0 [bridge]
[ 6175.253894] PGD 673f00067 PUD 3c4932067 PMD 0 
[ 6175.258876] Oops: 0000 [#1] SMP 
[ 6175.262488] Modules linked in: dummy xt_iprange xt_nat xt_tcpudp xfrm_user xfrm_algo xt_addrtype nf_log_ipv4 nf_log_common xt_LOG ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_comment veth bridge stp llc btrfs xor raid6_pq aufs iptable_nat nf_nat_ipv4 nf_nat ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables ppdev crct10dif_pclmul crc32_pclmul nfsd ghash_clmulni_intel auth_rpcgss aesni_intel nfs_acl aes_x86_64 lrw gf128mul nfs i2c_piix4 glue_helper ablk_helper pvpanic cryptd parport_pc 8250_fintek lockd parport input_leds grace mac_hid sunrpc serio_raw fscache psmouse virtio_scsi
[ 6175.328990] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.4.0-66-generic #87~14.04.1-Ubuntu
[ 6175.337306] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 6175.346923] task: ffff88079df24600 ti: ffff88079df38000 task.ti: ffff88079df38000
[ 6175.354642] RIP: 0010:[<ffffffffc03d3588>]  [<ffffffffc03d3588>] br_multicast_group_expired+0x28/0xa0 [bridge]
[ 6175.365716] RSP: 0018:ffff8807bfd43e70  EFLAGS: 00010246
[ 6175.372283] RAX: 0000000000000000 RBX: ffff88010525a840 RCX: ffff8807bfd43ee8
[ 6175.379821] RDX: 0000000000000001 RSI: ffffffffc03d3560 RDI: ffff88010525b2b0
[ 6175.388559] RBP: ffff8807bfd43e88 R08: 0000000000000006 R09: 0000000000000005
[ 6175.400348] R10: 0000000000000004 R11: 0000000000000005 R12: ffff88010525b2b0
[ 6175.407801] R13: ffff8806b93189c0 R14: ffffffffc03d3560 R15: ffff8806b93189c0
[ 6175.430223] FS:  0000000000000000(0000) GS:ffff8807bfd40000(0000) knlGS:0000000000000000
[ 6175.438697] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6175.444672] CR2: 0000000000000048 CR3: 0000000515341000 CR4: 00000000001406e0
[ 6175.452186] Stack:
[ 6175.454505]  ffff8807bfd4de00 ffff8806b9318a00 0000000000000100 ffff8807bfd43ec0
[ 6175.462677]  ffffffff810e74c5 ffff8807bfd4de00 ffff8806b9318a00 ffffffffc03d3560
[ 6175.470762]  0000000000000100 0000000000000000 ffff8807bfd43f18 ffffffff810e8e7e
[ 6175.478909] Call Trace:
[ 6175.481564]  <IRQ> 
[ 6175.483610]  [<ffffffff810e74c5>] call_timer_fn+0x35/0x130
[ 6175.489527]  [<ffffffffc03d3560>] ? br_multicast_router_expired+0x90/0x90 [bridge]
[ 6175.497234]  [<ffffffff810e8e7e>] run_timer_softirq+0x20e/0x2c0
[ 6175.503299]  [<ffffffff8108314d>] __do_softirq+0xdd/0x290
[ 6175.508919]  [<ffffffff81083525>] irq_exit+0x95/0xa0
[ 6175.514013]  [<ffffffff8180a8a2>] smp_apic_timer_interrupt+0x42/0x50
[ 6175.520497]  [<ffffffff81808b62>] apic_timer_interrupt+0x82/0x90
[ 6175.526641]  <EOI> 
[ 6175.528731]  [<ffffffff81062386>] ? native_safe_halt+0x6/0x10
[ 6175.535453]  [<ffffffff810378fe>] default_idle+0x1e/0xe0
[ 6175.540908]  [<ffffffff810380cf>] arch_cpu_idle+0xf/0x20
[ 6175.546349]  [<ffffffff810bf802>] default_idle_call+0x32/0x40
[ 6175.552235]  [<ffffffff810bfb4b>] cpu_startup_entry+0x2db/0x350
[ 6175.558374]  [<ffffffff8104f3d9>] start_secondary+0x149/0x170
[ 6175.564255] Code: 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 49 89 fd 41 54 53 48 8b 5f 20 4c 8d a3 70 0a 00 00 4c 89 e7 e8 9c 44 43 c1 48 8b 43 18 <48> 8b 40 48 a8 01 74 07 49 83 7d 48 00 74 11 4c 89 e7 c6 07 00 
[ 6175.606941] RIP  [<ffffffffc03d3588>] br_multicast_group_expired+0x28/0xa0 [bridge]
[ 6175.614866]  RSP <ffff8807bfd43e70>
[ 6175.618494] CR2: 0000000000000048
[ 6175.621992] ---[ end trace 2b7c87a0c053d487 ]---
[ 6175.626768] Kernel panic - not syncing: Fatal exception in interrupt
[ 6175.634127] Kernel Offset: disabled
[ 6175.637787] Rebooting in 10 seconds..
[ 6185.618741] ACPI MEMORY or I/O RESET_REG.

Following the advice from here and setting multicast_snooping off for each bridge seems to have made the kernel panics go away.

It doesn't seem like it would be hard for us to reproduce (by reverting the snooping change), if someone had some thoughts on tracking this down and getting it fixed in the mainline/

mcastelino commented 7 years ago

@williammartin than you @dlespiau @jcvenegas We should turn off multicast_snooping on our integration bridge. I will make that change and see if this problem goes away.

amshinde commented 7 years ago

This issue may be related to : https://www.spinics.net/lists/netdev/msg431717.html