jpetazzo / pipework

Software-Defined Networking tools for LXC (LinuX Containers)
Apache License 2.0
4.21k stars, 727 forks

kernel panic on ib0 allocation to container #217

Closed: sbates130272 closed this issue 6 years ago

sbates130272 commented 7 years ago

Hi

When attempting to pass an IB interface into a container, I am getting a kernel panic (see attached). The command being run is pretty simple (./pipework/pipework ib0 6dbf 10.10.10.1/24). I will do some digging and see if I can rustle up some more information:

  [ 945.996584] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
  [ 945.996687] IP: __ipoib_vlan_add+0x1e/0x260 [ib_ipoib]
  [ 945.996736] PGD 379358067
  [ 945.996738] P4D 379358067
  [ 945.996765] PUD 382f7f067
  [ 945.996792] PMD 0
  [ 945.996858] Oops: 0000 [#1] SMP
  [ 945.996889] Modules linked in: mlx5_ib veth ib_ipoib rdma_ucm rdma_cm iw_cm ib_cm configfs ib_uverbs ib_core ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc overlay bnep snd_hda_codec_hdmi eeepc_wmi snd_hda_codec_realtek asus_wmi snd_hda_codec_generic sparse_keymap intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_intel snd_hda_codec snd_hda_core kvm snd_hwdep snd_pcm irqbypass crct10dif_pclmul crc32_pclmul snd_seq_midi snd_seq_midi_event snd_rawmidi ghash_clmulni_intel pcbc snd_seq snd_seq_device snd_timer aesni_intel aes_x86_64 crypto_simd glue_helper cryptd snd intel_cstate intel_rapl_perf serio_raw soundcore
  [ 945.997548]  switchtec mei_me shpchp mei hci_uart btbcm btqca btintel bluetooth ecdh_generic acpi_als kfifo_buf mac_hid acpi_pad intel_lpss_acpi intel_lpss tpm_crb industrialio parport_pc ppdev lp parport autofs4 uas usb_storage mxm_wmi i915 psmouse e1000e i2c_algo_bit ahci drm_kms_helper libahci syscopyarea sysfillrect sysimgblt fb_sys_fops mlx5_core drm devlink wmi video pinctrl_sunrisepoint i2c_hid pinctrl_intel hid
  [ 945.997910] CPU: 3 PID: 2186 Comm: ip Not tainted 4.12.3+p2pmem-nvme-v4.12-rc3-1449-g9c9a27b+ #1
  [ 945.997987] Hardware name: System manufacturer System Product Name/PRIME Q270M-C, BIOS 0602 01/20/2017
  [ 945.998070] task: ffff9170bb3a4b00 task.stack: ffffa148c391c000
  [ 945.998130] RIP: 0010:__ipoib_vlan_add+0x1e/0x260 [ib_ipoib]
  [ 945.998182] RSP: 0018:ffffa148c391f860 EFLAGS: 00010286
  [ 945.998231] RAX: 0000000000001000 RBX: 0000000000000000 RCX: 0000000000000002
  [ 945.998295] RDX: 000000000000ffff RSI: 0000000000000000 RDI: ffff9170bb363000
  [ 945.998359] RBP: ffffa148c391f888 R08: ffff9170f5d9f440 R09: ffff9170e5003500
  [ 945.998422] R10: 000000003e18c000 R11: 0000000000023b21 R12: ffff9170bb363000
  [ 945.998485] R13: 000000000000ffff R14: ffff9170bb363000 R15: ffffffffc0993b20
  [ 945.998550] FS: 00007f227c7cb700(0000) GS:ffff9170f5d80000(0000) knlGS:0000000000000000
  [ 945.998622] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 945.998674] CR2: 0000000000000008 CR3: 0000000379311000 CR4: 00000000003406e0
  [ 945.998738] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [ 945.998802] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [ 945.998864] Call Trace:
  [ 945.998899]  ipoib_new_child_link+0x91/0x110 [ib_ipoib]
  [ 945.998952]  rtnl_newlink+0x610/0x8c0
  [ 945.998992]  ? rtnl_link_ops_get+0x39/0x50
  [ 945.999032]  ? rtnl_newlink+0x187/0x8c0
  [ 945.999076]  rtnetlink_rcv_msg+0xee/0x220
  [ 945.999116]  ? rtnl_newlink+0x8c0/0x8c0
  [ 945.999154]  netlink_rcv_skb+0xe7/0x120
  [ 945.999192]  rtnetlink_rcv+0x28/0x30
  [ 945.999230]  netlink_unicast+0x18c/0x240
  [ 945.999269]  netlink_sendmsg+0x2c5/0x3b0
  [ 945.999312]  sock_sendmsg+0x38/0x50
  [ 945.999349]  ___sys_sendmsg+0x2d7/0x2f0
  [ 945.999390]  ? mem_cgroup_commit_charge+0x7e/0x4e0
  [ 946.001628]  ? handle_mm_fault+0xd3d/0xfd0
  [ 946.003832]  __sys_sendmsg+0x54/0x90
  [ 946.006021]  ? __sys_sendmsg+0x54/0x90
  [ 946.008230]  SyS_sendmsg+0x12/0x20
  [ 946.010450]  entry_SYSCALL_64_fastpath+0x1e/0xa9
  [ 946.012693] RIP: 0033:0x7f227c0fe450
  [ 946.014948] RSP: 002b:00007ffcbda8dc18 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
  [ 946.017271] RAX: ffffffffffffffda RBX: 00007ffcbda95d20 RCX: 00007f227c0fe450
  [ 946.019381] RDX: 0000000000000000 RSI: 00007ffcbda8dc60 RDI: 0000000000000003
  [ 946.020879] RBP: 0000000000000000 R08: 00000000004024b8 R09: 00007f227c7eb168
  [ 946.022340] R10: 00000000000005e7 R11: 0000000000000246 R12: 00007ffcbda8dc60
  [ 946.023757] R13: 00007ffcbda95d58 R14: 00000000006573a0 R15: 00007ffcbda978fe
  [ 946.024845] Code: 00 e8 07 30 6b cf 5d 48 98 c3 0f 1f 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 41 52 41 89 d5 53 8b 87 94 03 00 00 49 89 fc <48> 8b 56 08 48 89 f3 41 89 ce 89 86 94 03 00 00 83 e8 04 89 82
  [ 946.026969] RIP: __ipoib_vlan_add+0x1e/0x260 [ib_ipoib] RSP: ffffa148c391f860
  [ 946.027993] CR2: 0000000000000008
  [ 946.032355] ---[ end trace 294766f7e80c56b3 ]---
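The RIP line in a trace like this names the faulting function, the offset into it, the function's size, and the module. Here the bytes marked <48> 8b 56 08 decode to mov 0x8(%rsi),%rdx, and RSI is 0 in the register dump, which is consistent with CR2 = 0x8: a field at offset 8 was read through a NULL pointer. A minimal sketch (in shell, like pipework itself) of turning the RIP line into a gdb source-line lookup, assuming the ib_ipoib module was built with debug info; parse_rip is a hypothetical helper name, not part of pipework or the kernel tools.

```sh
#!/bin/sh
# Hypothetical helper: split an oops "RIP:" line into "module symbol offset".
# Example input: "RIP: 0010:__ipoib_vlan_add+0x1e/0x260 [ib_ipoib]"
parse_rip() {
  line=$1
  # Module name: text inside the last [...] on the line (empty if none).
  mod=$(printf '%s\n' "$line" | sed -n 's/.*\[\([^]]*\)].*/\1/p')
  # "symbol+offset/size" follows "RIP: " and an optional "SEGMENT:" prefix.
  loc=$(printf '%s\n' "$line" | sed -n 's/.*RIP: \([0-9a-f]*:\)\{0,1\}\([^ ]*\).*/\2/p')
  sym=${loc%%+*}      # drop "+offset/size"
  off=${loc#*+}       # drop "symbol+"
  off=${off%%/*}      # drop "/size"
  printf '%s %s %s\n' "$mod" "$sym" "$off"
}

# Print (not run) the gdb command that maps the offset to a source line.
set -- $(parse_rip "RIP: 0010:__ipoib_vlan_add+0x1e/0x260 [ib_ipoib]")
echo "gdb -batch -ex 'list *($2+$3)' \"\$(modinfo -n $1)\""
```

The lookup is only printed so the sketch is safe to run anywhere; on a machine with the matching module and debug symbols, running the printed command should show the source line around the faulting load.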

hookenz commented 7 years ago

Hi @sbates130272

I think the problem might be that arping is being called. pipework used to have a check to skip arping on IPoIB interfaces, but that seems to have been removed. Anyway, if you call arping on an IPoIB interface like ib0, you will usually get a kernel panic. At least I did.

So, please add a check around this block:

  # Give our ARP neighbors a nudge about the new interface
  if installed arping; then
    IPADDR=$(echo "$IPADDR" | cut -d/ -f1)
    ip netns exec "$NSPID" arping -c 1 -A -I "$CONTAINER_IFNAME" "$IPADDR" > /dev/null 2>&1 || true
  else
    echo "Warning: arping not found; interface may not be immediately reachable"
  fi

i.e.

if [ "$IFTYPE" != "ipoib" ]; then
  # Give our ARP neighbors a nudge about the new interface
  if installed arping; then
    IPADDR=$(echo "$IPADDR" | cut -d/ -f1)
    ip netns exec "$NSPID" arping -c 1 -A -I "$CONTAINER_IFNAME" "$IPADDR" > /dev/null 2>&1 || true
  else
    echo "Warning: arping not found; interface may not be immediately reachable"
  fi
fi
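An alternative to relying on $IFTYPE is to ask the kernel for the link's hardware type: /sys/class/net/<ifname>/type reads 32 (ARPHRD_INFINIBAND) for InfiniBand interfaces and 1 (ARPHRD_ETHER) for Ethernet. A minimal sketch, assuming the check runs on the host against the parent interface (for example ib0) before anything is moved into the container; is_ipoib and the SYSFS_NET override are hypothetical names, the latter only there so the function can be exercised against a fake directory.

```sh
#!/bin/sh
# Hypothetical helper: succeed if the named interface is InfiniBand (IPoIB).
# ARPHRD_INFINIBAND is 32 in <linux/if_arp.h>; ARPHRD_ETHER is 1.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

is_ipoib() {
  typefile="$SYSFS_NET/$1/type"
  # Unknown or missing interface: treat as "not IPoIB".
  [ -r "$typefile" ] || return 1
  [ "$(cat "$typefile")" = 32 ]
}

# Possible use around the existing arping block (not executed here):
#   if ! is_ipoib ib0; then
#     ... arping block ...
#   fi
```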

For reference, here is the original pipework InfiniBand mod I put together: https://github.com/hookenz/pipework/blob/master/pipework

sbates130272 commented 7 years ago

Hey, thanks @hookenz for the prompt response. I wonder why your code was removed; it would be nice to have it back in.

I don't think my issue has anything to do with ARP, as my kernel panic happens a lot earlier in the pipework script. I've tracked it down to this command:

ip link add link ib0 name ib0.2074 type ipoib

I am also pretty suspicious of the kernel here: any such command issued from user space should, at worst, make the kernel report an error of some sort, not panic. I'm running a 4.12 kernel, so I will do some more digging and see whether this needs to go to LKML...

jpetazzo commented 6 years ago

Hi!

For what it's worth, I think that any kernel panic warrants going to LKML. The kernel should not panic when you're merely manipulating network interfaces.

I wish I could help you further, but alas, I have stopped working on pipework, and I don't see an obvious answer to your question. 🤔

If you're still actively looking for an answer, I'd suggest that you check or ask through:

If this is a critical issue for you, I know a few amazing consultants that you might want to hire. Let me know how to contact you and I'll put you in touch with them.

Thank you!

thoro commented 6 years ago

1) Sorry for hijacking this issue.

2) @sbates130272 I've hit the same issue. Have you ever figured out a way to move the interface into another namespace, or at least to avoid creating the macvlan on it?

hookenz commented 6 years ago

@Thoro - The crashing sounds like a kernel bug. I'm not using InfiniBand anymore, so I can't really help beyond this. Try upgrading the kernel, or use the physical IB interface and SR-IOV rather than virtual IB, which is the default.

thoro commented 6 years ago

Yeah, I got it working by moving the base interface directly into the container's network namespace.
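A minimal sketch of that workaround, assuming a Docker container and reusing the interface name, container ID and address from the original report. ip link set ... netns <PID> moves a link into the network namespace of that process, and nsenter -n runs a command inside it. This hands the whole ib0 to one container, so the host loses it until it is moved back. The run wrapper and its DRY_RUN switch are hypothetical additions so the commands can be printed instead of executed.

```sh
#!/bin/sh
# Sketch: move the whole IPoIB interface (not a child such as ib0.2074)
# into a container's network namespace and configure it there. Needs root.

# Print commands instead of running them when DRY_RUN=1 (hypothetical switch).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# move_ipoib PID IFNAME ADDR
move_ipoib() {
  pid=$1 ifname=$2 addr=$3
  run ip link set dev "$ifname" netns "$pid"
  run nsenter -t "$pid" -n ip addr add "$addr" dev "$ifname"
  run nsenter -t "$pid" -n ip link set dev "$ifname" up
}

# Example (not executed here):
#   PID=$(docker inspect --format '{{.State.Pid}}' 6dbf)
#   move_ipoib "$PID" ib0 10.10.10.1/24
```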