Closed sbates130272 closed 6 years ago
Hi @sbates130272
I think the problem might be that arping is being called. The script used to have a check to skip arping, but that seems to have been removed. Anyway, if you call it on an IPoIB interface like ib0 you will usually get a kernel panic. At least I did.
So, please add a check around
# Give our ARP neighbors a nudge about the new interface
if installed arping; then
  IPADDR=$(echo "$IPADDR" | cut -d/ -f1)
  ip netns exec "$NSPID" arping -c 1 -A -I "$CONTAINER_IFNAME" "$IPADDR" > /dev/null 2>&1 || true
else
  echo "Warning: arping not found; interface may not be immediately reachable"
fi
i.e.
# Skip the ARP nudge on IPoIB interfaces (quoted so an empty IFTYPE does not break the test)
if [ "$IFTYPE" != "ipoib" ]
then
  # Give our ARP neighbors a nudge about the new interface
  if installed arping; then
    IPADDR=$(echo "$IPADDR" | cut -d/ -f1)
    ip netns exec "$NSPID" arping -c 1 -A -I "$CONTAINER_IFNAME" "$IPADDR" > /dev/null 2>&1 || true
  else
    echo "Warning: arping not found; interface may not be immediately reachable"
  fi
fi
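The check above relies on IFTYPE already being set to "ipoib" for InfiniBand interfaces. As a rough sketch of one way that could be determined (not necessarily how pipework does it): IPoIB devices report the hardware type ARPHRD_INFINIBAND, value 32, in /sys/class/net/<interface>/type. The function name classify_iftype and its optional sysfs-root argument are made up for illustration.

```shell
# ARPHRD_INFINIBAND from linux/if_arp.h: the hardware type IPoIB devices report.
ARPHRD_INFINIBAND=32

# classify_iftype IFNAME [SYSFS_ROOT]   (hypothetical helper, not part of pipework)
# Prints "ipoib" if the interface reports the InfiniBand hardware type,
# and "other" if it reports anything else or cannot be read.
# SYSFS_ROOT defaults to /sys/class/net; it is only a parameter so the
# function can be tried against a fake directory tree.
classify_iftype() {
    ifname=$1
    sysfs_root=${2:-/sys/class/net}
    type_file="$sysfs_root/$ifname/type"
    if [ -r "$type_file" ] && [ "$(cat "$type_file")" = "$ARPHRD_INFINIBAND" ]; then
        echo ipoib
    else
        echo other
    fi
}
```

With something like this, IFTYPE=$(classify_iftype "$IFNAME") could run before the arping block, so the guard never sees an empty value.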
For reference, here is the original pipework InfiniBand mod I put together: https://github.com/hookenz/pipework/blob/master/pipework
Hey thanks @hookenz for the prompt response. I wonder why your code was removed as it might be nice to have that in?
I don't think my issue has anything to do with ARP, as my kernel panic comes a lot sooner in the pipework script. I've tracked it down to this command:
ip link add link ib0 name ib0.2074 type ipoib
I am also pretty suspicious of the kernel here as it seems like any such command issued from user-space should, at worst, cause the kernel to report an error of some sort, not to panic. I'm running a 4.12 kernel so I will do some more digging and see if this needs to go to LKML...
Hi!
For what it's worth, I think that any kernel panic warrants going to LKML. The kernel should not panic when you're merely manipulating network interfaces.
I wish I could help you further, but alas, I have stopped working on pipework, and I don't see an obvious answer to your question. 🤔
If you're still actively looking for an answer, I'd suggest that you check or ask through:
If this is a critical issue for you, I know a few amazing consultants that you might want to hire. Let me know how to contact you and I'll put you in touch with them.
Thank you!
1) Sorry for hijacking
2) @sbates130272 I got the same issue. Have you ever figured out a way to pull the interface into another namespace, or at least to not create the macvlan on it?
@Thoro - The crash sounds like a kernel bug. I'm not using InfiniBand anymore, so I can't really help beyond this. Try upgrading the kernel, or use a physical IB interface and SR-IOV rather than virtual IB, which is the default.
Yeah, got it working with directly pulling the base interface into the container namespace.
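For anyone landing here later, a minimal sketch of what "pulling the base interface into the container namespace" might look like. The exact commands used above were not posted, so this is an assumption: it moves the device with ip link set ... netns and configures it via nsenter. The helper name print_move_commands is hypothetical, and it only prints the commands so they can be reviewed before being run as root.

```shell
# print_move_commands IFNAME NSPID CIDR   (hypothetical helper)
# Prints, without executing, the commands that would move IFNAME into the
# network namespace of process NSPID, assign it CIDR, and bring it up.
# Assumes the kernel and driver allow moving this device between namespaces.
print_move_commands() {
    ifname=$1
    nspid=$2
    cidr=$3
    # Move the base interface itself instead of creating a child on top of it.
    echo "ip link set dev $ifname netns $nspid"
    # Configure the interface from inside the target network namespace.
    echo "nsenter -t $nspid -n ip addr add $cidr dev $ifname"
    echo "nsenter -t $nspid -n ip link set dev $ifname up"
}
```

For example, print_move_commands ib0 "$NSPID" 10.10.10.1/24 lists the three steps for the setup in the original report. Note that the host loses the interface while it lives in the container's namespace.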
Hi
When attempting to pass an IB interface into a container I am getting a kernel panic (see attached). The command being run is pretty simple (./pipework/pipework ib0 6dbf 10.10.10.1/24). I will do some digging and see if I can rustle up some more information:
[ 945.996584] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 945.996687] IP: __ipoib_vlan_add+0x1e/0x260 [ib_ipoib]
[ 945.996736] PGD 379358067
[ 945.996738] P4D 379358067
[ 945.996765] PUD 382f7f067
[ 945.996792] PMD 0
[ 945.996858] Oops: 0000 [#1] SMP
[ 945.996889] Modules linked in: mlx5_ib veth ib_ipoib rdma_ucm rdma_cm iw_cm ib_cm configfs ib_uverbs ib_core ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc overlay bnep snd_hda_codec_hdmi eeepc_wmi snd_hda_codec_realtek asus_wmi snd_hda_codec_generic sparse_keymap intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_intel snd_hda_codec snd_hda_core kvm snd_hwdep snd_pcm irqbypass crct10dif_pclmul crc32_pclmul snd_seq_midi snd_seq_midi_event snd_rawmidi ghash_clmulni_intel pcbc snd_seq snd_seq_device snd_timer aesni_intel aes_x86_64 crypto_simd glue_helper cryptd snd intel_cstate intel_rapl_perf serio_raw soundcore
[ 945.997548] switchtec mei_me shpchp mei hci_uart btbcm btqca btintel bluetooth ecdh_generic acpi_als kfifo_buf mac_hid acpi_pad intel_lpss_acpi intel_lpss tpm_crb industrialio parport_pc ppdev lp parport autofs4 uas usb_storage mxm_wmi i915 psmouse e1000e i2c_algo_bit ahci drm_kms_helper libahci syscopyarea sysfillrect sysimgblt fb_sys_fops mlx5_core drm devlink wmi video pinctrl_sunrisepoint i2c_hid pinctrl_intel hid
[ 945.997910] CPU: 3 PID: 2186 Comm: ip Not tainted 4.12.3+p2pmem-nvme-v4.12-rc3-1449-g9c9a27b+ #1
[ 945.997987] Hardware name: System manufacturer System Product Name/PRIME Q270M-C, BIOS 0602 01/20/2017
[ 945.998070] task: ffff9170bb3a4b00 task.stack: ffffa148c391c000
[ 945.998130] RIP: 0010:ipoib_vlan_add+0x1e/0x260 [ib_ipoib]
[ 945.998182] RSP: 0018:ffffa148c391f860 EFLAGS: 00010286
[ 945.998231] RAX: 0000000000001000 RBX: 0000000000000000 RCX: 0000000000000002
[ 945.998295] RDX: 000000000000ffff RSI: 0000000000000000 RDI: ffff9170bb363000
[ 945.998359] RBP: ffffa148c391f888 R08: ffff9170f5d9f440 R09: ffff9170e5003500
[ 945.998422] R10: 000000003e18c000 R11: 0000000000023b21 R12: ffff9170bb363000
[ 945.998485] R13: 000000000000ffff R14: ffff9170bb363000 R15: ffffffffc0993b20
[ 945.998550] FS: 00007f227c7cb700(0000) GS:ffff9170f5d80000(0000) knlGS:0000000000000000
[ 945.998622] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 945.998674] CR2: 0000000000000008 CR3: 0000000379311000 CR4: 00000000003406e0
[ 945.998738] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 945.998802] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 945.998864] Call Trace:
[ 945.998899] ipoib_new_child_link+0x91/0x110 [ib_ipoib]
[ 945.998952] rtnl_newlink+0x610/0x8c0
[ 945.998992] ? rtnl_link_ops_get+0x39/0x50
[ 945.999032] ? rtnl_newlink+0x187/0x8c0
[ 945.999076] rtnetlink_rcv_msg+0xee/0x220
[ 945.999116] ? rtnl_newlink+0x8c0/0x8c0
[ 945.999154] netlink_rcv_skb+0xe7/0x120
[ 945.999192] rtnetlink_rcv+0x28/0x30
[ 945.999230] netlink_unicast+0x18c/0x240
[ 945.999269] netlink_sendmsg+0x2c5/0x3b0
[ 945.999312] socksendmsg+0x38/0x50
[ 945.999349] sys_sendmsg+0x2d7/0x2f0
[ 945.999390] ? mem_cgroup_commit_charge+0x7e/0x4e0
[ 946.001628] ? handle_mm_fault+0xd3d/0xfd0
[ 946.003832] sys_sendmsg+0x54/0x90
[ 946.006021] ? sys_sendmsg+0x54/0x90
[ 946.008230] SyS_sendmsg+0x12/0x20
[ 946.010450] entry_SYSCALL_64_fastpath+0x1e/0xa9
[ 946.012693] RIP: 0033:0x7f227c0fe450
[ 946.014948] RSP: 002b:00007ffcbda8dc18 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 946.017271] RAX: ffffffffffffffda RBX: 00007ffcbda95d20 RCX: 00007f227c0fe450
[ 946.019381] RDX: 0000000000000000 RSI: 00007ffcbda8dc60 RDI: 0000000000000003
[ 946.020879] RBP: 0000000000000000 R08: 00000000004024b8 R09: 00007f227c7eb168
[ 946.022340] R10: 00000000000005e7 R11: 0000000000000246 R12: 00007ffcbda8dc60
[ 946.023757] R13: 00007ffcbda95d58 R14: 00000000006573a0 R15: 00007ffcbda978fe
[ 946.024845] Code: 00 e8 07 30 6b cf 5d 48 98 c3 0f 1f 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 41 52 41 89 d5 53 8b 87 94 03 00 00 49 89 fc <48> 8b 56 08 48 89 f3 41 89 ce 89 86 94 03 00 00 83 e8 04 89 82
[ 946.026969] RIP: ipoib_vlan_add+0x1e/0x260 [ib_ipoib] RSP: ffffa148c391f860
[ 946.027993] CR2: 0000000000000008
[ 946.032355] ---[ end trace 294766f7e80c56b3 ]---