xcp-ng / xcp

Entry point for issues and wiki. Also contains some scripts and sources.
https://xcp-ng.org

unable to handle kernel NULL pointer / tun_net_xmit+0x3de/0x460 [tun] #528

Closed U-siro closed 2 years ago

U-siro commented 2 years ago

I've set up pfSense and a Windows 11 VM, but XCP-ng crashes when certain network operations happen. Here is /var/crash/[DATE]/dom0.log:

[ 525.566493] ALERT: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 525.566523] INFO: PGD 12b14c067 P4D 12b14c067 PUD 12a24f067 PMD 0
[ 525.566538] WARN: Oops: 0000 [#1] SMP NOPTI
[ 525.566548] WARN: CPU: 4 PID: 0 Comm: swapper/4 Tainted: G O 4.19.0+1 #1
[ 525.566560] WARN: Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.06.E006.013120181511 01/31/2018
[ 525.566583] WARN: RIP: e030:skb_copy_ubufs+0x19c/0x5f0
[ 525.566592] WARN: Code: 90 cc 00 00 00 48 03 90 d0 00 00 00 48 63 44 24 40 48 83 c0 03 48 c1 e0 04 48 01 d0 48 89 18 c7 40 08 00 00 00 00 44 89 78 0c <48> 8b 43 08 a8 01 0f 85 3f 04 00 00 48 8b 44 24 30 48 83 78 20 ff
[ 525.566617] WARN: RSP: e02b:ffff88812df03658 EFLAGS: 00010282
[ 525.566626] WARN: RAX: ffff8881283c30e0 RBX: 0000000000000000 RCX: 00000000000000c0
[ 525.566637] WARN: RDX: ffff8881283c30c0 RSI: ffff8881283c30c0 RDI: ffffea0004da3d40
[ 525.566648] WARN: RBP: 0000000000000000 R08: ffff8881283c3000 R09: 0000000000000001
[ 525.566659] WARN: R10: 0000000000000330 R11: ffff8881006d9d40 R12: ffff88812597a100
[ 525.566670] WARN: R13: 0000000000000000 R14: ffff8881036648c0 R15: 0000000000000000
[ 525.566693] WARN: FS: 00007f7d39bfe700(0000) GS:ffff88812df00000(0000) knlGS:0000000000000000
[ 525.566705] WARN: CS: e033 DS: 002b ES: 002b CR0: 0000000080050033
[ 525.566715] WARN: CR2: 0000000000000008 CR3: 00000001276a4000 CR4: 0000000000040660
[ 525.566733] WARN: Call Trace:
[ 525.566743] WARN:
[ 525.566758] WARN: tun_net_xmit+0x3de/0x460 [tun]
[ 525.566771] WARN: dev_hard_start_xmit+0xa4/0x210
[ 525.566782] WARN: sch_direct_xmit+0x10d/0x350
[ 525.566790] WARN: qdisc_run+0x167/0x4e0
[ 525.566799] WARN: ? pfifo_fast_enqueue+0x92/0xf0
[ 525.566807] WARN: __dev_queue_xmit+0x511/0x900
[ 525.566820] WARN: ? enqueue_entity+0x4e8/0xaf0
[ 525.566831] WARN: do_execute_actions+0x157f/0x1750 [openvswitch]
[ 525.566845] WARN: ? handle_irq_event_percpu+0x4d/0x1a0
[ 525.566855] WARN: ? handle_irq_event_percpu+0x51/0x70
[ 525.566864] WARN: ? handle_irq_event+0x41/0x60
[ 525.566873] WARN: ? handle_edge_irq+0x9e/0x190
[ 525.566881] WARN: ? generic_handle_irq+0x24/0x30
[ 525.566892] WARN: ? evtchn_fifo_handle_events+0x180/0x1a0
[ 525.566905] WARN: ? irq_exit+0x3e/0xc0
[ 525.566913] WARN: ? xen_evtchn_do_upcall+0x2c/0x50
[ 525.566926] WARN: ? xen_do_hypervisor_callback+0x29/0x40
[ 525.566935] WARN: ? error_exit+0x5/0x20
[ 525.566944] WARN: ovs_execute_actions+0x47/0x120 [openvswitch]
[ 525.566955] WARN: ovs_dp_process_packet+0x7d/0x110 [openvswitch]
[ 525.566967] WARN: ? key_extract+0xa53/0xd60 [openvswitch]
[ 525.566978] WARN: ovs_vport_receive+0x6e/0xd0 [openvswitch]
[ 525.566989] WARN: ? _raw_spin_unlock_irqrestore+0x14/0x20
[ 525.566999] WARN: ? __alloc_skb+0x4e/0x270
[ 525.567009] WARN: ? alloc_skb+0x76/0x270
[ 525.567020] WARN: ? arch_local_irq_restore+0x5/0x10
[ 525.567030] WARN: ? slab_alloc.constprop.81+0x42/0x4e
[ 525.567040] WARN: ? __alloc_skb+0x76/0x270
[ 525.567048] WARN: ? kmalloc_track_caller+0x58/0x200
[ 525.567060] WARN: netdev_frame_hook+0x105/0x180 [openvswitch]
[ 525.567070] WARN: netif_receive_skb_core+0x211/0xb30
[ 525.567081] WARN: __netif_receive_skb_one_core+0x36/0x70
[ 525.567092] WARN: netif_receive_skb_internal+0x34/0xe0
[ 525.567104] WARN: xenvif_tx_action+0x47d/0x8f0
[ 525.567113] WARN: ? xenvif_interrupt+0x40/0x90
[ 525.567122] WARN: xenvif_poll+0x27/0x70
[ 525.567131] WARN: net_rx_action+0x2a5/0x3e0
[ 525.567140] WARN: do_softirq+0xd1/0x28c
[ 525.567149] WARN: irq_exit+0xa8/0xc0
[ 525.567157] WARN: xen_evtchn_do_upcall+0x2c/0x50
[ 525.567167] WARN: xen_do_hypervisor_callback+0x29/0x40
[ 525.567176] WARN:
[ 525.567184] WARN: RIP: e030:xen_hypercall_sched_op+0xa/0x20
[ 525.567193] WARN: Code: 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
[ 525.567217] WARN: RSP: e02b:ffffc900400afeb0 EFLAGS: 00000246
[ 525.567226] WARN: RAX: 0000000000000000 RBX: ffff88812d60ba00 RCX: ffffffff810013aa
[ 525.567237] WARN: RDX: ffffffff8203d250 RSI: 0000000000000000 RDI: 0000000000000001
[ 525.567249] WARN: RBP: 0000000000000004 R08: 0000000000000008 R09: 0000007a4cdde764
[ 525.567260] WARN: R10: 0000000000007ff0 R11: 0000000000000246 R12: 0000000000000000
[ 525.567271] WARN: R13: 0000000000000000 R14: ffff88812d60ba00 R15: ffff88812d60ba00
[ 525.567284] WARN: ? xen_hypercall_sched_op+0xa/0x20
[ 525.567296] WARN: ? xen_safe_halt+0xc/0x20
[ 525.567304] WARN: ? default_idle+0x1a/0x140
[ 525.567312] WARN: ? do_idle+0x1ea/0x260
[ 525.567320] WARN: ? cpu_startup_entry+0x6f/0x80
[ 525.567328] WARN: Modules linked in: tun nls_utf8 cifs ccm fscache bnx2fc(O) cnic(O) uio fcoe libfcoe libfc scsi_transport_fc openvswitch nsh nf_nat_ipv6 nf_nat_ipv4 nf_conncount nf_nat 8021q garp mrp stp llc ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_multiport xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c dm_multipath iptable_filter sunrpc nls_iso8859_1 nls_cp437 vfat fat sb_edac intel_powerclamp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper dm_mod ipmi_si ipmi_devintf ipmi_msghandler sg i2c_i801 lpc_ich ip_tables x_tables sd_mod sr_mod cdrom hid_generic usbhid hid ata_generic pata_acpi isci libsas ata_piix scsi_transport_sas ehci_pci libata ehci_hcd igb(O) scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_mod efivarfs
[ 525.567461] WARN: ipv6 crc_ccitt
[ 525.567470] WARN: CR2: 0000000000000008
[ 525.567482] WARN: ---[ end trace 02149bdc6f94d2a3 ]---
[ 525.570832] WARN: RIP: e030:skb_copy_ubufs+0x19c/0x5f0
[ 525.570843] WARN: Code: 90 cc 00 00 00 48 03 90 d0 00 00 00 48 63 44 24 40 48 83 c0 03 48 c1 e0 04 48 01 d0 48 89 18 c7 40 08 00 00 00 00 44 89 78 0c <48> 8b 43 08 a8 01 0f 85 3f 04 00 00 48 8b 44 24 30 48 83 78 20 ff
[ 525.570868] WARN: RSP: e02b:ffff88812df03658 EFLAGS: 00010282
[ 525.570877] WARN: RAX: ffff8881283c30e0 RBX: 0000000000000000 RCX: 00000000000000c0
[ 525.570888] WARN: RDX: ffff8881283c30c0 RSI: ffff8881283c30c0 RDI: ffffea0004da3d40
[ 525.570900] WARN: RBP: 0000000000000000 R08: ffff8881283c3000 R09: 0000000000000001
[ 525.570911] WARN: R10: 0000000000000330 R11: ffff8881006d9d40 R12: ffff88812597a100
[ 525.570922] WARN: R13: 0000000000000000 R14: ffff8881036648c0 R15: 0000000000000000
[ 525.570943] WARN: FS: 00007f7d39bfe700(0000) GS:ffff88812df00000(0000) knlGS:0000000000000000
[ 525.570955] WARN: CS: e033 DS: 002b ES: 002b CR0: 0000000080050033
[ 525.570965] WARN: CR2: 0000000000000008 CR3: 00000001276a4000 CR4: 0000000000040660
[ 525.570983] EMERG: Kernel panic - not syncing: Fatal exception in interrupt

I applied all updates from xcp-ng-testing and also tried kernel-alt, but the result is the same.
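For anyone reading a dump like the one above, the frames that matter are easier to see once the `?` entries (which the kernel marks as unreliable) are filtered out. The following is a minimal sketch, assuming the `[ timestamp] LEVEL: message` layout seen in this dom0.log; the function names `split_entries` and `reliable_frames` are hypothetical helpers written for illustration, not part of any XCP-ng tooling.

```python
import re

# One dom0.log entry header, e.g. "[ 525.566493] ALERT: " (timestamp + severity).
ENTRY_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(ALERT|INFO|WARN|EMERG):\s*")
# One stack frame, e.g. "tun_net_xmit+0x3de/0x460 [tun]", optionally prefixed by "? ".
FRAME_RE = re.compile(r"^(\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?$")


def split_entries(text):
    """Split a (possibly re-wrapped) log into (timestamp, level, message) tuples."""
    text = " ".join(text.split())  # collapse arbitrary line wrapping into single spaces
    matches = list(ENTRY_RE.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        # A message runs from the end of its header to the start of the next header.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        entries.append((float(m.group(1)), m.group(2), text[m.end():end].strip()))
    return entries


def reliable_frames(entries):
    """Return (symbol, module) pairs for frames that are not prefixed with '?'."""
    frames = []
    for _, _, message in entries:
        m = FRAME_RE.match(message)
        if m and not m.group(1):
            frames.append((m.group(2), m.group(3)))  # module is None for core kernel code
    return frames
```

Run over this report, the first reliable frames are `tun_net_xmit` in the `tun` module followed by `dev_hard_start_xmit`, with the Open vSwitch and `xenvif` receive path further down the stack.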

Fohdeesha commented 2 years ago

Hi, out of curiosity, what model of NIC is on the physical server? @olivierlambert could this be related to the crashes we saw previously when NAT was performed in a guest, crashing hosts with Broadcom NICs?

olivierlambert commented 2 years ago

If it's Broadcom hardware, it's likely. And there's a workaround (disabling some NIC features).

U-siro commented 2 years ago

@Fohdeesha It isn't Broadcom. Even when I use a non-physical network device for the LAN (the WAN is physical), the result is the same.

@olivierlambert I'd like to try that workaround on Intel hardware. How do I do it?
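For reference, "disabling some NIC features" on Linux generally means turning off offload features with `ethtool -K`. The sketch below only builds the command strings so they can be reviewed before being run in dom0. The feature list is an assumption (common offload names accepted by `ethtool -K`), not the set confirmed for this crash, and `eth0` is a placeholder interface name.

```python
import shlex

# Assumption: commonly toggled offload features. The thread does not say which
# features the workaround actually disables.
ASSUMED_FEATURES = ("tx", "sg", "tso", "gso", "gro")


def offload_off_commands(interfaces, features=ASSUMED_FEATURES):
    """Build one `ethtool -K <iface> <feature> off ...` command string per interface."""
    commands = []
    for iface in interfaces:
        args = ["ethtool", "-K", iface]
        for feature in features:
            args += [feature, "off"]
        commands.append(shlex.join(args))  # shlex.join requires Python 3.8+
    return commands


if __name__ == "__main__":
    # "eth0" is a placeholder; substitute the interfaces backing the affected network.
    for command in offload_off_commands(["eth0"]):
        print(command)
```

Settings applied with `ethtool -K` do not survive a reboot, so this is only suitable for checking whether the crash still reproduces with offloads disabled.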

U-siro commented 2 years ago

Closing this issue because I'm leaving xcp-ng, so I can no longer reproduce the problem.

olivierlambert commented 2 years ago

Well, that's sad :(

Do you have any details regarding the network operation that triggered the issue?

U-siro commented 2 years ago

@olivierlambert I set up pfSense with a VPN WAN (Cloudflare WARP) using WireGuard. The crash happened when I tried to connect to the internet (www.naver.com) from the Windows 11 VM.