networkservicemesh / cmd-forwarder-vpp

Apache License 2.0

Cannot start forwarder #1170

Open lapnd opened 1 month ago

lapnd commented 1 month ago

Hi, I'm using k3s on Ubuntu 22.04 and trying to install NSM. Hugepages were set up as follows:

cat /proc/cmdline 
BOOT_IMAGE=/vmlinuz-6.8.0-40-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro console=tty0 console=ttyS0,115200n8 default_hugepagesz=2M hugepagesz=2M hugepages=1024

cat /proc/meminfo | grep HugePages
AnonHugePages:   4329472 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:    1024
HugePages_Free:     1024
HugePages_Rsvd:        0
HugePages_Surp:        0
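For anyone who wants to compare these counters across hosts (for example, the bare-metal node against a VM), here is a minimal sketch that pulls the `HugePages_*` values out of `/proc/meminfo`-style text. The function name `parse_hugepages` and the `SAMPLE` constant are made up for illustration; the sample text is the output pasted above. These are system-wide counters, so on their own they do not show whether a particular container is allowed to map hugepages.

```python
def parse_hugepages(meminfo_text):
    """Return the HugePages_* counters from /proc/meminfo-style text as ints.

    Hypothetical helper for illustration; it is not part of NSM or VPP.
    """
    counters = {}
    for line in meminfo_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        # Only the HugePages_* lines are page counts; AnonHugePages and
        # friends are sizes in kB and are skipped here.
        if sep and key.startswith("HugePages_"):
            counters[key] = int(value.split()[0])
    return counters


# Sample taken from the output shown in this issue.
SAMPLE = """\
AnonHugePages:   4329472 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:    1024
HugePages_Free:     1024
HugePages_Rsvd:        0
HugePages_Surp:        0
"""

if __name__ == "__main__":
    for name, count in parse_hugepages(SAMPLE).items():
        print(f"{name}: {count}")
```

On a live host the same function can be fed `open("/proc/meminfo").read()` instead of the sample string.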

However, the forwarder cannot start:

Sep 18 09:45:46.616 [INFO] [cmd:/bin/forwarder] Config: &config.Config{Name:"forwarder-vpp-5xlwh", Labels:map[string]string{"p2p":"true"}, NSName:"forwarder", ConnectTo:url.URL{Scheme:"unix", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/var/lib/networkservicemesh/nsm.io.sock", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}, ListenOn:url.URL{Scheme:"unix", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/listen.on.sock", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}, MaxTokenLifetime:600000000000, RegistryClientPolicies:[]string{"etc/nsm/opa/common/.*.rego", "etc/nsm/opa/registry/.*.rego", "etc/nsm/opa/client/.*.rego"}, LogLevel:"INFO", DialTimeout:750000000, OpenTelemetryEndpoint:"otel-collector.observability.svc.cluster.local:4317", MetricsExportInterval:10000000000, PprofEnabled:false, PprofListenOn:"localhost:6060", PrometheusListenOn:":8081", PrometheusServerHeaderTimeout:5000000000, TunnelIP:net.IP{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xff, 0xff, 0xa, 0x4, 0x0, 0xd}, VxlanPort:0x0, VppAPISocket:"/var/run/vpp/external/vpp-api.sock", VppInit:vppinit.Func{f:(func(context.Context, api.Connection, net.IP) (net.IP, error))(0xf47e60)}, VppInitParams:"", ResourcePollTimeout:30000000000, DevicePluginPath:"/var/lib/kubelet/device-plugins/", PodResourcesPath:"/var/lib/kubelet/pod-resources/", DeviceSelectorFile:"", SRIOVConfigFile:"", PCIDevicesPath:"/sys/bus/pci/devices", PCIDriversPath:"/sys/bus/pci/drivers", CgroupPath:"/host/sys/fs/cgroup/devices", VFIOPath:"/host/dev/vfio", MechanismPriority:[]string(nil)}
Sep 18 09:45:46.616 [INFO] [cmd:/bin/forwarder] [duration:4.971254ms] completed phase 1: get config from environment
Sep 18 09:45:46.616 [INFO] [cmd:/bin/forwarder] executing phase 2: run vpp and get a connection to it (time since start: 5.179467ms)
Sep 18 09:45:46.650 [INFO] Configuration file: "/etc/vpp/helper/vpp.conf" not found, using defaults
Sep 18 09:45:46.652 [INFO] [cmd:/bin/forwarder] local vpp is being used
Sep 18 09:45:46.652 [INFO] [cmd:/bin/forwarder] [duration:35.846189ms] completed phase 2: run vpp and get a connection to it
Sep 18 09:45:46.652 [WARN] [cmd:/bin/forwarder] skipping phases 3-5: no PCI resources config
Sep 18 09:45:46.652 [WARN] [cmd:/bin/forwarder] SR-IOV is not enabled
Sep 18 09:45:46.652 [INFO] [cmd:/bin/forwarder] executing phase 6: retrieving svid, check spire agent logs if this is the last line you see (time since start: 41.104723ms)
Sep 18 09:45:46.991 [INFO] [cmd:vpp] vpp[55423]: buffer: numa[0] falling back to non-hugepage backed buffer pool (vlib_physmem_shared_map_create: pmalloc_map_pages: failed to mmap 64 pages at 0x1000000000 fd 5 numa 0 flags 0x11: Cannot allocate memory)
Sep 18 09:45:47.675 [INFO] [cmd:vpp] vpp[55423]: buffer: numa[1] falling back to non-hugepage backed buffer pool (vlib_physmem_shared_map_create: pmalloc_map_pages: failed to mmap 64 pages at 0x1008000000 fd 6 numa 1 flags 0x11: Cannot allocate memory)
Sep 18 09:45:47.682 [INFO] SVID: "spiffe://k8s.nsm/ns/nsm-system/pod/forwarder-vpp-5xlwh"
Sep 18 09:45:47.682 [INFO] [cmd:/bin/forwarder] [duration:1.029478195s] completed phase 6: retrieving svid
Sep 18 09:45:47.682 [INFO] [cmd:/bin/forwarder] executing phase 7: create xconnect network service endpoint (time since start: 1.070753722s)
Sep 18 09:45:47.753 [INFO] [ReadConfig:] [cmd:/bin/forwarder] Using default VPP init parameters &{AF_PACKET:{&{Mode:AF_PACKET_API_MODE_ETHERNET RxFrameSize:10240 TxFrameSize:10240 RxFramesPerBlock:1024 TxFramesPerBlock:1024 NumRxQueues:1 NumTxQueues:0 Flags:AF_PACKET_API_FLAG_VERSION_2}},AF_XDP:{&{Mode:AF_XDP_API_MODE_AUTO RxqSize:8192 TxqSize:8192 Flags:AfXdpFlag(0)}},}
Sep 18 09:45:48.395 [INFO] [cmd:vpp] vpp[55423]: vlib_sort_init_exit_functions:201: init function 'pci_bus_init' not found (before 'idpf_init')
Sep 18 09:45:48.681 [INFO] [cmd:vpp] vpp[55423]: vnet_feature_arc_init:272: feature node 'ip4-sv-reassembly-output-feature' not found (before 'npt66-output', arc 'ip6-output')
Sep 18 09:45:48.681 [INFO] [cmd:vpp] vpp[55423]: vnet_feature_arc_init:272: feature node 'ip4-sv-reassembly-feature' not found (before 'npt66-input', arc 'ip6-unicast')
Sep 18 09:45:48.850 [INFO] [cmd:vpp] vpp[55423]: af_packet: Failed to bind rx packet socket: No such device (errno 19)
Sep 18 09:45:48.859 [INFO] [cmd:vpp] vpp[55423]: af_packet: Failed to set queue 0 error
Sep 18 09:45:48.859 [INFO] [cmd:vpp] vpp[55423]: af_packet: Failed to init device error
panic: error: VPPApiError: System call error #1 (-11)

goroutine 1 [running]:
github.com/networkservicemesh/cmd-forwarder-vpp/internal/vppinit.Must(...)
    /build/internal/vppinit/vppinit.go:112
main.main()
    /build/main.go:262 +0x364c
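A few values in the `Config` dump above are hard to read in their raw Go form: `TunnelIP` is printed as a 16-byte `net.IP`, and durations such as `DialTimeout` are printed as integer nanoseconds. The sketch below decodes them, assuming the byte list is copied exactly from the log; the helper names `decode_go_net_ip` and `ns_to_seconds` are made up for illustration. Knowing the tunnel address may help when checking which host interface the `af_packet` errors refer to, although the log alone does not confirm that link.

```python
import ipaddress


def decode_go_net_ip(raw):
    """Decode a Go net.IP byte dump into an ipaddress object.

    Go stores IPv4 addresses as 16-byte IPv4-mapped IPv6 values, so the
    mapped IPv4 address is returned when one is present.
    Hypothetical helper for illustration only.
    """
    addr = ipaddress.ip_address(bytes(raw))
    if addr.version == 6 and addr.ipv4_mapped is not None:
        return addr.ipv4_mapped
    return addr


def ns_to_seconds(nanoseconds):
    """Convert a Go time.Duration (integer nanoseconds) to seconds."""
    return nanoseconds / 1_000_000_000


# TunnelIP bytes copied from the Config line in the log.
TUNNEL_IP_BYTES = [0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
                   0x0, 0x0, 0xff, 0xff, 0xa, 0x4, 0x0, 0xd]

if __name__ == "__main__":
    print("TunnelIP:", decode_go_net_ip(TUNNEL_IP_BYTES))
    print("DialTimeout (s):", ns_to_seconds(750000000))
    print("MaxTokenLifetime (s):", ns_to_seconds(600000000000))
```

With the values from this log, the tunnel address decodes to 10.4.0.13, `DialTimeout` is 0.75 seconds, and `MaxTokenLifetime` is 600 seconds.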

Could you kindly provide any insights or clues regarding possible issues that could lead to this failure? Any suggestions for steps to troubleshoot or resolve the problem would be greatly appreciated.

lapnd commented 1 month ago

I have additional information. When I try NSM on an Ubuntu 22.04 VM (using the Ubuntu cloud image), it works fine. The issue occurs on Kubernetes running on a bare-metal host. I haven't figured out the possible cause yet.