pktgen / Pktgen-DPDK

DPDK based packet generator

Illegal instruction scrn_constructor () #255

Closed vincentmli closed 2 weeks ago

vincentmli commented 6 months ago

Hi

I have two Dell servers of different models, both running the same Ubuntu 22.04 release. pktgen runs fine on one of them, but after following the same installation steps on the other, pktgen dumps core. Let me know what other information you need. Thanks!

dpdk NIC:

root@r730:/etc/ld.so.conf.d# dpdk-devbind.py -s

Network devices using DPDK-compatible driver
============================================
0000:04:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection 10fb' drv=uio_pci_generic unused=ixgbe

huge page:

dpdk-hugepages.py -p 2M --setup 2G

run pktgen

root@r730:/etc/ld.so.conf.d# gdb --args pktgen -l 0-1 -n 2 -- -P -T -N -m "1.[0]"

GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from pktgen...
(No debugging symbols found in pktgen)
(gdb) run
Starting program: /usr/local/bin/pktgen -l 0-1 -n 2 -- -P -T -N -m 1.\[0\]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGILL, Illegal instruction.
0x000055555555fa03 in scrn_constructor ()
(gdb) bt
#0  0x000055555555fa03 in scrn_constructor ()
#1  0x00007ffff7829ebb in call_init (env=<optimized out>, argv=0x7fffffffe558, argc=11) at ../csu/libc-start.c:145
#2  __libc_start_main_impl (main=0x55555555ed30 <main>, argc=11, argv=0x7fffffffe558, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7fffffffe548) at ../csu/libc-start.c:379
#3  0x000055555555fae5 in _start ()

vincentmli commented 6 months ago

I have also tried a virtual machine and a mini PC, and got the same core dump as above. I started to wonder whether I am doing something wrong, but I cannot see what, since the installation seems straightforward. The steps I take are roughly:

git clone dpdk;
cd dpdk; meson build; ninja -C build; ninja -C build install
dpdk-hugepages.py -p 2M --setup 2G
modprobe uio_pci_generic
dpdk-devbind.py --bind=uio_pci_generic 04:00.0

git clone https://github.com/pktgen/Pktgen-DPDK.git
cd Pktgen-DPDK
meson build
ninja -C build
ninja -C build install
pktgen -l 0-1 -n 2 -- -P -T -N -m "1.[0]"

KeithWiles commented 6 months ago

Most likely the CPUs being used do not support all of the features DPDK requires. DPDK needs a minimum set of CPU features to work, and a VM or a non-Xeon processor may not provide them.

KeithWiles commented 6 months ago

The scrn_constructor() routine is pretty simple and should not require anything special from the CPU. If you are copying the binary from one machine to another, that could be the problem too: DPDK reads the CPU features during the build and builds a binary specific to that machine. I think DPDK can be built with a minimum set of features via a command-line option, but I do not remember how that is done.

vincentmli commented 6 months ago

Thanks for the reply; it sounds like I did not do anything silly :). I did not copy the binary. I built it on each test machine (a mini PC, a VM, and a physical Dell PowerEdge R730), and they all hit the same scrn_constructor crash. I do have one Dell PowerEdge R210 server where it works.

vincentmli commented 6 months ago

dpdk config/x86/meson.build has

base_flags = ['SSE', 'SSE2', 'SSE3','SSSE3', 'SSE4_1', 'SSE4_2']
foreach f:base_flags
    compile_time_cpuflags += ['RTE_CPUFLAG_' + f]
endforeach

optional_flags = [
        'AES',
        'AVX',
        'AVX2',
        'AVX512BW',
        'AVX512CD',
        'AVX512DQ',
        'AVX512F',
        'AVX512VL',
        'PCLMUL',
        'RDRND',
        'RDSEED',
        'VPCLMULQDQ',
]
foreach f:optional_flags
    if cc.get_define('__@0@__'.format(f), args: machine_args) == '1'
        if f == 'PCLMUL' # special case flags with different defines
            f = 'PCLMULQDQ'
        elif f == 'RDRND'
            f = 'RDRAND'
        endif
        compile_time_cpuflags += ['RTE_CPUFLAG_' + f]
    endif
endforeach

My mini PC has no AVX:

flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer xsave rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust erms invpcid rdseed intel_pt xsaveopt dtherm ida arat pln pts md_clear flush_l1d
vmx flags   : vnmi preemption_timer invvpid ept_x_only ept_ad ept_1gb flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest ple
bugs        : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit srbds mmio_unknown
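
As a rough aid for comparing a flags line like the one above against the base_flags list in config/x86/meson.build, here is a minimal C sketch. It assumes the flags are passed in as a plain string (for example, copied from /proc/cpuinfo) and that the kernel reports SSE3 as "pni". The names has_cpu_flag and count_missing_base_flags are hypothetical helpers, not part of DPDK or Pktgen.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if `flag` appears as a whole whitespace-separated token
 * in `flags`, so "sse" does not falsely match inside "ssse3". */
bool has_cpu_flag(const char *flags, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = flags;

    if (len == 0)
        return false;

    while ((p = strstr(p, flag)) != NULL) {
        bool starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        bool ends = p[len] == '\0' || p[len] == ' ' ||
                    p[len] == '\t' || p[len] == '\n';
        if (starts && ends)
            return true;
        p += 1; /* keep scanning after a partial match */
    }
    return false;
}

/* Count how many of the six baseline flags are absent from `flags`.
 * Assumption: /proc/cpuinfo spelling, where SSE3 is shown as "pni". */
int count_missing_base_flags(const char *flags)
{
    static const char *const base[] = {
        "sse", "sse2", "pni", "ssse3", "sse4_1", "sse4_2"
    };
    int missing = 0;

    for (size_t i = 0; i < sizeof(base) / sizeof(base[0]); i++)
        if (!has_cpu_flag(flags, base[i]))
            missing++;
    return missing;
}
```

For the mini PC flags shown above, all six baseline flags are present while has_cpu_flag(flags, "avx") is false, which matches the observation that only the optional AVX family is missing.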

vincentmli commented 6 months ago

@KeithWiles someone at https://stackoverflow.com/questions/75098061/wrong-detection-of-cpu-instruction-during-dpdk-build suggested having the DPDK app call rte_cpu_is_supported() so that a CPU lacking the features selected at compile time can be detected. Is this something pktgen could do?
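
The general idea of a startup check can be sketched without DPDK. The example below is not rte_cpu_is_supported() itself; it uses the GCC/Clang x86 builtins __builtin_cpu_init() and __builtin_cpu_supports() to ask the running CPU about the same baseline features, and it assumes an x86 build with one of those compilers. The function name count_missing_features is hypothetical.

```c
#include <stdio.h>

/* Returns the number of baseline x86 features the running CPU lacks,
 * or -1 when the check is not available (non-x86 target). */
int count_missing_features(void)
{
#if defined(__x86_64__) || defined(__i386__)
    int missing = 0;

    /* Required before __builtin_cpu_supports() when called early. */
    __builtin_cpu_init();

/* The builtin needs a string literal, so each feature is spelled out. */
#define CHECK_FEATURE(name)                               \
    do {                                                  \
        if (!__builtin_cpu_supports(name)) {              \
            fprintf(stderr, "CPU lacks %s\n", name);      \
            missing++;                                    \
        }                                                 \
    } while (0)

    CHECK_FEATURE("sse");
    CHECK_FEATURE("sse2");
    CHECK_FEATURE("sse3");
    CHECK_FEATURE("ssse3");
    CHECK_FEATURE("sse4.1");
    CHECK_FEATURE("sse4.2");
#undef CHECK_FEATURE

    return missing;
#else
    return -1;
#endif
}
```

An application could call such a function first thing in main() and exit with a readable message instead of a SIGILL. Note that this would not help if the illegal instruction is executed in a constructor that runs before main(), as the backtrace in this issue suggests.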

KeithWiles commented 6 months ago

I rely on DPDK doing the correct thing here with CPU flags. I do not remember using anything special in the Pktgen code that would need the rte_cpu_is_supported() function. It could be that the compilers differ between the machines, but I am just shooting in the dark here.

I would need to debug the problem on the machine, and I do not have access to one. Sorry, I am not much help here. I would prefer not to use external machines; it opens me up to some legal issues, IMO.

KeithWiles commented 6 months ago

You can try putting printf() calls in the routine to see where it fails. Remember to add an fflush(stdout) so the text is actually printed. Use something like the following along the code path:

printf("%s:%d Entry\n", __func__, __LINE__); fflush(stdout);

printf("%s:%d Here 1\n", __func__, __LINE__); fflush(stdout);

printf("%s:%d Here 2\n", __func__, __LINE__); fflush(stdout);

printf("%s:%d Exit\n", __func__, __LINE__); fflush(stdout);
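
Since the backtrace shows scrn_constructor() being called from call_init, before main(), the tracing has to work inside a constructor. A minimal sketch of that pattern follows, assuming GCC or Clang and their __attribute__((constructor)) extension. TRACE, demo_constructor, and demo_constructor_ran are hypothetical names for illustration, not Pktgen code.

```c
#include <stdio.h>

/* Print function name and line, then flush so the text survives a crash. */
#define TRACE(msg)                                              \
    do {                                                        \
        printf("%s:%d %s\n", __func__, __LINE__, (msg));        \
        fflush(stdout);                                         \
    } while (0)

static int constructor_ran;

/* Runs automatically before main(), like scrn_constructor() does. */
__attribute__((constructor)) static void demo_constructor(void)
{
    TRACE("Entry");
    constructor_ran = 1; /* stand-in for the real initialization work */
    TRACE("Exit");
}

/* Lets the rest of the program confirm the constructor completed. */
int demo_constructor_ran(void)
{
    return constructor_ran;
}
```

If "Entry" is printed but "Exit" is not, the faulting instruction lies between the two TRACE calls, and more calls can be added to narrow it down.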

vincentmli commented 6 months ago

The code path is in app/pktgen-main.c, so I should probably put printf calls around the functions on lines 467, 468, and 471, right?

450     /* call before the rte_eal_init() */
451     (void)rte_set_application_usage_hook(pktgen_usage);
452 
453     memset(&pktgen, 0, sizeof(pktgen));
454 
455     pktgen.flags             = PRINT_LABELS_FLAG;
456     pktgen.ident             = 0x1234;
457     pktgen.nb_rxd            = DEFAULT_RX_DESC;
458     pktgen.nb_txd            = DEFAULT_TX_DESC;
459     pktgen.nb_ports_per_page = DEFAULT_PORTS_PER_PAGE;
460 
461     if ((pktgen.l2p = l2p_create()) == NULL)
462         pktgen_log_panic("Unable to create l2p");
463 
464     pktgen.portdesc_cnt = get_portdesc(pktgen.portlist, pktgen.portdesc, RTE_MAX_ETHPORTS, 0);
465 
466     /* Initialize the screen and logging */
467     pktgen_init_log();
468     pktgen_cpu_init();
469 
470     /* initialize EAL */
471     ret = rte_eal_init(argc, argv);

KeithWiles commented 6 months ago

I was thinking of scrn_constructor(), but the locations above are fine too.