ppnaik1890 opened 5 years ago
Hi @ppnaik1890,
This looks to be a version mismatch issue on my side. I don't have access to an XL710 NIC in my lab here, so can you do me a favor? Please copy (overwrite) netmap/sys/net/netmap.h and netmap/sys/net/netmap_user.h to the mtcp/mtcp/src/include/ directory. You will notice that the existing netmap.h in mtcp/src/include/ has NETMAP_API set to 11.
If this patch works, please submit these file changes to the mTCP devel branch as a pull request. I will merge your changes.
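A minimal sketch of that copy step (assuming the netmap and mtcp repositories are checked out side by side; adjust the paths to your working tree):
# Overwrite mTCP's bundled netmap headers with the ones from the netmap tree
cp netmap/sys/net/netmap.h netmap/sys/net/netmap_user.h mtcp/mtcp/src/include/
# Then rebuild mTCP (configure/make) so the updated headers are picked up.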
Hi @ajamshed,
Thanks for your quick response. With the changes you suggested, I could get mTCP working with 4 cores.
But now I am seeing a performance issue while running the epwget example.
On server 1:
sudo ./epserver -p /home/turing_05/www -f epserver.conf -N 4
[CPU 0] ens259f1 flows: 4126, RX: 13750(pps) (err: 0), 0.01(Gbps), TX: 10344(pps), 0.01(Gbps)
[CPU 1] ens259f1 flows: 4203, RX: 13921(pps) (err: 0), 0.01(Gbps), TX: 10408(pps), 0.01(Gbps)
[CPU 2] ens259f1 flows: 4051, RX: 12097(pps) (err: 0), 0.01(Gbps), TX: 9045(pps), 0.01(Gbps)
[CPU 3] ens259f1 flows: 4178, RX: 13788(pps) (err: 0), 0.01(Gbps), TX: 10329(pps), 0.01(Gbps)
[ ALL ] ens259f1 flows: 16558, RX: 53556(pps) (err: 0), 0.04(Gbps), TX: 40126(pps), 0.04(Gbps)
[CPU 0] ens259f1 flows: 597, RX: 15095(pps) (err: 0), 0.01(Gbps), TX: 9058(pps), 0.01(Gbps)
[CPU 1] ens259f1 flows: 520, RX: 15356(pps) (err: 0), 0.01(Gbps), TX: 9322(pps), 0.01(Gbps)
[CPU 2] ens259f1 flows: 489, RX: 16572(pps) (err: 0), 0.01(Gbps), TX: 10656(pps), 0.01(Gbps)
[CPU 3] ens259f1 flows: 747, RX: 15352(pps) (err: 0), 0.01(Gbps), TX: 9347(pps), 0.01(Gbps)
[ ALL ] ens259f1 flows: 2353, RX: 62375(pps) (err: 0), 0.05(Gbps), TX: 38383(pps), 0.03(Gbps)
[CPU 0] ens259f1 flows: 643, RX: 11014(pps) (err: 0), 0.01(Gbps), TX: 5507(pps), 0.00(Gbps)
[CPU 1] ens259f1 flows: 574, RX: 11012(pps) (err: 0), 0.01(Gbps), TX: 5506(pps), 0.00(Gbps)
[CPU 2] ens259f1 flows: 529, RX: 11031(pps) (err: 0), 0.01(Gbps), TX: 5541(pps), 0.00(Gbps)
[CPU 3] ens259f1 flows: 676, RX: 10994(pps) (err: 0), 0.01(Gbps), TX: 5497(pps), 0.00(Gbps)
[ ALL ] ens259f1 flows: 2422, RX: 44051(pps) (err: 0), 0.03(Gbps), TX: 22051(pps), 0.02(Gbps)
[CPU 0] ens259f1 flows: 4, RX: 196(pps) (err: 0), 0.00(Gbps), TX: 192(pps), 0.00(Gbps)
[CPU 1] ens259f1 flows: 4, RX: 129(pps) (err: 0), 0.00(Gbps), TX: 125(pps), 0.00(Gbps)
[CPU 2] ens259f1 flows: 0, RX: 16(pps) (err: 0), 0.00(Gbps), TX: 16(pps), 0.00(Gbps)
[CPU 3] ens259f1 flows: 0, RX: 202(pps) (err: 0), 0.00(Gbps), TX: 200(pps), 0.00(Gbps)
[ ALL ] ens259f1 flows: 8, RX: 543(pps) (err: 0), 0.00(Gbps), TX: 533(pps), 0.00(Gbps)
[CPU 0] ens259f1 flows: 511, RX: 10998(pps) (err: 0), 0.01(Gbps), TX: 5499(pps), 0.00(Gbps)
[CPU 1] ens259f1 flows: 512, RX: 10996(pps) (err: 0), 0.01(Gbps), TX: 5498(pps), 0.00(Gbps)
[CPU 2] ens259f1 flows: 513, RX: 10996(pps) (err: 0), 0.01(Gbps), TX: 5498(pps), 0.00(Gbps)
[CPU 3] ens259f1 flows: 516, RX: 10990(pps) (err: 0), 0.01(Gbps), TX: 5495(pps), 0.00(Gbps)
On server 2:
sudo ./epwget 169.254.9.84/small.txt 10000000 -c 22000 -f epwget.conf -N 4
[WARINING] Available # addresses (16127) is smaller than the max concurrency (16500).
Thread 2 handles 2500000 flows. connecting to 169.254.9.84:80
CPU 3: initialization finished.
CPU 0: initialization finished.
CPU 1: initialization finished.
[WARINING] Available # addresses (16127) is smaller than the max concurrency (16500).
Thread 1 handles 2500000 flows. connecting to 169.254.9.84:80
Learned new arp entry.
ARP Table:
IP addr: 169.254.9.84, dst_hwaddr: 3C:FD:FE:9E:7B:85
---------------------------------------------------------------------------------
[WARINING] Available # addresses (16127) is smaller than the max concurrency (16500).
Thread 0 handles 2500000 flows. connecting to 169.254.9.84:80
[WARINING] Available # addresses (16127) is smaller than the max concurrency (16500).
Thread 3 handles 2500000 flows. connecting to 169.254.9.84:80
Response size set to 204
[CPU 0] ens259f1 flows: 5500, RX: 18963(pps) (err: 0), 0.02(Gbps), TX: 30720(pps), 0.02(Gbps)
[CPU 1] ens259f1 flows: 5596, RX: 18027(pps) (err: 0), 0.02(Gbps), TX: 30520(pps), 0.02(Gbps)
[CPU 2] ens259f1 flows: 5893, RX: 18058(pps) (err: 0), 0.02(Gbps), TX: 30166(pps), 0.02(Gbps)
[CPU 3] ens259f1 flows: 5500, RX: 18856(pps) (err: 0), 0.02(Gbps), TX: 30841(pps), 0.02(Gbps)
[ ALL ] ens259f1 flows: 22489, RX: 73904(pps) (err: 0), 0.07(Gbps), TX: 122247(pps), 0.10(Gbps)
[CPU 0] ens259f1 flows: 5500, RX: 5478(pps) (err: 0), 0.00(Gbps), TX: 10978(pps), 0.01(Gbps)
[CPU 1] ens259f1 flows: 5596, RX: 5465(pps) (err: 0), 0.00(Gbps), TX: 10965(pps), 0.01(Gbps)
[CPU 2] ens259f1 flows: 5893, RX: 5475(pps) (err: 0), 0.00(Gbps), TX: 10975(pps), 0.01(Gbps)
[CPU 3] ens259f1 flows: 5500, RX: 5515(pps) (err: 0), 0.00(Gbps), TX: 11009(pps), 0.01(Gbps)
[ ALL ] ens259f1 flows: 22489, RX: 21933(pps) (err: 0), 0.02(Gbps), TX: 43927(pps), 0.03(Gbps)
[CPU 0] ens259f1 flows: 5500, RX: 435(pps) (err: 0), 0.00(Gbps), TX: 1191(pps), 0.00(Gbps)
[CPU 1] ens259f1 flows: 5500, RX: 749(pps) (err: 0), 0.00(Gbps), TX: 847(pps), 0.00(Gbps)
I have not run the affinity.py script for IRQ pinning. Could you let me know if the issue is because of that, or do I need some other configuration changes too? Also, can you please help me with affinity.py for the i40e (40G NIC)? Thanks again for all your help.
@ppnaik1890,
I don't think this is an IRQ affinitization issue (although the performance of your experiment may go up slightly if you correctly bind IRQ numbers to cores). What is the file size of small.txt? You may either be hitting a PCIe lane bottleneck (if your file size, and hence average packet size, is ~64 B), or your config files (epserver.conf, epwget.conf) may need some tuning (hint: [WARINING] Available # addresses (16127) is smaller than the max concurrency (16500).).
Also, make sure that the NIC is placed in the first NUMA node (since you are using CPUs 0-3). To understand affinity.py better, please see this link: https://null.53bits.co.uk/index.php?page=numa-and-queue-affinity
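A minimal sketch of how to check this (assuming a standard Linux sysfs layout; ens259f1 and CPUs 0-3 are taken from the output above):
# Which NUMA node is the NIC attached to? (-1 means no NUMA information exported)
cat /sys/class/net/ens259f1/device/numa_node
# Which CPUs belong to which node?
lscpu | grep -i numa
# List the NIC's queue IRQs, then pin them to CPUs 0-3 (mask 0f) by hand,
# which is roughly what affinity.py automates:
grep ens259f1 /proc/interrupts
echo 0f | sudo tee /proc/irq/IRQ_NUMBER/smp_affinity   # IRQ_NUMBER is a placeholder; repeat per queue IRQ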
Hi @ajamshed,
I suspect RSS is not working properly, but I could not find the reason for it. These are the steps we followed to tune/improve things, without success. Observations:
The warning [WARINING] Available # addresses (16127) is smaller than the max concurrency (16500). was caused by the init_rss call in epwget.c; commenting it out removes the warning, but we need that line and do not know how to fix the underlying error.
dmesg shows: i40e: unknown parameter 'RSS' ignored
sudo ethtool --show-rxfh-indir ens259f1 also shows an all-zero RSS hash key (see the inspection sketch after these observations):
RSS hash key:
00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
top shows uneven per-core utilization (CPU 2 is much busier than the others):
%Cpu0 : 18.9 us, 3.3 sy, 0.0 ni, 77.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 19.2 us, 4.6 sy, 0.0 ni, 76.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 67.9 us, 2.0 sy, 0.0 ni, 29.0 id, 0.0 wa, 0.0 hi, 1.0 si, 0.0 st
%Cpu3 : 15.6 us, 2.7 sy, 0.0 ni, 81.3 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
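For reference, the RSS state can be inspected and rebalanced from userspace along these lines (a sketch only; ens259f1 is taken from the logs above, and any specific hash key would have to match what the patched i40e driver / mTCP expects):
# Show the RX flow hash indirection table and hash key
sudo ethtool -x ens259f1
# Make sure 4 RX/TX queue pairs are configured for the 4 cores
sudo ethtool -L ens259f1 combined 4
# Spread the indirection table evenly across the 4 queues
sudo ethtool -X ens259f1 equal 4
# A specific hash key can also be programmed with 'ethtool -X ens259f1 hkey <key>',
# where <key> must have the byte length the NIC reports above.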
Sorry for such a long post and trail. Please help me debug the issue. Thanks again for your help.
Hi @ppnaik1890,
The message [WARINING] Available # addresses (16127) is smaller than the max concurrency (16500). is saying that the number of available source IP and port pairs is lower than the number of connections you are trying to create concurrently. This may have led mTCP into unexpected behavior. I recommend reducing the concurrency of the client. You can still test larger server-side concurrency by increasing the number of clients. Client-side mTCP does not use the full 65535-port space, in order to support symmetric RSS.
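As a rough back-of-the-envelope check (assuming mTCP splits the usable ephemeral port range evenly across its threads for symmetric RSS, which is what the 16127 figure suggests):
# ~64K ports minus the reserved low range, divided across 4 mTCP threads
cores=4
echo $(( (65535 - 1024) / cores ))   # prints 16127, matching the warning above
# Keep the concurrency each thread handles below this value (the warning compares
# it against 16500), or spread the load across more client IPs / client machines.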
Hi,
I have a host-to-host setup over 40G Intel NICs (XL710). I was able to run the epserver/epwget examples on a single core, but I am unable to do so with more than one core.
sudo ./epwget 169.254.9.84/small.txt 10000000 -f epwget.conf -N 2
The error I get is:
Also, the dmesg says:
[357006.217823] 606.558023 [ 376] netmap_ioctl_legacy Minimum supported API is 14 (requested 11)
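For reference, the two API versions can be compared directly from the headers (a sketch, assuming both source trees are available locally):
# API version supported by the netmap tree the kernel module was built from
grep -n "define NETMAP_API" netmap/sys/net/netmap.h
# API version mTCP's bundled copy was built against (11 in this case)
grep -n "define NETMAP_API" mtcp/mtcp/src/include/netmap.h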
I was able to run netmap pkt-gen with multiple cores. I also changed the RSS hash in /netmap/LINUX/i40e-2.4.6/src/ and ran make again.
My config file is as below: epwget.txt
Please help me resolve this issue. Thanks, Priyanka