mtcp-stack / mtcp

mTCP: A Highly Scalable User-level TCP Stack for Multicore Systems

mTCP netmap multicore over 40G NIC #222

Open ppnaik1890 opened 5 years ago

ppnaik1890 commented 5 years ago

Hi,

I have a host-to-host setup over 40G Intel NICs (XL710). I was able to run the epserver and epwget examples on a single core, but I am unable to do so on more than one core.

sudo ./epwget 169.254.9.84/small.txt 10000000 -f epwget.conf -N 2

The error I get is:

[netmap_init_handle:  79] Opening netmap:ens259f1-0 with j: 0 (cpu: 0)
[netmap_init_handle:  79] Opening netmap:ens259f1-1 with j: 0 (cpu: 1)
672.103553 nm_open [847] NIOCREGIF failed: Invalid argument ens259f1-0
[netmap_init_handle:  88] Unable to open netmap:ens259f1-0: Invalid argument
672.103605 nm_open [847] NIOCREGIF failed: Invalid argument ens259f1-1

Also, the dmesg says: [357006.217823] 606.558023 [ 376] netmap_ioctl_legacy Minimum supported API is 14 (requested 11)
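
For reference (assuming a standard netmap source checkout), the API version the netmap headers declare can be checked directly; the loaded kernel module reports its minimum supported API in dmesg as shown above:

grep 'define NETMAP_API' netmap/sys/net/netmap.h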

I was able to run netmap's pkt-gen with multiple cores. I have also changed the RSS hash in /netmap/LINUX/i40e-2.4.6/src/ and re-ran make.
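
In case it is useful, the RSS key and queue setup that the (modified) i40e driver actually ended up with can be inspected with ethtool, if the driver still exposes them:

ethtool -x ens259f1   # dump the RSS hash key and RX indirection table
ethtool -l ens259f1   # show how many combined RX/TX queues are enabled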

My config file is as below: epwget.txt

Please help me resolve this issue. Thanks, Priyanka

ajamshed commented 5 years ago

Hi @ppnaik1890,

This looks to be a version mismatch issue on my side. I don't have access to an XL710 NIC in my lab here. Can you do me a favor? Please copy (overwrite) netmap/sys/net/netmap.h and netmap/sys/net/netmap_user.h into the mtcp/mtcp/src/include/ directory. You will notice that the existing netmap.h in mtcp/src/include/ has NETMAP_API set to 11:

https://github.com/mtcp-stack/mtcp/blob/1ad1b1a386ad2e17b671c000d08eb1296a94be95/mtcp/src/include/netmap.h#L42
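
As a rough sketch of what I mean (paths as above, assuming the netmap tree and the mtcp repository sit side by side; rebuild mTCP and the examples afterwards, e.g. with ./configure --enable-netmap && make if that is how you built it):

cp netmap/sys/net/netmap.h      mtcp/mtcp/src/include/
cp netmap/sys/net/netmap_user.h mtcp/mtcp/src/include/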

If this patch works, please submit these file changes to the mTCP devel branch as a pull request. I will merge your changes.

ppnaik1890 commented 5 years ago

Hi @ajamshed,

Thanks for your quick response. With the changes you suggested, I could get mTCP working with 4 cores.

But now I am seeing a performance issue while running the epwget example.

On server 1: sudo ./epserver -p /home/turing_05/www -f epserver.conf -N 4

[CPU 0] ens259f1 flows:   4126, RX:   13750(pps) (err:     0),  0.01(Gbps), TX:   10344(pps),  0.01(Gbps)
[CPU 1] ens259f1 flows:   4203, RX:   13921(pps) (err:     0),  0.01(Gbps), TX:   10408(pps),  0.01(Gbps)
[CPU 2] ens259f1 flows:   4051, RX:   12097(pps) (err:     0),  0.01(Gbps), TX:    9045(pps),  0.01(Gbps)
[CPU 3] ens259f1 flows:   4178, RX:   13788(pps) (err:     0),  0.01(Gbps), TX:   10329(pps),  0.01(Gbps)
[ ALL ] ens259f1 flows:  16558, RX:   53556(pps) (err:     0),  0.04(Gbps), TX:   40126(pps),  0.04(Gbps)
[CPU 0] ens259f1 flows:    597, RX:   15095(pps) (err:     0),  0.01(Gbps), TX:    9058(pps),  0.01(Gbps)
[CPU 1] ens259f1 flows:    520, RX:   15356(pps) (err:     0),  0.01(Gbps), TX:    9322(pps),  0.01(Gbps)
[CPU 2] ens259f1 flows:    489, RX:   16572(pps) (err:     0),  0.01(Gbps), TX:   10656(pps),  0.01(Gbps)
[CPU 3] ens259f1 flows:    747, RX:   15352(pps) (err:     0),  0.01(Gbps), TX:    9347(pps),  0.01(Gbps)
[ ALL ] ens259f1 flows:   2353, RX:   62375(pps) (err:     0),  0.05(Gbps), TX:   38383(pps),  0.03(Gbps)
[CPU 0] ens259f1 flows:    643, RX:   11014(pps) (err:     0),  0.01(Gbps), TX:    5507(pps),  0.00(Gbps)
[CPU 1] ens259f1 flows:    574, RX:   11012(pps) (err:     0),  0.01(Gbps), TX:    5506(pps),  0.00(Gbps)
[CPU 2] ens259f1 flows:    529, RX:   11031(pps) (err:     0),  0.01(Gbps), TX:    5541(pps),  0.00(Gbps)
[CPU 3] ens259f1 flows:    676, RX:   10994(pps) (err:     0),  0.01(Gbps), TX:    5497(pps),  0.00(Gbps)
[ ALL ] ens259f1 flows:   2422, RX:   44051(pps) (err:     0),  0.03(Gbps), TX:   22051(pps),  0.02(Gbps)
[CPU 0] ens259f1 flows:      4, RX:     196(pps) (err:     0),  0.00(Gbps), TX:     192(pps),  0.00(Gbps)
[CPU 1] ens259f1 flows:      4, RX:     129(pps) (err:     0),  0.00(Gbps), TX:     125(pps),  0.00(Gbps)
[CPU 2] ens259f1 flows:      0, RX:      16(pps) (err:     0),  0.00(Gbps), TX:      16(pps),  0.00(Gbps)
[CPU 3] ens259f1 flows:      0, RX:     202(pps) (err:     0),  0.00(Gbps), TX:     200(pps),  0.00(Gbps)
[ ALL ] ens259f1 flows:      8, RX:     543(pps) (err:     0),  0.00(Gbps), TX:     533(pps),  0.00(Gbps)
[CPU 0] ens259f1 flows:    511, RX:   10998(pps) (err:     0),  0.01(Gbps), TX:    5499(pps),  0.00(Gbps)
[CPU 1] ens259f1 flows:    512, RX:   10996(pps) (err:     0),  0.01(Gbps), TX:    5498(pps),  0.00(Gbps)
[CPU 2] ens259f1 flows:    513, RX:   10996(pps) (err:     0),  0.01(Gbps), TX:    5498(pps),  0.00(Gbps)
[CPU 3] ens259f1 flows:    516, RX:   10990(pps) (err:     0),  0.01(Gbps), TX:    5495(pps),  0.00(Gbps)

On server 2: sudo ./epwget 169.254.9.84/small.txt 10000000 -c 22000 -f epwget.conf -N 4

[WARINING] Available # addresses (16127) is smaller than the max concurrency (16500).
Thread 2 handles 2500000 flows. connecting to 169.254.9.84:80
CPU 3: initialization finished.
CPU 0: initialization finished.
CPU 1: initialization finished.
[WARINING] Available # addresses (16127) is smaller than the max concurrency (16500).
Thread 1 handles 2500000 flows. connecting to 169.254.9.84:80
Learned new arp entry.
ARP Table:
IP addr: 169.254.9.84, dst_hwaddr: 3C:FD:FE:9E:7B:85
---------------------------------------------------------------------------------
[WARINING] Available # addresses (16127) is smaller than the max concurrency (16500).
Thread 0 handles 2500000 flows. connecting to 169.254.9.84:80
[WARINING] Available # addresses (16127) is smaller than the max concurrency (16500).
Thread 3 handles 2500000 flows. connecting to 169.254.9.84:80
Response size set to 204
[CPU 0] ens259f1 flows:   5500, RX:   18963(pps) (err:     0),  0.02(Gbps), TX:   30720(pps),  0.02(Gbps)
[CPU 1] ens259f1 flows:   5596, RX:   18027(pps) (err:     0),  0.02(Gbps), TX:   30520(pps),  0.02(Gbps)
[CPU 2] ens259f1 flows:   5893, RX:   18058(pps) (err:     0),  0.02(Gbps), TX:   30166(pps),  0.02(Gbps)
[CPU 3] ens259f1 flows:   5500, RX:   18856(pps) (err:     0),  0.02(Gbps), TX:   30841(pps),  0.02(Gbps)
[ ALL ] ens259f1 flows:  22489, RX:   73904(pps) (err:     0),  0.07(Gbps), TX:  122247(pps),  0.10(Gbps)
[CPU 0] ens259f1 flows:   5500, RX:    5478(pps) (err:     0),  0.00(Gbps), TX:   10978(pps),  0.01(Gbps)
[CPU 1] ens259f1 flows:   5596, RX:    5465(pps) (err:     0),  0.00(Gbps), TX:   10965(pps),  0.01(Gbps)
[CPU 2] ens259f1 flows:   5893, RX:    5475(pps) (err:     0),  0.00(Gbps), TX:   10975(pps),  0.01(Gbps)
[CPU 3] ens259f1 flows:   5500, RX:    5515(pps) (err:     0),  0.00(Gbps), TX:   11009(pps),  0.01(Gbps)
[ ALL ] ens259f1 flows:  22489, RX:   21933(pps) (err:     0),  0.02(Gbps), TX:   43927(pps),  0.03(Gbps)
[CPU 0] ens259f1 flows:   5500, RX:     435(pps) (err:     0),  0.00(Gbps), TX:    1191(pps),  0.00(Gbps)
[CPU 1] ens259f1 flows:   5500, RX:     749(pps) (err:     0),  0.00(Gbps), TX:     847(pps),  0.00(Gbps)

I have not run the affinity.py script for IRQ pinning. Could you let me know if the issue is because of that, or do I need some other configuration changes too? Also, can you please help me with affinity.py for the i40e (40G NIC)? Thanks again for all your help.

ajamshed commented 5 years ago

@ppnaik1890,

I don't think this is an IRQ affinitization issue (although the performance of your experiment may go up slightly if you correctly bind IRQ numbers to cores). What is the file size of small.txt? You may be hitting a PCIe lane bottleneck (if your file size, and hence average packet size, is ~64 B), or your config files (epserver.conf, epwget.conf) may need some tuning (hint: [WARINING] Available # addresses (16127) is smaller than the max concurrency (16500).).
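
As a rough illustration only (the key names below follow the sample configs shipped with mTCP, and as far as I remember these limits are per mTCP thread; the right values depend on your memory budget), the knobs I have in mind in epserver.conf / epwget.conf look like:

max_concurrency = 10000   # upper bound on concurrent flows per thread
max_num_buffers = 10000   # socket buffer pool per thread; keep it >= max_concurrency
rcvbuf = 8192             # per-connection receive buffer size (bytes)
sndbuf = 8192             # per-connection send buffer size (bytes)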

Also, make sure that the NIC is placed in the first NUMA node (since you are using CPUs 0-3). To understand affinity.py better, please see this link: https://null.53bits.co.uk/index.php?page=numa-and-queue-affinity
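
A quick way to check the NUMA placement and the current irq-to-core mapping (ens259f1 taken from your logs; <irq> below is a placeholder for the numbers you see in /proc/interrupts):

cat /sys/class/net/ens259f1/device/numa_node   # should print 0 if the NIC is on the first NUMA node
grep ens259f1 /proc/interrupts                 # list the NIC's per-queue irq numbers
cat /proc/irq/<irq>/smp_affinity               # CPU mask for one irq; echo a new hex mask here to re-pin it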

ppnaik1890 commented 5 years ago

Hi @ajamshed,

I feel that RSS is not working properly, but I could not find the reason for it. The following are the steps we followed to tune/improve performance, which did not succeed:

Observations:

eunyoung14 commented 5 years ago

Hi @ppnaik1890,

[WARINING] Available # addresses (16127) is smaller than the max concurrency (16500). is saying that the number of available source IP and port pairs is lower than the number of connections you are trying to create concurrently. This may have led mTCP to unexpected behavior. I recommend reducing the concurrency of the client. You can still test larger server-side concurrency by increasing the number of clients. Client-side mTCP does not use the entire 65535-port space, in order to support symmetric RSS.
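
For example (only adjusting the -c value from the command you posted; judging by the warning, each of the 4 client threads currently asks for 16500 concurrent connections but has only 16127 usable address/port pairs), something like:

sudo ./epwget 169.254.9.84/small.txt 10000000 -c 16000 -f epwget.conf -N 4

or keep the higher aggregate concurrency by adding more client machines (or more source IPs per client).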