mtcp-stack / mtcp

mTCP: A Highly Scalable User-level TCP Stack for Multicore Systems

netmap RSS support #198

Closed wtao0221 closed 6 years ago

wtao0221 commented 6 years ago

Hi,

I built mTCP with the latest version of netmap, which uses the ixgbe-5.3.7 driver.

But the test results show nearly zero throughput. I think the problem is RSS.

So how do I correctly set up RSS in the ixgbe-5.3.7 driver?

I did not modify the seeds in ixgbe_setup_mrqc() as suggested by README.netmap, since that function in the 5.3.7 driver may not be the proper place to do it. Instead, I write the seeds in ixgbe_init_rss_key().
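Concretely, my change is along these lines (a rough sketch of my edit, not a verbatim diff; the surrounding code and names such as adapter->rss_key and IXGBE_RSS_KEY_SIZE are taken from the upstream ixgbe-5.3.7 driver as I understand it, so please treat them as assumptions):

/* Sketch: replace the random RSS key with a fixed 0x05-repeated seed
 * in ixgbe_init_rss_key(), so that mTCP's software RSS hash can match
 * the hash computed by the NIC. */
static inline int ixgbe_init_rss_key(struct ixgbe_adapter *adapter)
{
        u32 *rss_key;

        if (!adapter->rss_key) {
                rss_key = kzalloc(IXGBE_RSS_KEY_SIZE, GFP_KERNEL);
                if (unlikely(!rss_key))
                        return -ENOMEM;

                /* original driver: netdev_rss_key_fill(rss_key, IXGBE_RSS_KEY_SIZE); */
                memset(rss_key, 0x05, IXGBE_RSS_KEY_SIZE);  /* fixed seed for mTCP */
                adapter->rss_key = rss_key;
        }

        return 0;
}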

ajamshed commented 6 years ago

@wtao0221:

Did you first try running pkt-gen (rx version) and pkt-gen (tx version) to test whether your netmap driver is working fine?

I would then suggest that you run your mTCP-based server application against a normal Linux-based TCP client. This setup should work even if the RSS seed is incorrectly set. If this works, we can then check why the mTCP-based client is not working.
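For the pkt-gen sanity check, I mean something along these lines (the interface name is a placeholder; run the tx side on the peer machine):

$ pkt-gen -i <intf_name> -f rx        # receiver
$ pkt-gen -i <intf_name> -f tx        # sender, on the other host

If these reach line rate, the netmap datapath itself is fine and we can focus on the RSS configuration.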

wtao0221 commented 6 years ago

Hi, @ajamshed

Yes, the netmap driver works fine. It achieves 14 Mpps on my 10-GbE testbed.

I wrote a simple mTCP-based server and a Linux-based client to do message echo. I use 2 cores for the mTCP-based server and launch 10 Linux-based client TCP flows. 3 flows are scheduled to CPU 0 and the other 7 flows to CPU 1. This seems to work fine, since the printed pps stats of CPU 0 and CPU 1 show the right relative ratio (i.e., 3:7 in this case).
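The per-core server loop is roughly the following (a simplified sketch: mtcp_init(), core affinitization, error handling, and the config file are omitted, and ECHO_PORT / MAX_EVENTS are placeholders):

#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <mtcp_api.h>
#include <mtcp_epoll.h>

#define ECHO_PORT  9999    /* placeholder */
#define MAX_EVENTS 1024

static void run_core(int core)
{
        mctx_t mctx = mtcp_create_context(core);
        int ep = mtcp_epoll_create(mctx, MAX_EVENTS);
        int lsock = mtcp_socket(mctx, AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in saddr;
        struct mtcp_epoll_event ev, events[MAX_EVENTS];
        char buf[8192];

        /* listening socket bound to all local addresses */
        mtcp_setsock_nonblock(mctx, lsock);
        memset(&saddr, 0, sizeof(saddr));
        saddr.sin_family = AF_INET;
        saddr.sin_addr.s_addr = INADDR_ANY;
        saddr.sin_port = htons(ECHO_PORT);
        mtcp_bind(mctx, lsock, (struct sockaddr *)&saddr, sizeof(saddr));
        mtcp_listen(mctx, lsock, 128);

        ev.events = MTCP_EPOLLIN;
        ev.data.sockid = lsock;
        mtcp_epoll_ctl(mctx, ep, MTCP_EPOLL_CTL_ADD, lsock, &ev);

        while (1) {
                int n = mtcp_epoll_wait(mctx, ep, events, MAX_EVENTS, -1);
                for (int i = 0; i < n; i++) {
                        int sock = events[i].data.sockid;
                        if (sock == lsock) {                       /* new connection */
                                int c = mtcp_accept(mctx, lsock, NULL, NULL);
                                if (c < 0)
                                        continue;
                                mtcp_setsock_nonblock(mctx, c);
                                ev.events = MTCP_EPOLLIN;
                                ev.data.sockid = c;
                                mtcp_epoll_ctl(mctx, ep, MTCP_EPOLL_CTL_ADD, c, &ev);
                        } else if (events[i].events & MTCP_EPOLLIN) {  /* echo payload back */
                                int r = mtcp_read(mctx, sock, buf, sizeof(buf));
                                if (r > 0)
                                        mtcp_write(mctx, sock, buf, r);
                                else if (r == 0)
                                        mtcp_close(mctx, sock);
                        }
                }
        }
}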

But it seems that the mTCP-based server does not always accept all 10 flows. Sometimes in this test it only accepts 8 out of the 10 flows.

For the mTCP-based epserver and epwget, I tested the -N 1 scenario on both sides at the same time, and this case works correctly.

wtao0221 commented 6 years ago

Hi, @ajamshed

When I use -N 2 on both sides, CPU 0 of the client receives packets/flows that should be received by CPU 1 of the client (verified in the logs). And the flows sent from client CPU 0 enter queue 1, which is processed by server CPU 1.

In summary, flows from client CPU 0 --> server CPU 1, and flows from server CPU 1 --> client CPU 1 (this may demonstrate that the underlying NICs' RSS works the same way on both sides).

So I suspect there may be a mismatch between the mTCP RSS logic, which is used to generate the flows' source ports, and the underlying NIC's RSS logic.
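To illustrate what I mean, the client side roughly has to do something like this when choosing a source port (purely an illustration; pick_source_port is not real mTCP code, and the exact signature and byte-order convention of GetRSSCPUCore() in mtcp/src/rss.c are assumptions on my side):

/* Pick a source port whose RSS hash maps the incoming reply tuple
 * (server_ip, client_ip, server_port, client_port) to the local core.
 * If the NIC hashes the tuple differently (seed or byte order), the
 * reply lands on another queue, which is what I am observing. */
static uint16_t pick_source_port(uint32_t client_ip, uint32_t server_ip,
                                 uint16_t server_port, int my_core,
                                 int num_cores, int endian_check)
{
        uint16_t sport;

        for (sport = 1025; sport != 0; sport++) {   /* wraps to 0 after 65535 */
                if (GetRSSCPUCore(server_ip, client_ip, server_port,
                                  htons(sport), num_cores, endian_check) == my_core)
                        return sport;
        }
        return 0;  /* no suitable port found */
}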

BTW, I used epwget and epserver to test the one-core throughput of mTCP w/ netmap, and it shows about 50k rps. This is a little lower than the Linux stack, which gives about 70k rps.

Have you benchmarked multi-queue mTCP w/ netmap?

ajamshed commented 6 years ago

@wtao0221:

Based on your feedback, it looks like the RSS seed is not being set correctly on the client. I tested the netmap driver a couple of years back (when I was using linux-3.13.0 and ixgbe-3.15.1). I will need to test the latest version of netmap on the newest kernel with mTCP. Can you please share the following info?

Your second question is related to the performance of the single-core version. Can you please verify that you followed the instructions in README.netmap (especially step 5)? Also, did you run the affinity-netmap.py script?

wtao0221 commented 6 years ago

Hi, @ajamshed

Thanks.

As for the performance issue, I will check that further.

wtao0221 commented 6 years ago

Hi, @ajamshed

I used ethtool --show-rxfh-indir $interface to check the RSS key of the NIC:

RSS hash key:
05:05:05:05:05:05:05:05:05:05:05:05:05:05:05:05:05:05:05:05:05:05:05:05:05:05:05:05:05:05:05:05:05:05:05:05:05:05:05:05

I think the RSS key is correctly written to the NIC registers.

wtao0221 commented 6 years ago

I have made a quick fix, namely always setting endian_type=0 in mtcp/src/rss.c; I am not sure whether it is correct, but it works under the -N 2 scenario.
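For context, my (possibly wrong) understanding of the flag is that it only controls whether the 4-tuple is byte-swapped before being fed to the Toeplitz hash, roughly like this (build_rss_input is just an illustrative helper, not the actual mtcp/src/rss.c code):

/* Illustration only: the Toeplitz input is saddr|daddr|sport|dport,
 * packed most-significant byte first. The endian flag decides whether
 * the tuple is byte-swapped first; forcing endian_type=0 skips the
 * swap, so the software hash matches the NIC only if the tuple is
 * already passed in the order the hash expects. */
static void build_rss_input(uint32_t saddr, uint32_t daddr,
                            uint16_t sport, uint16_t dport,
                            int endian_check, uint8_t in[12])
{
        if (endian_check) {
                saddr = ntohl(saddr); daddr = ntohl(daddr);
                sport = ntohs(sport); dport = ntohs(dport);
        }
        in[0] = saddr >> 24; in[1] = saddr >> 16; in[2] = saddr >> 8;  in[3] = saddr;
        in[4] = daddr >> 24; in[5] = daddr >> 16; in[6] = daddr >> 8;  in[7] = daddr;
        in[8] = sport >> 8;  in[9] = sport;
        in[10] = dport >> 8; in[11] = dport;
}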

But the server side cannot always process all the flows, and it outputs the log below:

[ EPOLL: mtcp_epoll_wait: 518] Socket 3: event OUT invalidated.

So, what's the difference between usr_event_queue and usr_shadow_event_queue? Is it that, for performance reasons, usr_shadow_event_queue is not protected by a lock?

eunyoung14 commented 6 years ago

Hi @wtao0221, Regarding the event queues, please refer to this.

ajamshed commented 6 years ago

Hi @wtao0221:

I have just tested the netmap version of mTCP on a 4-core Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz machine that has a dual-port Intel Corporation 82599ES 10-Gigabit adapter. The netmap driver has been updated/improved quite significantly since the last time I evaluated it. The installation is much simpler, and we no longer really need to apply netmap patches manually.

Here is my feedback:

1- The 1-core and 4-core versions of epserver ran fine. On the other hand, I faced problems (as you reported) when I ran the 2-core or 3-core version of the application.

2- To fix the issue, you need to change the number of rx/tx queues that get set up once you insert the ixgbe.ko driver module (by default, ixgbe.ko creates n rx/tx queues, where n = the number of online CPU cores). In my case, when I ran the 3-core version of epserver, I would not get traffic unless I changed the number of RX/TX queues of the NIC to 3. Use the following command to change the number of rx/tx queues to 3:

$ sudo ethtool -L <intf_name> combined 3

Please remember to run the ./affinity-netmap.py script again after you have re-set the rx/tx queues.
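You can confirm the new queue configuration with ethtool's read-only counterpart; the 'Combined' count under the current hardware settings should match the number of mTCP cores:

$ ethtool -l <intf_name>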

Please let me know if you have any other questions. I will update the README.netmap file to reflect the changes that need to be made with the new netmap driver. I think the endian check needs to be 0 in GetRSSCPUCore() to get epwget working. I will update the code in my next check-in. Thanks for reporting the issue.

wtao0221 commented 6 years ago

Hi, @ajamshed

Thanks for the feedback. I will further check it.

One minor question: does this mean that we do not need to set the RSS key seeds in the ixgbe driver under netmap?

ajamshed commented 6 years ago

The RSS key needs to be set if you want to use mTCP-based TCP client applications.

wtao0221 commented 6 years ago

Hi, @ajamshed

It works now. Thanks for your help.