snabbco / snabb

Snabb: Simple and fast packet networking
Apache License 2.0

Long NIC initialization for X540-AT2 #613

Open pavel-odintsov opened 9 years ago

pavel-odintsov commented 9 years ago

Hello, folks!

I'm using the FireHose app for traffic processing.

And I have two NIC models:

lspci |grep Eth
03:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
03:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
04:00.0 Ethernet controller: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01)
04:00.1 Ethernet controller: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01)

And I'm using the following code for tests:

cd /usr/src
git clone https://github.com/FastVPSEestiOu/fastnetmon.git
cd fastnetmon/src/tests/snabb
g++ -O3 ../../fastnetmon_packet_parser.c  -c -o fastnetmon_packet_parser.o -fPIC
g++ -O3 -shared -o capturecallback.so -fPIC capturecallback.cpp fastnetmon_packet_parser.o

My current SnabbSwitch branch is "next". I have checked the "master" branch too.

When I specify the X540-AT2 NIC in --input, NIC initialization takes a really HUGE amount of time:

time /usr/src/snabbswitch/src/snabb firehose  --input 0000:04:00.0 --input 0000:04:00.1 /usr/src/fastnetmon/src/tests/snabb/capturecallback.so
Loading shared object: /usr/src/fastnetmon/src/tests/snabb/capturecallback.so
Initializing NIC: 0000:04:00.0
!!! 10 seconds lag here !!!
Initializing NIC: 0000:04:00.1
!!! And 10 seconds lag here !!!
Initializing callback library
Processing traffic...
^CWe caught SINGINT and will finish application

real    0m32.881s
user    0m1.276s
sys     0m0.980s

But when I switch to the 82599 NIC for --input, everything is really fast:

time /usr/src/snabbswitch/src/snabb firehose --input 0000:03:00.0 --input 0000:03:00.1  /usr/src/fastnetmon/src/tests/snabb/capturecallback.so
Loading shared object: /usr/src/fastnetmon/src/tests/snabb/capturecallback.so
Initializing NIC: 0000:03:00.0
Initializing NIC: 0000:03:00.1
Initializing callback library
Processing traffic...
^CWe caught SINGINT and will finish application

real    0m2.184s
user    0m0.772s
sys     0m0.020s
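
A rough way to read the two `time` results above is to subtract CPU time (user + sys) from wall-clock time (real). What remains is time the process spent waiting rather than computing. The sketch below only redoes that arithmetic with the figures quoted above; the helper name is made up for illustration, and both runs also include the time until Ctrl-C was pressed, so the numbers are approximate.

```python
def idle_seconds(real, user, system):
    """Wall-clock time not accounted for by CPU time (user + sys)."""
    return real - (user + system)


# Figures copied from the two `time` outputs above.
x540_idle = idle_seconds(real=32.881, user=1.276, system=0.980)
i82599_idle = idle_seconds(real=2.184, user=0.772, system=0.020)

print(f"X540-AT2: {x540_idle:.3f} s not spent on CPU")
print(f"82599ES:  {i82599_idle:.3f} s not spent on CPU")
```

On these figures the X540-AT2 run spends about 30 seconds off-CPU, which fits a process that is mostly sleeping or waiting during initialization.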

Do you have any idea what is wrong with the X540-AT2?

pavel-odintsov commented 9 years ago

So I looked at the strace output and found a lot of nanosleep calls:

nanosleep({0, 100000}, NULL)            = 0
nanosleep({0, 100000}, NULL)            = 0
nanosleep({0, 100000}, NULL)            = 0
nanosleep({0, 100000}, NULL)            = 0
nanosleep({0, 100000}, NULL)            = 0
nanosleep({0, 100000}, NULL)            = 0
nanosleep({0, 100000}, NULL)            = 0
nanosleep({0, 100000}, NULL)            = 0
nanosleep({0, 100000}, NULL)            = 0
nanosleep({0, 100000}, NULL)            = 0
nanosleep({0, 100000}, NULL)            = 0
nanosleep({0, 100000}, NULL)            = 0
nanosleep({0, 100000}, NULL)            = 0
nanosleep({0, 100000}, NULL)            = 0

lukego commented 9 years ago

There does seem to be a long delay before these NICs reach the "link up" state. I don't know why exactly but we made our selftest routine accept this lag.
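
For illustration, here is a minimal sketch of the kind of wait-for-link-up polling loop that would produce the strace pattern reported earlier in this thread (repeated 100 µs nanosleep calls). It is not Snabb's code (Snabb is written in Lua), and `wait_linkup`, `link_is_up`, and `fake_nic` are hypothetical names. It assumes the driver simply checks a link status flag and sleeps 100 µs between checks.

```python
import time

POLL_INTERVAL_S = 100e-6  # 100 µs, matching nanosleep({0, 100000}) in the strace


def wait_linkup(link_is_up, timeout_s=20.0, sleep=time.sleep,
                interval_s=POLL_INTERVAL_S):
    """Poll link_is_up() until it returns True.

    Returns the number of sleeps performed before the link came up,
    or None if the poll budget derived from timeout_s ran out.
    """
    max_polls = int(timeout_s / interval_s)
    for polls in range(max_polls + 1):
        if link_is_up():
            return polls
        sleep(interval_s)
    return None


def fake_nic(polls_until_up):
    """Hypothetical stand-in for a NIC whose link comes up after N checks."""
    state = {"checks": 0}

    def link_is_up():
        state["checks"] += 1
        return state["checks"] > polls_until_up

    return link_is_up
```

With this shape, a link that takes 10 seconds to come up would need at most about 100,000 sleeps of 100 µs each (fewer in practice, since each sleep overshoots), so the delay comes from the link state itself and not from the polling.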

Can you start actually passing traffic more quickly with the kernel driver? Could be that we can accelerate the init somehow.

pavel-odintsov commented 9 years ago

Hello!

Thanks for the answer! I will add these cards to the blacklist! :)

I could test it with netmap and share the results.