snabbco / snabb

Snabb: Simple and fast packet networking
Apache License 2.0

Intel 82574L ethernet driver selftest shows unexpected hardware counter values #1

Closed: lukego closed this issue 11 years ago

lukego commented 11 years ago

The Intel 82574L device driver's self-test function is showing unexpected counter values. The test attempts to transmit 100,000 packets and, optionally, to receive them again in loopback mode. The hardware counters show some values that are as expected and others that are surprisingly low.

Here are example results when attempting to transmit 100,000 packets of 1000 bytes each:

Statistics for PCI device 0000:00:04.0:
              54,306 GPTC       Good Packets Transmitted Count
          54,306,000 GOTCL      Good Octets Transmitted Count
         100,000,000 TOTL       Total Octets Transmitted (Low)
              54,306 TPT        Total Packets Transmitted 
             100,000 PTC1023    Packets Transmitted [512–1023 Bytes] Count

Why do some counters show 100,000 and others only 54,306?
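The transmit-only numbers split into two internally consistent groups: GPTC, TPT, and GOTCL agree on 54,306 packets of 1000 bytes, while PTC1023 and TOTL agree on the full 100,000. A small Python helper makes that check mechanical. It is not part of Snabb and only assumes the dump format shown above.

```python
import re

# Counter dump copied from the transmit-only selftest output above.
DUMP = """\
Statistics for PCI device 0000:00:04.0:
              54,306 GPTC       Good Packets Transmitted Count
          54,306,000 GOTCL      Good Octets Transmitted Count
         100,000,000 TOTL       Total Octets Transmitted (Low)
              54,306 TPT        Total Packets Transmitted
             100,000 PTC1023    Packets Transmitted [512–1023 Bytes] Count
"""

# Each counter line is: <comma-grouped value> <register name> <description>.
LINE = re.compile(r"^\s*([\d,]+)\s+(\w+)\s")


def parse_counters(text):
    """Return {register_name: value} for every counter line in a dump."""
    counters = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            counters[m.group(2)] = int(m.group(1).replace(",", ""))
    return counters


counters = parse_counters(DUMP)
PACKET_SIZE = 1000  # bytes per packet in this selftest run

# Each octet counter should equal its packet counter times the packet size.
print("GOTCL / GPTC =", counters["GOTCL"] // counters["GPTC"])      # -> 1000
print("TOTL / PTC1023 =", counters["TOTL"] // counters["PTC1023"])  # -> 1000
print("packets implied by TOTL:", counters["TOTL"] // PACKET_SIZE)  # -> 100000
```

Applied to the loopback dump below, the same check shows GPRC, GPTC, GORCL, GOTCL, and TPT agreeing on 14,618 packets, and TPR, PRC1023, TORL, TOTL, and PTC1023 agreeing on 100,000.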

Here are similar results with MAC loopback mode engaged:

Statistics for PCI device 0000:00:04.0:
             100,000 PRC1023    Packets Received [512–1023 Bytes] Count
              14,618 GPRC       Good Packets Received Count
              14,618 GPTC       Good Packets Transmitted Count
          14,618,000 GORCL      Good Octets Received Count
          14,618,000 GOTCL      Good Octets Transmitted Count
         100,000,000 TORL       Total Octets Received (Low)
         100,000,000 TOTL       Total Octets Transmitted (Low)
             100,000 TPR        Total Packets Received 
              14,618 TPT        Total Packets Transmitted 
             100,000 PTC1023    Packets Transmitted [512–1023 Bytes] Count

It is curious that TPR shows 100,000 while TPT, GPRC, and GPTC all show only 14,618.

Here's how to reproduce the problem:

Here are ideas for how to investigate:

lukego commented 11 years ago

Here are additional things you need to do in order to run snabbswitch and reproduce the problem:

In the likely event that the NIC you want to test with does not have PCI address 0000:00:04.0, you need to substitute the actual address in the command above and in src/selftest.lua. Use lspci -v to find the right card.
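`lspci -v` is the easy way to find the card. On Linux the same information is also available under `/sys/bus/pci/devices`, where each device directory has `vendor` and `device` files. The Python sketch below assumes the 82574L's PCI IDs are 8086:10d3 and that sysfs is mounted; `find_nics` is a hypothetical helper, not part of snabbswitch.

```python
import os

INTEL_VENDOR = "0x8086"
I82574L_DEVICE = "0x10d3"  # assumed PCI device ID of the Intel 82574L


def find_nics(vendor=INTEL_VENDOR, device=I82574L_DEVICE,
              sysfs="/sys/bus/pci/devices"):
    """Return sorted PCI addresses (e.g. '0000:00:04.0') of matching devices."""
    matches = []
    if not os.path.isdir(sysfs):
        return matches
    for addr in os.listdir(sysfs):
        try:
            with open(os.path.join(sysfs, addr, "vendor")) as f:
                v = f.read().strip()
            with open(os.path.join(sysfs, addr, "device")) as f:
                d = f.read().strip()
        except OSError:
            continue  # skip entries without readable ID files
        if v == vendor and d == device:
            matches.append(addr)
    return sorted(matches)


if __name__ == "__main__":
    for addr in find_nics():
        print(addr)
```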

lukego commented 11 years ago

Here is an update on how to experiment, written in response to the first interested user!

The Snabb Lab: I have two EX6 servers colocated at hetzner.de. Each has one spare Intel ethernet port, and those ports are connected to each other by a crossover cable. The hostnames are arbon.snabb.co and bern.snabb.co. I'm currently developing inside a VM on arbon and occasionally running tcpdump/dstat/ifconfig/etc. on bern to check what's being output.

To connect to the development VM (running Ubuntu v12 "cloud image") on arbon, use ssh -p 54322 $user@arbon.snabb.co, and to connect to bern, use ssh $user@snabb.co, where $user is an account that I have created for you (just leave a comment here with your SSH public key if you want one for trying out the switch).

Given the current selftest workload, I estimate that the NIC is in use only about 1% of the time, so this lab setup should scale to a few people. Here's an important tip to avoid multiple people running snabbswitch at the same time (which would be very confusing): instead of "./snabbswitch", always run "flock -x /tmp/snabb.lock ./snabbswitch". This way only one process runs at a time, and any extra ones automatically block until it is their turn.
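`flock -x` takes an exclusive lock on the lock file and blocks until any other holder releases it. If you launch the switch from a Python script instead of the shell, the standard-library `fcntl.flock` call takes the same kind of lock (both use flock(2) on Linux), so the two forms should interoperate. `run_exclusive` is a hypothetical helper for illustration.

```python
import fcntl
import os
import subprocess

LOCK_PATH = "/tmp/snabb.lock"  # same lock file as in the flock -x tip above


def run_exclusive(cmd, lock_path=LOCK_PATH):
    """Run cmd while holding an exclusive lock, like `flock -x lock_path cmd`.

    Blocks until no other process holds the lock, so only one
    snabbswitch instance drives the NIC at a time. Returns the exit code.
    """
    # flock(2) works on a read-only descriptor; O_CREAT makes the file if needed.
    fd = os.open(lock_path, os.O_RDONLY | os.O_CREAT, 0o666)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks while another holder exists
        return subprocess.run(cmd).returncode
    finally:
        os.close(fd)  # closing the descriptor releases the lock
```

Calling `run_exclusive(["./snabbswitch"])` then behaves like the shell form above.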

Please leave the servers as you found them! No wild global configuration changes, etc. Do whatever you want in your home directory, and feel free to sudo and install software you need. Mail luke@snabb.co if something crashes or needs rebooting, no stress :)

lukego commented 11 years ago

rahul@serverstack.info commented by mail that the Intel e1000 driver in Linux is very well-debugged, high-quality code. It is definitely a good resource to compare against when trying to understand where the Snabb Switch bug is.

lukego commented 11 years ago

Great!

I see that changes I made to the selftest procedure (intel.lua now calls selftest2() instead of selftest()) mean that this issue no longer reproduces out of the box. Sorry about that. I will try to make a fix now so that running the switch reproduces the problem again. Update to follow.

Does the code compile and run for you btw?

rahul-mr commented 11 years ago

Hi Luke,

On 01/07/2013 07:56 PM, Luke Gorrie wrote:

> I see that changes I made to the selftest procedure (now calling selftest2() instead of selftest() in intel.lua) means this Issue doesn't reproduce out of the box. Sorry about that. I will try to make a fix now so that running the switch reproduces the problem again. Update to follow.
>
> Does the code compile and run for you btw?

I can confirm that the selftest2() procedure seems to be working, but the selftest() procedure does not print any statistics.

Regards, Rahul

lukego commented 11 years ago

Wow cool that it runs! :-) You are the second person after me to run the switch!

Looks like I have broken selftest() quite a bit with recent hacking. I will now extend selftest2() to also support receive and then we can try to reproduce the problem with that.

Does the code make any sense btw? I am still learning Lua and I think especially the way I'm doing object-oriented programming - lots of "M." prefixes - is a bit clunky and can be better.

rahul-mr commented 11 years ago

On 01/07/2013 08:20 PM, Luke Gorrie wrote:

> Wow cool that it runs! :-) You are the second person after me to run the switch!

He He 8-)

> Looks like I have broken selftest() quite a bit with recent hacking. I will now extend selftest2() to also support receive and then we can try to reproduce the problem with that.

OK.

> Does the code make any sense btw? I am still learning Lua and I think especially the way I'm doing object-oriented programming - lots of "M." prefixes - is a bit clunky and can be better.

I haven't played with Lua much (I mostly program in Python and D). Since Lua wasn't really designed for heavy-duty OOP, I guess it'll always look a bit awkward (but hey, it works!). The code does make sense, btw :-)

Looking forward to the updated selftest.

Regards, Rahul

lukego commented 11 years ago

OK! The updated selftest is checked in now with commit b3867caff51e261769822bb6d55de6a37947884d.

Now selftest2() is extended to also handle RX and is renamed to selftest() replacing the old one.

The problem that shows up now is that the transmit+receive+loopback test drops most of the packets. Do you see this too? I don't know why that happens, but it's a bug that would be good to fix. You're welcome to have a look :). It's probably best to create a new issue for it.

The original problem from this issue doesn't seem to be reproducible now? Could be that it was fixed by changes to the logic that says when descriptor rings are full/empty (I think I fixed stuff there last week), or could be that it still exists and I'm just not seeing it.
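For readers new to NIC drivers: the 82574L exposes a head and a tail register for each descriptor ring (RDH/RDT for receive, TDH/TDT for transmit). Software advances the tail as it supplies descriptors, and the hardware advances the head as it processes them. The Python sketch below models one common way to decide full versus empty, by leaving one slot unused. It is a hypothetical model, not the logic in intel.lua.

```python
class DescriptorRing:
    """Hypothetical model of a NIC descriptor ring with head/tail indices.

    The driver fills descriptors at `tail`; the hardware consumes them
    starting at `head`. One slot is always left unused so that a full ring
    (next tail == head) can be told apart from an empty one (tail == head).
    """

    def __init__(self, size):
        self.size = size
        self.head = 0  # next descriptor the hardware will process
        self.tail = 0  # next descriptor the driver will fill

    def empty(self):
        return self.head == self.tail

    def full(self):
        return (self.tail + 1) % self.size == self.head

    def add(self):
        """Driver side: supply one descriptor; returns False if the ring is full."""
        if self.full():
            return False
        self.tail = (self.tail + 1) % self.size
        return True

    def hw_consume(self):
        """Hardware side: complete one descriptor; returns False if the ring is empty."""
        if self.empty():
            return False
        self.head = (self.head + 1) % self.size
        return True


ring = DescriptorRing(4)
added = sum(ring.add() for _ in range(10))
print(added, ring.full())  # -> 3 True (one of the 4 slots stays free)
```

An off-by-one in either test would show up as the driver overwriting descriptors the hardware has not finished with, or stalling with free slots available.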

btw: another interesting but larger thing to hack on in this source file is the add_txbuf_tso() function, which is currently just a stub. The goal is to use the TCP segmentation offload (TSO) hardware features so that we have a test case that transmits really big packets (~64K) and then, via loopback, receives the same data back as many smaller packets. This would be a major step towards implementing STT in the future (possibly the first open source implementation...)
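To make the TSO idea concrete: the driver hands the NIC one large TCP payload, and the hardware cuts it into MSS-sized segments, giving each its own copy of the headers. The Python sketch below only does the payload arithmetic. The 65,000-byte payload and 1460-byte MSS are illustrative assumptions, and `tso_segments` is a hypothetical helper.

```python
def tso_segments(payload_len, mss):
    """Split a TCP payload of payload_len bytes into per-segment payload sizes.

    Every segment carries mss bytes except possibly the last one.
    """
    full, rest = divmod(payload_len, mss)
    return [mss] * full + ([rest] if rest else [])


# Illustrative: ~64K payload with a 1460-byte MSS
# (1500-byte MTU minus 20-byte IPv4 and 20-byte TCP headers, no options).
segs = tso_segments(65000, 1460)
print(len(segs), segs[-1])  # -> 45 760
```

With loopback, the receive side of such a test would then expect 45 packets whose payloads sum back to the original 65,000 bytes.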

Dinner time over here! :-)

rahul-mr commented 11 years ago

I'm having a look at the updated selftest. Will report any interesting findings.

Regards, Rahul

rahul-mr commented 11 years ago

OK, found something interesting:

File: intel.lua; function init_receive(); line 252:

regs[RXDCTL] = bits({GRAN=24, WTHRESH0=16})

This line, which sets the Receive Descriptor Control (RXDCTL) register, was commented out. Uncommenting it drastically cut down the Missed Packets Count (MPC) and raised the Good Packets Received Count (GPRC) from 80,667 to 818,734, while increasing the Receive No Buffers Count (RNBC).
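Assuming `bits()` in intel.lua sets one bit per named field at the given bit position (which is how the call reads), the uncommented line writes 0x01010000 to RXDCTL. Under an assumed 82574L register layout (WTHRESH in bits 21:16, GRAN in bit 24), that is a write-back threshold of 1, with GRAN=1 assumed to select descriptor rather than cache-line granularity. A Python stand-in for the calculation:

```python
def bits(fields):
    """Set one bit per named field: {name: bit_position} -> integer value.

    Hypothetical stand-in for the bits() helper used in intel.lua,
    not the Snabb implementation.
    """
    value = 0
    for position in fields.values():
        value |= 1 << position
    return value


rxdctl = bits({"GRAN": 24, "WTHRESH0": 16})
print(hex(rxdctl))  # -> 0x1010000

# Assumed 82574L RXDCTL layout: WTHRESH in bits 21:16, GRAN in bit 24.
wthresh = (rxdctl >> 16) & 0x3F
gran = (rxdctl >> 24) & 0x1
print(wthresh, gran)  # -> 1 1
```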

BEFORE:

Statistics for PCI device 0000:00:04.0:
       1,109,458 MPC        Missed Packets Count
          80,667 PRC64      Packets Received [64 Bytes] Count
          80,667 GPRC       Good Packets Received Count
       1,190,213 GPTC       Good Packets Transmitted Count
       5,162,688 GORCL      Good Octets Received Count
      76,174,336 GOTCL      Good Octets Transmitted Count
               2 RNBC       Receive No Buffers Count
      76,176,896 TORL       Total Octets Received (Low)
      76,177,408 TOTL       Total Octets Transmitted (Low)
       1,190,279 TPR        Total Packets Received
       1,190,283 TPT        Total Packets Transmitted
       1,190,286 PTC64      Packets Transmitted [64 Bytes] Count

AFTER:

Statistics for PCI device 0000:00:04.0:
         232,479 MPC        Missed Packets Count
         818,720 PRC64      Packets Received [64 Bytes] Count
         818,734 GPRC       Good Packets Received Count
       1,051,232 GPTC       Good Packets Transmitted Count
      52,399,680 GORCL      Good Octets Received Count
      67,279,488 GOTCL      Good Octets Transmitted Count
              24 RNBC       Receive No Buffers Count
      67,281,472 TORL       Total Octets Received (Low)
      67,281,856 TOTL       Total Octets Transmitted (Low)
       1,051,283 TPR        Total Packets Received
       1,051,286 TPT        Total Packets Transmitted
       1,051,289 PTC64      Packets Transmitted [64 Bytes] Count

Maybe the thresholds in the RXDCTL register need adjustment?

lukego commented 11 years ago

Great! Thanks!

So I'm a GitHub newbie and I'm curious to see how it works. Could you send that fix over as a "pull request" so that we can test the workflow?

rahul-mr commented 11 years ago

OK, I've sent a pull request: https://github.com/SnabbCo/snabbswitch/pull/31

lukego commented 11 years ago

Great, it worked fine! :-D

lukego commented 11 years ago

Congratulations, you are the first contributor of a patch :-)

rahul-mr commented 11 years ago

On 01/09/2013 07:01 PM, Luke Gorrie wrote:

> Congratulations you are the first contributor of a patch:-)

Yay! :-)

rahul-mr commented 11 years ago

Luke, do you think this issue should be closed as it is no longer reproducible as originally described?

lukego commented 11 years ago

Yes.