performancecopilot / pcp (Performance Co-Pilot)
https://pcp.io

fetchgroup rate conversion issues #446

Closed · myllynen closed 6 years ago

myllynen commented 6 years ago

When running iozone -azcR -U /srv -f /srv/testfile -b output.xls on a RHEL 7 server/client pair using either NFSv3 or NFSv4, sar -n NFS reports meaningful numbers but the PCP metrics are inconsistent:

[root@client ~]# pmrep nfsclient.bytes.read
  n.b.r.direct  n.b.r.server  n.b.r.normal
          /srv          /srv          /srv
        byte/s        byte/s        byte/s
           N/A           N/A           N/A
         0.000         0.000         0.000
         0.000         0.000         0.000
         0.000  19808313.291  13553056.462
         0.000  -19816450.40  -13558623.95
         0.000         0.000         0.000
         0.000         0.000         0.000
         0.000         0.000         0.000
[root@client ~]# pmrep nfsclient.ops.read
  n.o.r.bytes_sent  n.o.r.rtt  n.o.r.ntrans  n.o.r.bytes_recv  n.o.r.ops  n.o.r.timeouts  n.o.r.execute  n.o.r.queue
              /srv       /srv          /srv              /srv       /srv            /srv           /srv         /srv
            byte/s       ms/s       count/s            byte/s    count/s         count/s           ms/s         ms/s
               N/A        N/A           N/A               N/A        N/A             N/A            N/A          N/A
             0.000      0.000         0.000             0.000      0.000           0.000          0.000        0.000
         36345.631    407.641       245.579      32219911.154    245.579           0.000        420.566        2.983
         91762.244    875.374       620.015      81345989.327    620.015           0.000        907.170        8.943
         89128.009    894.381       602.216      79010775.299    602.216           0.000        923.200        6.956
         81969.675    845.192       553.849      72665008.765    553.849           0.000        875.022        8.949
         77983.330    829.145       526.914      69131167.881    526.914           0.000        858.970        8.948
         76084.525    829.297       514.085      67447902.942    514.085           0.000        865.094        9.944
         77438.659    830.609       523.234      68648324.954    523.234           0.000        861.446        9.947
             0.000      0.000         0.000             0.000      0.000           0.000          0.000        0.000
       -446301.084  -4662.470     -3015.548    -395639879.577  -3015.548           0.000      -4833.424      -50.690
        100200.026    900.715       677.027      88825969.213    677.027           0.000        934.516        6.959
        102779.309    900.109       694.455      91112468.891    694.455           0.000        934.881        6.954
        103964.907    904.164       702.466      92163484.706    702.466           0.000        937.946        6.955
         90489.550    919.609       611.416      80217763.471    611.416           0.000        948.440        5.965
         88877.753    919.679       600.525      78788927.267    600.525           0.000        947.518        5.965
       -540102.527  -5130.334     -3649.341    -478403014.829  -3649.341           0.000      -5309.126      -36.752
         93797.899   2315.544       633.770     506309980.878    633.770           0.000       3298.979      296.024
       -123916.166  -2573.393      -837.271    -533329773.688   -837.271           0.000      -3565.604     -297.961
             0.000      0.000         0.000             0.000      0.000           0.000          0.000        0.000
             0.000      0.000         0.000             0.000      0.000           0.000          0.000        0.000
             0.000      0.000         0.000             0.000      0.000           0.000          0.000        0.000

It should be possible to mimic sar -n NFS and sar -n NFSD (and probably also nfsiostat) with pmrep(1), but due to the above that currently seems hard to do. More analysis is needed to understand why we occasionally see negative numbers, and whether all the metrics needed for sar/nfsiostat are available. Thanks.
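
For reference, a pmrep.conf stanza along these lines might come close to sar -n NFS (the metric and instance names below are only my assumptions about the Linux kernel PMDA, so verify them with pminfo(1) first):

    [sar-n-nfs]
    timestamp = yes
    header = yes
    calls = rpc.client.rpccnt
    retrans = rpc.client.rpcretrans
    read = nfs3.client.reqs,,read
    write = nfs3.client.reqs,,write
    access = nfs3.client.reqs,,access
    getatt = nfs3.client.reqs,,getattr

which would then be invoked as something like pmrep -t 1 :sar-n-nfs.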

fche commented 6 years ago

What kind of workload was this? Is there concurrent pcp and sar data available? Could this be a type-conversion problem within the nfsclient perl PMDA? (Counter metric wraparound would show up as negative values, but is unlikely given the U64 pmDesc type.)

myllynen commented 6 years ago

The workload was as stated: iozone run on a RHEL 7 NFS client against a RHEL 7 server, both VMs otherwise completely idle, with no other NFS servers or clients in play. So this should be trivially reproducible.

Here are two sets of sar and pmrep outputs captured shortly after starting the iozone run. (NB. nfsclient.bytes.read and sar -n NFS are obviously not the same metrics, but they illustrate how sar reports meaningful numbers while pmrep reports something bogus. I'm still not 100% sure which metric would match e.g. read/s in the sar output, but nfsclient.ops.read.ops might be it.)

[root@client ~]# sar -n NFS 1
Linux 3.10.0-693.17.1.el7.x86_64 (infra02.test.example.com)     02/21/18    _x86_64_    (4 CPU)

01:15:27       call/s retrans/s    read/s   write/s  access/s  getatt/s
...
01:16:00       651.00      0.00    583.00      3.00     14.00     14.00
01:16:01       143.00      0.00     49.00      8.00     16.00     20.00
01:16:02       393.00      0.00    313.00      5.00     16.00     17.00
01:16:03       148.00      0.00     52.00      6.00     16.00     21.00
01:16:04       238.00      0.00    145.00      7.00     18.00     19.00
01:16:05       183.00      0.00     90.00      7.00     18.00     19.00
01:16:06       120.00      0.00     17.00      8.00     18.00     22.00
01:16:07       158.00      0.00     50.00      7.00     20.00     23.00
01:16:08       131.00      0.00     29.00      7.00     20.00     21.00
01:16:09       114.00      0.00     11.00      8.00     18.00     22.00
01:16:10       126.00      0.00     13.00      9.00     20.00     24.00

vs

[root@client ~]# pmrep -p nfsclient.bytes.read
          n.b.r.direct  n.b.r.server  n.b.r.normal
                  /srv          /srv          /srv
                byte/s        byte/s        byte/s
01:16:00           N/A           N/A           N/A
01:16:01         0.000    195252.169    195252.169
01:16:02         0.000         0.000         0.000
01:16:03         0.000   1889514.902   1889514.902
01:16:04         0.000  -2082228.667  -2082228.667
01:16:05         0.000         0.000         0.000
01:16:06         0.000         0.000         0.000
01:16:07         0.000         0.000         0.000
01:16:08         0.000         0.000         0.000
01:16:09         0.000   1042382.165   1042382.165
01:16:10         0.000  -1042652.690  -1042652.690

And also:

[root@client ~]# sar -n NFS 1
Linux 3.10.0-693.17.1.el7.x86_64 (infra02.test.example.com)     02/21/18    _x86_64_    (4 CPU)

01:26:14       call/s retrans/s    read/s   write/s  access/s  getatt/s
...
01:26:30       183.00      0.00     69.00      5.00     22.00     26.00
01:26:31       260.00      0.00    146.00      5.00     22.00     26.00
01:26:32       212.00      0.00     98.00      4.00     22.00     25.00
01:26:33       162.00      0.00     48.00      6.00     24.00     26.00
01:26:34       144.00      0.00     32.00      4.00     22.00     25.00
01:26:35       131.00      0.00      8.00      5.00     24.00     28.00
01:26:36       127.00      0.00     13.00      5.00     22.00     26.00
01:26:37       132.00      0.00      8.00      6.00     24.00     28.00
01:26:38       112.00      0.00      7.00      4.00     22.00     24.00
01:26:39       419.00      0.00    327.00      3.00     18.00     20.00
01:26:40       385.00      0.00    298.00      5.00     16.00     20.00

vs

[root@client ~]# pmrep -p nfsclient.ops.read
          n.o.r.bytes_sent  n.o.r.rtt  n.o.r.ntrans  n.o.r.bytes_recv  n.o.r.ops  n.o.r.timeouts  n.o.r.execute  n.o.r.queue
                      /srv       /srv          /srv              /srv       /srv            /srv           /srv         /srv
                    byte/s       ms/s       count/s            byte/s    count/s         count/s           ms/s         ms/s
01:26:30               N/A        N/A           N/A               N/A        N/A             N/A            N/A          N/A
01:26:31             0.000      0.000         0.000             0.000      0.000           0.000          0.000        0.000
01:26:32           441.428     18.890         2.983        521632.448      2.983           0.000         18.890        0.000
01:26:33             0.000      0.000         0.000             0.000      0.000           0.000          0.000        0.000
01:26:34             0.000      0.000         0.000             0.000      0.000           0.000          0.000        0.000
01:26:35          -441.384    -18.888        -2.982       -521580.592     -2.982           0.000        -18.888        0.000
01:26:36             0.000      0.000         0.000             0.000      0.000           0.000          0.000        0.000
01:26:37             0.000      0.000         0.000             0.000      0.000           0.000          0.000        0.000
01:26:38           147.138     10.936         0.994        521360.826      0.994           0.000         10.936        0.000
01:26:39             0.000      0.000         0.000             0.000      0.000           0.000          0.000        0.000
01:26:40          -147.066    -10.931        -0.994       -521107.489     -0.994           0.000        -10.931        0.000

Thanks.

myllynen commented 6 years ago

Actually, I think I know what's going on here: we were using -U with iozone, which is described as:

       -U mountpoint
              Mount  point  to unmount and remount between tests.  Iozone will
              unmount and remount this mount point before beginning each test.
              This  guarantees  that  the buffer cache does not contain any of
              the file under test.

This confuses the nfsclient PMDA; without -U I get consistent results. This kind of constant unmounting/remounting is of course very atypical, so perhaps this could be considered a non-issue, but strictly speaking such negative numbers should be avoided even in corner cases, so I will leave this issue open for further consideration. Thanks.
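
To put rough numbers on it: if the cumulative read counter stands at about 19.8 MB when the filesystem is unmounted and restarts near zero after the remount, the next sample computes approximately (0 - 19816450) / 1 s ≈ -19.8 MB/s, which is exactly the shape of the negative spikes in the first pmrep output above.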

myllynen commented 6 years ago

I think I've identified all the available/needed metrics; the few missing ones can be discussed separately in #447, so here we can concentrate on the negative-values corner case mentioned in the previous comment.

natoscott commented 6 years ago

@myllynen I've audited a number of the NFS metrics now, and I believe the descriptors are all correct and match the kernel code; these metrics are all unsigned 32-bit values.

I suspect the negative values are due to your -U observation, combined with a lack of "counter wrap" handling in the fetchgroup code, which does ...

        delta = (v.d - prev_v.d) / deltaT;

This lacks the $PCP_COUNTER_WRAP logic (see PCPIntro(1) and interp.c in libpcp) and will produce negative rates for unsigned counters whenever "prev_v.d" is greater than "v.d".
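
With wrap handling, the calculation would become something like this (a sketch only, using the same variable names; not the actual interp.c code):

    #include <math.h>

    /* Treat a negative delta on an unsigned counter as a wrap of the
     * counter's full range before rate conversion. */
    static double
    wrap_rate(double v, double prev_v, double deltaT, int is_64bit)
    {
        double delta = v - prev_v;
        if (delta < 0.0)
            delta += is_64bit ? ldexp(1.0, 64) : ldexp(1.0, 32);
        return delta / deltaT;
    }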

natoscott commented 6 years ago

@myllynen this patch adds optional $PCP_COUNTER_WRAP handling to the fetchgroup code: https://gist.github.com/natoscott/9701c08d8b038608ed0209e74cca30ba

I should point out that while using this variable means you won't see negative values, this is not a true counter wrap, so the reported values will still be incorrect in this fake-wrap (reset counters) situation.

fche commented 6 years ago

Looks good, except the PM_*LOCK(__pmLock_extcall) parts now cost a brand-new lock/unlock operation for every instance, of every metric, on every fetch. It would be better to do that getenv once at startup.
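
Something along these lines (illustrative only, not the patch itself) would do the lookup once and keep the fetch path lock-free:

    #include <pthread.h>
    #include <stdlib.h>

    static pthread_once_t wrap_once = PTHREAD_ONCE_INIT;
    static int wrap_enabled;

    /* resolve $PCP_COUNTER_WRAP exactly once, thread-safely */
    static void
    wrap_init(void)
    {
        wrap_enabled = (getenv("PCP_COUNTER_WRAP") != NULL);
    }

    static int
    counter_wrap_enabled(void)
    {
        pthread_once(&wrap_once, wrap_init);
        return wrap_enabled;
    }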

myllynen commented 6 years ago

Testing the patch on a VM where iozone -azc -U /srv-mount -f /srv-mount/testfile is running, I see:

root@localhost:~# PCP_COUNTER_WRAP=1 pmrep -w 20 nfsclient.bytes.read
          n.b.r.server          n.b.r.normal          n.b.r.direct
            /srv-mount            /srv-mount            /srv-mount
                byte/s                byte/s                byte/s
                   N/A                   N/A                   N/A
                 0.000                 0.000                 0.000
                 0.000                 0.000                 0.000
             65472.950             65472.950                 0.000
  18444364750656651264  18444364750656651264                 0.000
                 0.000                 0.000                 0.000
                 0.000                 0.000                 0.000
                 0.000                 0.000                 0.000
             65647.141             98470.711                 0.000
                 0.000                 0.000                 0.000
                 0.000                 0.000                 0.000
                 0.000                 0.000                 0.000
  18409630259107125248  18409630259107092480                 0.000
                 0.000                 0.000                 0.000

fche commented 6 years ago

Hm, I can't say that having such ginormous numbers is much of an improvement; they don't represent anything very meaningful. It's almost as if a missing value would be best.
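
(Those values sit just below 2^64 = 18446744073709551616, so this looks like the wrap correction at work: 2^64 gets added to a negative delta and the result is rate-converted.)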

kmcdonell commented 6 years ago

The real problem here (aside from the missing counter wrap issue that Nathan addressed) is that the underlying data semantics break one of the PCP data model assumptions. If a metric counts for a while, then gets reset to zero, then counts for a while, it is NOT a PCP counter.

If $PCP_COUNTER_WRAP is not set, then perhaps the best the fetchgroup abstraction can do is return "no value available" for the samples where the "counter" goes backwards.
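
In other words, something like this (illustrative only, not the actual fetchgroup internals):

    #include <stdlib.h>
    #include <pcp/pmapi.h>

    /* With $PCP_COUNTER_WRAP unset, a backwards-running "counter"
     * yields no value rather than a bogus negative rate. */
    static int
    rate_convert(double v, double prev_v, double deltaT, double *rate)
    {
        double delta = v - prev_v;
        if (delta < 0.0 && getenv("PCP_COUNTER_WRAP") == NULL)
            return PM_ERR_VALUE;    /* "no value available" */
        *rate = delta / deltaT;
        return 0;
    }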

fche commented 6 years ago

Could one make the case that the pmda should do this instead (or perhaps in addition)?

natoscott commented 6 years ago

@myllynen thanks for testing it out.

@kmcdonell I'll push the initial change adding $PCP_COUNTER_WRAP handling as is done in the rest of PCP, but I agree that reporting "no value available" for negative unsigned counter rates could be a good option here. We should be consistent and handle $PCP_COUNTER_WRAP the same way for all tools.

natoscott commented 6 years ago

@kmcdonell my last statement is not quite right - discovered as soon as I started doing QA 8^) - "no values" is already the current behaviour for the other PCP tools; I misread the code. I've tweaked my patch, added QA, and will commit shortly. @myllynen could you double-check it for me when you get some time, please? Thanks!

myllynen commented 6 years ago

I now tested by running a simple sequence of rm -f /tmp/foo ; dd if=/srv-mount/foo of=/tmp/foo ; umount /srv-mount ; mount -a ; dd if=/srv-mount/foo of=/tmp/foo ; umount /srv-mount ; mount -a a few times in a row, to make things a bit more readable:

root@localhost:~# pmrep -w 20 nfsclient.bytes.read
          n.b.r.server          n.b.r.direct          n.b.r.normal
            /srv-mount            /srv-mount            /srv-mount
                byte/s                byte/s                byte/s
                   N/A                   N/A                   N/A
         277701820.786                 0.000         278461670.985
         270608422.283                 0.000         270601273.057
                   N/A                 0.000                   N/A
         267283572.484                 0.000         260850142.851
         276441115.233                 0.000         278808080.869
         278019156.095                 0.000         275980321.629
                 0.000                 0.000                 0.000
                   N/A                 0.000                   N/A
                 0.000                 0.000                 0.000
root@localhost:~# PCP_COUNTER_WRAP=1 pmrep -w 20 nfsclient.bytes.read
          n.b.r.server          n.b.r.direct          n.b.r.normal
            /srv-mount            /srv-mount            /srv-mount
                byte/s                byte/s                byte/s
                   N/A                   N/A                   N/A
         275832549.065                 0.000         273655181.201
         279384796.438                 0.000         279285909.140
                 0.000                 0.000                 0.000
  18405754457922273280                 0.000  18405754457924616192
         259107469.833                 0.000         258518078.039
         283583716.716                 0.000         282463146.655
         271519149.613                 0.000         271022655.697
  18420881155560431616                 0.000  18420881155567261696
         204677660.629                 0.000         197564763.552
         266182807.176                 0.000         266818737.401
         276791723.843                 0.000         278269456.124
         267004266.832                 0.000         266327610.775
  18448422879322562560                 0.000  18448422879322898432
                 0.000                 0.000                 0.000
                 0.000                 0.000                 0.000

natoscott commented 6 years ago

@myllynen good to see, thanks for the testing