What kind of workload was this? Is there concurrent pcp & sar data available? Could this be a type-conversion problem within the nfsclient perl PMDA? (A counter metric wraparound would show up as negative values, but that is unlikely given the U64 pmDesc type.)
The workload was as stated: an iozone run on a RHEL 7 NFS client against a RHEL 7 server, with both VMs otherwise completely idle and no other NFS servers or clients in play. So I think this should be trivially reproducible.
Here are two sets of sar and pmrep outputs taken shortly after starting the iozone run. (NB. nfsclient.bytes.read and sar -n NFS are obviously not the same metrics, but they illustrate how sar reports meaningful numbers while pmrep reports something bogus. I'm still not 100% sure which metric would match e.g. the read/s column of the sar output, but nfsclient.ops.read.ops might be it.)
[root@client ~]# sar -n NFS 1
Linux 3.10.0-693.17.1.el7.x86_64 (infra02.test.example.com) 02/21/18 _x86_64_ (4 CPU)
01:15:27 call/s retrans/s read/s write/s access/s getatt/s
...
01:16:00 651.00 0.00 583.00 3.00 14.00 14.00
01:16:01 143.00 0.00 49.00 8.00 16.00 20.00
01:16:02 393.00 0.00 313.00 5.00 16.00 17.00
01:16:03 148.00 0.00 52.00 6.00 16.00 21.00
01:16:04 238.00 0.00 145.00 7.00 18.00 19.00
01:16:05 183.00 0.00 90.00 7.00 18.00 19.00
01:16:06 120.00 0.00 17.00 8.00 18.00 22.00
01:16:07 158.00 0.00 50.00 7.00 20.00 23.00
01:16:08 131.00 0.00 29.00 7.00 20.00 21.00
01:16:09 114.00 0.00 11.00 8.00 18.00 22.00
01:16:10 126.00 0.00 13.00 9.00 20.00 24.00
vs
[root@client ~]# pmrep -p nfsclient.bytes.read
n.b.r.direct n.b.r.server n.b.r.normal
/srv /srv /srv
byte/s byte/s byte/s
01:16:00 N/A N/A N/A
01:16:01 0.000 195252.169 195252.169
01:16:02 0.000 0.000 0.000
01:16:03 0.000 1889514.902 1889514.902
01:16:04 0.000 -2082228.667 -2082228.667
01:16:05 0.000 0.000 0.000
01:16:06 0.000 0.000 0.000
01:16:07 0.000 0.000 0.000
01:16:08 0.000 0.000 0.000
01:16:09 0.000 1042382.165 1042382.165
01:16:10 0.000 -1042652.690 -1042652.690
And also:
[root@client ~]# sar -n NFS 1
Linux 3.10.0-693.17.1.el7.x86_64 (infra02.test.example.com) 02/21/18 _x86_64_ (4 CPU)
01:26:14 call/s retrans/s read/s write/s access/s getatt/s
...
01:26:30 183.00 0.00 69.00 5.00 22.00 26.00
01:26:31 260.00 0.00 146.00 5.00 22.00 26.00
01:26:32 212.00 0.00 98.00 4.00 22.00 25.00
01:26:33 162.00 0.00 48.00 6.00 24.00 26.00
01:26:34 144.00 0.00 32.00 4.00 22.00 25.00
01:26:35 131.00 0.00 8.00 5.00 24.00 28.00
01:26:36 127.00 0.00 13.00 5.00 22.00 26.00
01:26:37 132.00 0.00 8.00 6.00 24.00 28.00
01:26:38 112.00 0.00 7.00 4.00 22.00 24.00
01:26:39 419.00 0.00 327.00 3.00 18.00 20.00
01:26:40 385.00 0.00 298.00 5.00 16.00 20.00
vs
[root@client ~]# pmrep -p nfsclient.ops.read
n.o.r.bytes_sent n.o.r.rtt n.o.r.ntrans n.o.r.bytes_recv n.o.r.ops n.o.r.timeouts n.o.r.execute n.o.r.queue
/srv /srv /srv /srv /srv /srv /srv /srv
byte/s ms/s count/s byte/s count/s count/s ms/s ms/s
01:26:30 N/A N/A N/A N/A N/A N/A N/A N/A
01:26:31 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
01:26:32 441.428 18.890 2.983 521632.448 2.983 0.000 18.890 0.000
01:26:33 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
01:26:34 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
01:26:35 -441.384 -18.888 -2.982 -521580.592 -2.982 0.000 -18.888 0.000
01:26:36 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
01:26:37 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
01:26:38 147.138 10.936 0.994 521360.826 0.994 0.000 10.936 0.000
01:26:39 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
01:26:40 -147.066 -10.931 -0.994 -521107.489 -0.994 0.000 -10.931 0.000
Thanks.
Actually, I think I know what's going on here: we were using -U with iozone, which is described as:
-U mountpoint
Mount point to unmount and remount between tests. Iozone will
unmount and remount this mount point before beginning each test.
This guarantees that the buffer cache does not contain any of
the file under test.
This confuses the nfsclient PMDA; without -U I get consistent results. This kind of constant unmounting and remounting is of course very atypical, so perhaps this could be considered a non-issue, but strictly speaking such negative numbers should be avoided even in corner cases, so I will leave this issue open for further consideration. Thanks.
I think I've identified all the available / needed metrics; the few that are missing can be discussed separately in #447. Here we can concentrate on the negative-values corner case mentioned in the previous comment.
@myllynen I've audited a number of the NFS metrics now, and I believe the descriptors are all correct and match with the kernel code - these metrics are all unsigned 32 bit values.
I suspect the negative values are due to your -U observation, combined with a lack of "counter wrap" handling in the fetchgroup code, which does ...
delta = (v.d - prev_v.d) / deltaT;
This lacks the $PCP_COUNTER_WRAP logic (see PCPIntro(1) and interp.c in libpcp) and will produce negative rates for unsigned counters whenever prev_v.d is greater than v.d.
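For the record, a minimal sketch of what wrap-aware rate conversion could look like for a U64 counter; the names and structure here are illustrative only, not the actual fetchgroup code:

#include <math.h>
#include <stdint.h>

/* Illustrative sketch only: rate conversion with optional counter-wrap
 * compensation, roughly what $PCP_COUNTER_WRAP enables elsewhere in PCP.
 */
static double
wrap_aware_rate(uint64_t curr, uint64_t prev, double deltaT, int wrap_ok)
{
    const double range = 18446744073709551616.0;   /* 2^64 for a U64 counter */

    if (curr >= prev)
        return (double)(curr - prev) / deltaT;     /* the normal case */
    if (wrap_ok)
        /* counter appears to have wrapped: add the full range of the type */
        return ((double)curr + range - (double)prev) / deltaT;
    return NAN;    /* otherwise report "no value" rather than a negative rate */
}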
@myllynen this patch provides $PCP_COUNTER_WRAP support in the fetchgroup code, for optional use, here: https://gist.github.com/natoscott/9701c08d8b038608ed0209e74cca30ba
I should point out that while using this variable means you won't see negative values, the values will still be incorrect in this fake-wrap (reset counters) situation, since it is not a true counter wrap.
Looks good, except the PM_*LOCK(__pmLock_extcall) parts now cause the code to incur a brand-new lock/unlock operation for every instance, for every metric, for every fetch. It would be better to do that getenv once at startup.
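Something along these lines, perhaps (a sketch only, with made-up names, not the actual patch):

#include <stdlib.h>

/* Illustrative sketch only: read $PCP_COUNTER_WRAP once and cache the
 * result, rather than calling getenv() under a lock for every instance
 * of every metric on every fetch.
 */
static int wrap_enabled = -1;          /* -1 => not yet determined */

static int
counter_wrap_enabled(void)
{
    if (wrap_enabled == -1)
        wrap_enabled = (getenv("PCP_COUNTER_WRAP") != NULL);
    return wrap_enabled;
}

In a threaded library the check would naturally go in the one-time setup path rather than being lazily initialised like this.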
Testing the patch on a VM where iozone -azc -U /srv-mount -f /srv-mount/testfile is running, I see:
root@localhost:~# PCP_COUNTER_WRAP=1 pmrep -w 20 nfsclient.bytes.read
n.b.r.server n.b.r.normal n.b.r.direct
/srv-mount /srv-mount /srv-mount
byte/s byte/s byte/s
N/A N/A N/A
0.000 0.000 0.000
0.000 0.000 0.000
65472.950 65472.950 0.000
18444364750656651264 18444364750656651264 0.000
0.000 0.000 0.000
0.000 0.000 0.000
0.000 0.000 0.000
65647.141 98470.711 0.000
0.000 0.000 0.000
0.000 0.000 0.000
0.000 0.000 0.000
18409630259107125248 18409630259107092480 0.000
0.000 0.000 0.000
Hm, I can't say that having such ginormous numbers is much of an improvement. They don't represent anything very meaningful. It's almost as if a missing value would be best.
The real problem here (aside from the missing counter-wrap issue that Nathan addressed) is that the underlying data semantics break one of the PCP data model assumptions. If a metric counts for a while, then gets reset to zero, then counts for a while again, it is NOT a PCP counter.
If $PCP_COUNTER_WRAP is not set, then perhaps the best that the fetchgroup abstraction can do is return no value (available) for the samples when the "counter" goes backwards.
Could one make the case that the pmda should do this instead (or perhaps in addition)?
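(For reference, 2^64 = 18446744073709551616, roughly 1.84 x 10^19. So the huge values above are consistent with the wrap compensation adding the full 64-bit range, giving a rate of roughly (curr + 2^64 - prev) / delta, to a counter that was in fact reset to zero rather than wrapped.)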
@myllynen thanks for testing it out.
@kmcdonell I'll push the initial change adding $PCP_COUNTER_WRAP as is done in the rest of PCP - but I agree, reporting no value available for negative unsigned counter rates could be a good option here. We should be consistent and handle PCP_COUNTER_WRAP the same way for all tools.
@kmcdonell my last statement is not quite right - discovered as soon as I started doing QA 8^) - "no values" is already the current behaviour for the other PCP tools; I misread the code. I've tweaked my patch, added QA, and will commit shortly. @myllynen could you double-check it for me when you get some time, please? Thanks!
I now tested with a simple stanza of rm -f /tmp/foo ; dd if=/srv-mount/foo of=/tmp/foo ; umount /srv-mount ; mount -a ; dd if=/srv-mount/foo of=/tmp/foo ; umount /srv-mount ; mount -a, run a few times in a row to make things a bit more readable:
root@localhost:~# pmrep -w 20 nfsclient.bytes.read
n.b.r.server n.b.r.direct n.b.r.normal
/srv-mount /srv-mount /srv-mount
byte/s byte/s byte/s
N/A N/A N/A
277701820.786 0.000 278461670.985
270608422.283 0.000 270601273.057
N/A 0.000 N/A
267283572.484 0.000 260850142.851
276441115.233 0.000 278808080.869
278019156.095 0.000 275980321.629
0.000 0.000 0.000
N/A 0.000 N/A
0.000 0.000 0.000
root@localhost:~# PCP_COUNTER_WRAP=1 pmrep -w 20 nfsclient.bytes.read
n.b.r.server n.b.r.direct n.b.r.normal
/srv-mount /srv-mount /srv-mount
byte/s byte/s byte/s
N/A N/A N/A
275832549.065 0.000 273655181.201
279384796.438 0.000 279285909.140
0.000 0.000 0.000
18405754457922273280 0.000 18405754457924616192
259107469.833 0.000 258518078.039
283583716.716 0.000 282463146.655
271519149.613 0.000 271022655.697
18420881155560431616 0.000 18420881155567261696
204677660.629 0.000 197564763.552
266182807.176 0.000 266818737.401
276791723.843 0.000 278269456.124
267004266.832 0.000 266327610.775
18448422879322562560 0.000 18448422879322898432
0.000 0.000 0.000
0.000 0.000 0.000
@myllynen good to see, thanks for the testing
When running iozone -azcR -U /srv -f /srv/testfile -b output.xls on a RHEL 7 server/client pair using either NFSv3 or NFSv4, sar -n NFS reports meaningful numbers but the PCP metrics are inconsistent. It should be possible to mimic sar -n NFS and sar -n NFSD (and probably also nfsiostat) with pmrep(1), but due to the above that currently seems hard to do. More analysis is needed on why we occasionally see negative numbers, and on whether all the metrics needed for sar/nfsiostat are available. Thanks.