Open andreasgerstmayr opened 4 years ago
fyi, similar situation currently with latest PCP from master
$ pmseries -a $(pmseries disk.dev.read)
1da966685fbfa7b61f9e44c0a7c3e0fed6a387f4
PMID: 60.0.4
Data Type: 64-bit unsigned int InDom: 60.1 0xf000001
Semantics: counter Units: count
Source: e8d3bc6b62ea77a67278009a8ad5cc44d162b7a8
Metric: disk.dev.read
inst [0 or "nvme0n1"] series d566584c9425cf8db1fff9fd431df425ca3ab7f5
inst [1 or "sda"] series cf7fec1925ede13041bff286d2e100e67eced184
inst [0 or "nvme0n1"] labels {"agent":"linux","device_type":"block","domainname":"localdomain","groupid":1001,"hostname":"agerstmayr-thinkpad","indom_name":"per disk","machineid":"18b2c288e7c54055bf296618861c6dc5","userid":1001}
inst [1 or "sda"] labels {"agent":"linux","device_type":"block","domainname":"localdomain","groupid":1001,"hostname":"agerstmayr-thinkpad","indom_name":"per disk","machineid":"18b2c288e7c54055bf296618861c6dc5","userid":1001}
f87250c4ea0e5eca8ff2ca3b3044ba1a6c91a3d9
PMID: 60.0.4
Data Type: 64-bit unsigned int InDom: 60.1 0xf000001
Semantics: counter Units: count
Source: 2914f38f7bdcb7fb3ac0b822c98019248fd541fb
Metric: disk.dev.read
inst [0 or "nvme0n1"] series 7f3afb6f41e53792b18e52bcec26fdfa2899fa58
inst [1 or "sda"] series 0aeab8b239522ab0640577ed788cc601fc640266
inst [0 or "nvme0n1"] labels {"agent":"linux","device_type":"block","domainname":"localdomain","groupid":976,"hostname":"agerstmayr-thinkpad","indom_name":"per disk","machineid":"6dabb302d60b402dabcc13dc4fd0fab8","userid":978}
inst [1 or "sda"] labels {"agent":"linux","device_type":"block","domainname":"localdomain","groupid":976,"hostname":"agerstmayr-thinkpad","indom_name":"per disk","machineid":"6dabb302d60b402dabcc13dc4fd0fab8","userid":978}
btw, no idea how the userid 1001 appeared here (that's my "pcptestuser" to test authentication, I can't remember that this user started the pmcd or pmlogger daemon)
$ pmseries -a $(pmseries 'disk.dev.read{hostname=="agerstmayr-thinkpad"}')
1df87250c4ea0e5eca8ff2ca3b3044ba1a6c91a3
PMID: PM_ID_NULL
Data Type: ??? InDom: unknown 0xffffffff
Semantics: unknown Units: unknown
Source: unknown
d97250c4ea0e5eca8ff2ca3b3044ba1a6c91a3d9
PMID: PM_ID_NULL
Data Type: ??? InDom: unknown 0xffffffff
Semantics: unknown Units: unknown
Source: unknown
Did the recent changes to the pmseries lang break the filtering?
@andreasgerstmayr looks like your 'machineid' label has changed?! The userid/groupid change should not cause an issue - that label is tagged as "optional" and so not used in hash calculations - but the machineid label change is probably a part of the problem.
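To illustrate why only some label changes alter a series identifier, here is a minimal sketch. It is not PCP's actual hashing code: the `series_id` helper, the use of SHA-1 over a JSON encoding, and the `OPTIONAL_LABELS` set are all assumptions made for illustration. It only shows the principle described above: labels tagged as optional are left out of the hash input, so changing userid/groupid keeps the identifier stable, while changing machineid produces a different one.

```python
import hashlib
import json

# Assumption: these labels are the ones tagged "optional" (per the comment above).
OPTIONAL_LABELS = {"userid", "groupid"}

def series_id(metric, labels):
    """Hypothetical identifier: SHA-1 over the metric name and non-optional labels."""
    intrinsic = {k: v for k, v in labels.items() if k not in OPTIONAL_LABELS}
    payload = json.dumps({"metric": metric, "labels": intrinsic}, sort_keys=True)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()

# Label values taken from the pmseries output earlier in this thread.
base = {
    "agent": "linux",
    "hostname": "agerstmayr-thinkpad",
    "machineid": "18b2c288e7c54055bf296618861c6dc5",
    "userid": 1001,
    "groupid": 1001,
}
other_user = dict(base, userid=978, groupid=976)
other_machine = dict(base, machineid="6dabb302d60b402dabcc13dc4fd0fab8")

# Optional labels differ: identifier is unchanged.
same_for_user_change = (
    series_id("disk.dev.read", base) == series_id("disk.dev.read", other_user)
)
# machineid differs: identifier changes.
differs_for_machine_change = (
    series_id("disk.dev.read", base) != series_id("disk.dev.read", other_machine)
)
print(same_for_user_change, differs_for_machine_change)
```

If the real implementation behaves like this, the two sets of disk.dev.read series shown earlier would be expected simply because the machineid differs between them.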
FWIW, I'm not seeing any issues here. We also have QA tests that verify hash calculation consistency and many other aspects of pmseries, to the best of my knowledge the recent language changes are not causing your issue here.
I do wonder whether at some point we're going to need a Redis key cleaning/checking tool that can go in and look for disconnected keys, series missing labels, and so on. Hmm, big job that one.
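A rough sketch of the kind of check such a tool might run, under loud assumptions: the function `find_inconsistent_series` and its inputs are hypothetical, and the data is held in plain Python collections rather than read from Redis, because the actual key layout is not described in this thread. The example identifiers are taken from the outputs above, where one series came back with no descriptor (PMID: PM_ID_NULL).

```python
def find_inconsistent_series(known_series, descriptors, labels):
    """Report series lacking metadata, and metadata not tied to any known series.

    known_series: iterable of series identifiers returned by queries
    descriptors:  dict mapping series identifier -> descriptor (hypothetical shape)
    labels:       dict mapping series identifier -> label set (hypothetical shape)
    """
    known = set(known_series)
    with_desc = set(descriptors)
    with_labels = set(labels)
    return {
        "missing_descriptor": sorted(known - with_desc),
        "missing_labels": sorted(known - with_labels),
        "orphaned_metadata": sorted((with_desc | with_labels) - known),
    }

# One healthy series and one that has neither descriptor nor labels.
healthy = "1da966685fbfa7b61f9e44c0a7c3e0fed6a387f4"
broken = "1df87250c4ea0e5eca8ff2ca3b3044ba1a6c91a3"

report = find_inconsistent_series(
    known_series=[healthy, broken],
    descriptors={healthy: {"pmid": "60.0.4"}},
    labels={healthy: {"agent": "linux"}},
)
print(report)
```

A real tool would have to populate these three inputs by scanning the Redis keys, which is where most of the work would be.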
> @andreasgerstmayr looks like your 'machineid' label has changed?! The userid/groupid change should not cause an issue - that label is tagged as "optional" and so not used in hash calculations - but the machineid label change is probably a part of the problem.
Jan pointed out that the pcp user inside the pcp-container also has UID 1001. I probably ran the container with --network=host, so the pmlogger inside the container wrote to the Redis database of the host. So this mystery is solved and works as expected: I got confused because UID 1001 is pcptestuser on my local system, and I didn't think of the container user, which has the same UID.
> FWIW, I'm not seeing any issues here. We also have QA tests that verify hash calculation consistency and many other aspects of pmseries, to the best of my knowledge the recent language changes are not causing your issue here.
The main issue of this bug is: in both cases I can see all series, but when I filter them I get the wrong series.
The first example shows 4 series, and all of them have the agent: linux label. However, filtering returns only two series, and those two are apparently invalid: they have no metadata attached, although they should have the same metadata as the unfiltered ones. That's definitely an issue: a pmseries query that should match all 4 valid series does not return them.
Some more context:
[vagrant@collector ~]$ pmseries 'disk.dev.write_bytes[count:1]'
270585f657a2d61504103f2d093ff0cb23342fc6
[Thu Oct 8 08:35:49.488355000 2020] 2216138 4a85e858fceb6477acde2b77e4f702f5977260b9
353820b21b1553ee00159ab6ed44a76a04348b04
[Thu Oct 8 08:35:46.461776000 2020] 3090841 9a5b93a8e679787809501b82e4c3df9848f714d8
710e5b8c629d53ee8bcd96b570bab679fb9481c2
[Thu Oct 8 08:35:52.668167000 2020] 2238512 f4d23dbc843d88ca38094d0ab4f8dbdfa99283c6
97ff84b72e097e02e2f78dfdd3dd4a9585f8269c
[Thu Oct 8 08:36:34.493999000 2020] 576251240 a263678c1545089a08a3ab8ae09dd27d42fd9211
Shows that all series indeed have values.
However, filtering based on the agent, which should match all series:
[vagrant@collector ~]$ pmseries 'disk.dev.write_bytes{agent=="linux"}[count:1]'
710e5b8c629d53ee8bcd96b570bab679fb9481c2
[Thu Oct 8 08:35:52.668167000 2020] 2238512 f4d23dbc843d88ca38094d0ab4f8dbdfa99283c6
97ff84b72e097e02e2f78dfdd3dd4a9585f8269c
[Thu Oct 8 08:35:54.466130000 2020] 576016286 a263678c1545089a08a3ab8ae09dd27d42fd9211
Only returns two series.
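To make the discrepancy explicit, here is a small sketch that diffs the two outputs above. The helper `series_ids` is hypothetical, and it assumes that a series identifier is any line consisting solely of 40 hexadecimal characters, which matches the output shown here but is not taken from the pmseries documentation.

```python
import re

# Assumption: a series identifier line is exactly 40 lowercase hex characters.
SERIES_ID = re.compile(r"^[0-9a-f]{40}$")

def series_ids(output):
    """Collect the series identifiers from pasted pmseries output."""
    stripped = (line.strip() for line in output.splitlines())
    return {line for line in stripped if SERIES_ID.match(line)}

unfiltered = """\
270585f657a2d61504103f2d093ff0cb23342fc6
[Thu Oct 8 08:35:49.488355000 2020] 2216138 4a85e858fceb6477acde2b77e4f702f5977260b9
353820b21b1553ee00159ab6ed44a76a04348b04
[Thu Oct 8 08:35:46.461776000 2020] 3090841 9a5b93a8e679787809501b82e4c3df9848f714d8
710e5b8c629d53ee8bcd96b570bab679fb9481c2
[Thu Oct 8 08:35:52.668167000 2020] 2238512 f4d23dbc843d88ca38094d0ab4f8dbdfa99283c6
97ff84b72e097e02e2f78dfdd3dd4a9585f8269c
[Thu Oct 8 08:36:34.493999000 2020] 576251240 a263678c1545089a08a3ab8ae09dd27d42fd9211
"""

filtered = """\
710e5b8c629d53ee8bcd96b570bab679fb9481c2
[Thu Oct 8 08:35:52.668167000 2020] 2238512 f4d23dbc843d88ca38094d0ab4f8dbdfa99283c6
97ff84b72e097e02e2f78dfdd3dd4a9585f8269c
[Thu Oct 8 08:35:54.466130000 2020] 576016286 a263678c1545089a08a3ab8ae09dd27d42fd9211
"""

# Series present without the filter but absent with {agent=="linux"}.
missing = sorted(series_ids(unfiltered) - series_ids(filtered))
for sid in missing:
    print(sid)
```

For the outputs above this lists 270585f657a2d61504103f2d093ff0cb23342fc6 and 353820b21b1553ee00159ab6ed44a76a04348b04 as the series dropped by the label filter.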
Nodes node1-3 all have exactly the same configuration (https://github.com/andreasgerstmayr/pcp-and-grafana-demo).
Setup: 3 nodes (running pmcd) + 1 collector node (running pmlogger + pmproxy + redis, pmlogger connects to the other nodes)
All running PCP 5.1.1-3
Filtering based on the agent label, I get the following series:
All nodeX VMs have the same configuration (except the hostname). Any idea why the filtering works for node3.local but not for node1.local and node2.local?