Closed spolack closed 6 months ago
I'm also seeing the following log output, but that is possibly unrelated? The HTTP requests never time out; the endpoint always responds quickly.
2023/10/19 21:02:10.130216 /home/runner/work/gnmic/gnmic/outputs/prometheus_output/prometheus_output/prometheus_output.go:308: [prometheus_output:prom-output] writing expired after 10s
2023/10/19 21:02:10.130217 /home/runner/work/gnmic/gnmic/outputs/prometheus_output/prometheus_output/prometheus_output.go:308: [prometheus_output:prom-output] writing expired after 10s
2023/10/19 21:02:10.130229 /home/runner/work/gnmic/gnmic/outputs/prometheus_output/prometheus_output/prometheus_output.go:308: [prometheus_output:prom-output] writing expired after 10s
2023/10/19 21:02:10.130282 /home/runner/work/gnmic/gnmic/outputs/prometheus_output/prometheus_output/prometheus_output.go:308: [prometheus_output:prom-output] writing expired after 10s
2023/10/19 21:02:10.130292 /home/runner/work/gnmic/gnmic/outputs/prometheus_output/prometheus_output/prometheus_output.go:308: [prometheus_output:prom-output] writing expired after 10s
Reverting to version 0.32.0 also fixes that issue.
Hi, I'm not sure I understand what issue you're seeing. The timeout is about writing the metric to the local cache. v0.32.0 didn't have a timeout; if it had the same underlying symptoms, you would see high memory usage there.
The metrics simply disappear from the prometheus scrape endpoint after ~1 minute. A few seconds later they are back.
The configuration should be sufficient to reproduce. What I find interesting is that it simply works when enabling export-timestamps.
Are the routers and the machines where you run gNMIc and prometheus synced with an NTP server? What is the time offset between them?
If the metrics disappear after 60s, it might be that the expiration attribute is responsible.
With export-timestamps: true, gNMIc uses the timestamp received from the router to mark metrics as expired.
With export-timestamps: false, gNMIc uses the timestamp at which the metric was added to the cache to mark it as expired. In this case prometheus uses the scrape timestamp to store the metric.
Another thing to check is which timestamp the router includes in the SubscribeResponses. I know Arista routers have a knob to change the timestamp from send-time to last-change-time. If it's set to the latter, you are probably getting some old timestamps for some notifications.
The machines are the routers themselves! I assume zero time offset.
I collected some logs for a single metric.
bash-4.2# ip netns exec ns-MGMT /mnt/flash/gnmic --config /mnt/flash/gnmic.yml subscribe --debug
2023/10/25 23:38:34.237274 /home/runner/work/gnmic/gnmic/app/app.go:222: [gnmic] version=0.33.0, commit=d8f931e, date=2023-10-08T17:08:01Z, gitURL=https://github.com/openconfig/gnmic, docs=https://gnmic.openconfig.net
[..]
2023/10/25 23:38:34.256913 /home/runner/go/pkg/mod/google.golang.org/grpc@v1.58.2/grpclog/logger.go:53: [gnmic] [core] [Channel #1] Channel Connectivity change to READY
2023/10/25 23:38:34.256924 /home/runner/work/gnmic/gnmic/app/gnmi_client_subscribe.go:219: [gnmic] target "localhost" gNMI client created
2023/10/25 23:38:34.256944 /home/runner/work/gnmic/gnmic/app/gnmi_client_subscribe.go:222: [gnmic] sending gNMI SubscribeRequest: subscribe='subscribe:{subscription:{path:{elem:{name:"components"} elem:{name:"component" key:{key:"name" value:"CPU12"}} elem:{name:"cpu"} elem:{name:"utilization"} elem:{name:"state"} elem:{name:"instant"}} mode:SAMPLE sample_interval:15000000000}}', mode='STREAM', encoding='JSON', to localhost
2023/10/25 23:38:34.306576 /home/runner/work/gnmic/gnmic/app/collector.go:70: [gnmic] target "localhost": gNMI Subscribe Response: &{SubscriptionName:cpu SubscriptionConfig:{"name":"cpu","paths":["/components/component[name=CPU12]/cpu/utilization/state/instant"],"mode":"stream","stream-mode":"sample","encoding":"json","sample-interval":15000000000} Response:update:{timestamp:1698277097585598825 update:{path:{elem:{name:"components"} elem:{name:"component" key:{key:"name" value:"CPU12"}} elem:{name:"cpu"} elem:{name:"utilization"} elem:{name:"state"} elem:{name:"instant"}} val:{uint_val:1}}}}
2023/10/25 23:38:34.306717 /home/runner/work/gnmic/gnmic/app/collector.go:70: [gnmic] target "localhost": gNMI Subscribe Response: &{SubscriptionName:cpu SubscriptionConfig:{"name":"cpu","paths":["/components/component[name=CPU12]/cpu/utilization/state/instant"],"mode":"stream","stream-mode":"sample","encoding":"json","sample-interval":15000000000} Response:sync_response:true}
2023/10/25 23:38:34.306780 /home/runner/work/gnmic/gnmic/outputs/prometheus_output/prometheus_output/prometheus_output.go:435: [prometheus_output:prom-output] got event to store: {"name":"cpu","timestamp":1698277097585598825,"tags":{"component_name":"CPU12","source":"localhost","subscription-name":"cpu"},"values":{"/components/component/cpu/utilization/state/instant":1}}
2023/10/25 23:38:34.306866 /home/runner/work/gnmic/gnmic/outputs/prometheus_output/prometheus_output/prometheus_output.go:449: [prometheus_output:prom-output] saved key=3840991386943600484, metric: name=arista_cpu_components_component_cpu_utilization_state_instant,labels=[component_name=CPU12,source=localhost,subscription_name=cpu],value=1.000000,time=nil,addedAt=2023-10-25 23:38:34.30683334 +0000 UTC m=+0.103545344
2023/10/25 23:38:49.307105 /home/runner/work/gnmic/gnmic/app/collector.go:70: [gnmic] target "localhost": gNMI Subscribe Response: &{SubscriptionName:cpu SubscriptionConfig:{"name":"cpu","paths":["/components/component[name=CPU12]/cpu/utilization/state/instant"],"mode":"stream","stream-mode":"sample","encoding":"json","sample-interval":15000000000} Response:update:{timestamp:1698277127584156213 update:{path:{elem:{name:"components"} elem:{name:"component" key:{key:"name" value:"CPU12"}} elem:{name:"cpu"} elem:{name:"utilization"} elem:{name:"state"} elem:{name:"instant"}} val:{uint_val:2}}}}
2023/10/25 23:38:49.307234 /home/runner/work/gnmic/gnmic/outputs/prometheus_output/prometheus_output/prometheus_output.go:435: [prometheus_output:prom-output] got event to store: {"name":"cpu","timestamp":1698277127584156213,"tags":{"component_name":"CPU12","source":"localhost","subscription-name":"cpu"},"values":{"/components/component/cpu/utilization/state/instant":2}}
2023/10/25 23:38:49.307278 /home/runner/work/gnmic/gnmic/outputs/prometheus_output/prometheus_output/prometheus_output.go:449: [prometheus_output:prom-output] saved key=3840991386943600484, metric: name=arista_cpu_components_component_cpu_utilization_state_instant,labels=[component_name=CPU12,source=localhost,subscription_name=cpu],value=2.000000,time=nil,addedAt=2023-10-25 23:38:49.307256663 +0000 UTC m=+15.103968667
2023/10/25 23:39:04.307129 /home/runner/work/gnmic/gnmic/app/collector.go:70: [gnmic] target "localhost": gNMI Subscribe Response: &{SubscriptionName:cpu SubscriptionConfig:{"name":"cpu","paths":["/components/component[name=CPU12]/cpu/utilization/state/instant"],"mode":"stream","stream-mode":"sample","encoding":"json","sample-interval":15000000000} Response:update:{timestamp:1698277137584081794 update:{path:{elem:{name:"components"} elem:{name:"component" key:{key:"name" value:"CPU12"}} elem:{name:"cpu"} elem:{name:"utilization"} elem:{name:"state"} elem:{name:"instant"}} val:{uint_val:1}}}}
2023/10/25 23:39:04.307260 /home/runner/work/gnmic/gnmic/outputs/prometheus_output/prometheus_output/prometheus_output.go:435: [prometheus_output:prom-output] got event to store: {"name":"cpu","timestamp":1698277137584081794,"tags":{"component_name":"CPU12","source":"localhost","subscription-name":"cpu"},"values":{"/components/component/cpu/utilization/state/instant":1}}
2023/10/25 23:39:04.307299 /home/runner/work/gnmic/gnmic/outputs/prometheus_output/prometheus_output/prometheus_output.go:449: [prometheus_output:prom-output] saved key=3840991386943600484, metric: name=arista_cpu_components_component_cpu_utilization_state_instant,labels=[component_name=CPU12,source=localhost,subscription_name=cpu],value=1.000000,time=nil,addedAt=2023-10-25 23:39:04.307281168 +0000 UTC m=+30.103993162
2023/10/25 23:39:19.306720 /home/runner/work/gnmic/gnmic/app/collector.go:70: [gnmic] target "localhost": gNMI Subscribe Response: &{SubscriptionName:cpu SubscriptionConfig:{"name":"cpu","paths":["/components/component[name=CPU12]/cpu/utilization/state/instant"],"mode":"stream","stream-mode":"sample","encoding":"json","sample-interval":15000000000} Response:update:{timestamp:1698277137584081794 update:{path:{elem:{name:"components"} elem:{name:"component" key:{key:"name" value:"CPU12"}} elem:{name:"cpu"} elem:{name:"utilization"} elem:{name:"state"} elem:{name:"instant"}} val:{uint_val:1}}}}
2023/10/25 23:39:19.306842 /home/runner/work/gnmic/gnmic/outputs/prometheus_output/prometheus_output/prometheus_output.go:435: [prometheus_output:prom-output] got event to store: {"name":"cpu","timestamp":1698277137584081794,"tags":{"component_name":"CPU12","source":"localhost","subscription-name":"cpu"},"values":{"/components/component/cpu/utilization/state/instant":1}}
2023/10/25 23:39:19.306893 /home/runner/work/gnmic/gnmic/outputs/prometheus_output/prometheus_output/prometheus_output.go:449: [prometheus_output:prom-output] saved key=3840991386943600484, metric: name=arista_cpu_components_component_cpu_utilization_state_instant,labels=[component_name=CPU12,source=localhost,subscription_name=cpu],value=1.000000,time=nil,addedAt=2023-10-25 23:39:19.306866352 +0000 UTC m=+45.103578356
2023/10/25 23:39:34.306827 /home/runner/work/gnmic/gnmic/app/collector.go:70: [gnmic] target "localhost": gNMI Subscribe Response: &{SubscriptionName:cpu SubscriptionConfig:{"name":"cpu","paths":["/components/component[name=CPU12]/cpu/utilization/state/instant"],"mode":"stream","stream-mode":"sample","encoding":"json","sample-interval":15000000000} Response:update:{timestamp:1698277137584081794 update:{path:{elem:{name:"components"} elem:{name:"component" key:{key:"name" value:"CPU12"}} elem:{name:"cpu"} elem:{name:"utilization"} elem:{name:"state"} elem:{name:"instant"}} val:{uint_val:1}}}}
2023/10/25 23:39:34.306944 /home/runner/work/gnmic/gnmic/outputs/prometheus_output/prometheus_output/prometheus_output.go:435: [prometheus_output:prom-output] got event to store: {"name":"cpu","timestamp":1698277137584081794,"tags":{"component_name":"CPU12","source":"localhost","subscription-name":"cpu"},"values":{"/components/component/cpu/utilization/state/instant":1}}
2023/10/25 23:39:34.307016 /home/runner/work/gnmic/gnmic/outputs/prometheus_output/prometheus_output/prometheus_output.go:449: [prometheus_output:prom-output] saved key=3840991386943600484, metric: name=arista_cpu_components_component_cpu_utilization_state_instant,labels=[component_name=CPU12,source=localhost,subscription_name=cpu],value=1.000000,time=nil,addedAt=2023-10-25 23:39:34.306983488 +0000 UTC m=+60.103695492
<=== METRICS DISAPPEAR
2023/10/25 23:39:49.307001 /home/runner/work/gnmic/gnmic/app/collector.go:70: [gnmic] target "localhost": gNMI Subscribe Response: &{SubscriptionName:cpu SubscriptionConfig:{"name":"cpu","paths":["/components/component[name=CPU12]/cpu/utilization/state/instant"],"mode":"stream","stream-mode":"sample","encoding":"json","sample-interval":15000000000} Response:update:{timestamp:1698277187583141822 update:{path:{elem:{name:"components"} elem:{name:"component" key:{key:"name" value:"CPU12"}} elem:{name:"cpu"} elem:{name:"utilization"} elem:{name:"state"} elem:{name:"instant"}} val:{uint_val:0}}}}
2023/10/25 23:39:49.307117 /home/runner/work/gnmic/gnmic/outputs/prometheus_output/prometheus_output/prometheus_output.go:435: [prometheus_output:prom-output] got event to store: {"name":"cpu","timestamp":1698277187583141822,"tags":{"component_name":"CPU12","source":"localhost","subscription-name":"cpu"},"values":{"/components/component/cpu/utilization/state/instant":0}}
2023/10/25 23:39:49.307156 /home/runner/work/gnmic/gnmic/outputs/prometheus_output/prometheus_output/prometheus_output.go:449: [prometheus_output:prom-output] saved key=3840991386943600484, metric: name=arista_cpu_components_component_cpu_utilization_state_instant,labels=[component_name=CPU12,source=localhost,subscription_name=cpu],value=0.000000,time=nil,addedAt=2023-10-25 23:39:49.307138443 +0000 UTC m=+75.103850437
<=== METRICS REAPPEAR
^C
received signal 'interrupt'. terminating...
It seems that it's currently only updating the timestamp on change.
Still, I see several updates within those 60s:
2023/10/25 23:38:34.306780 /home/runner/work/gnmic/gnmic/outputs/prometheus_output/prometheus_output/prometheus_output.go:435: [prometheus_output:prom-output] got event to store: {"name":"cpu","timestamp":1698277097585598825,
2023/10/25 23:38:49.307234 /home/runner/work/gnmic/gnmic/outputs/prometheus_output/prometheus_output/prometheus_output.go:435: [prometheus_output:prom-output] got event to store: {"name":"cpu","timestamp":1698277127584156213,
2023/10/25 23:39:04.307260 /home/runner/work/gnmic/gnmic/outputs/prometheus_output/prometheus_output/prometheus_output.go:435: [prometheus_output:prom-output] got event to store: {"name":"cpu","timestamp":1698277137584081794,
2023/10/25 23:39:19.306842 /home/runner/work/gnmic/gnmic/outputs/prometheus_output/prometheus_output/prometheus_output.go:435: [prometheus_output:prom-output] got event to store: {"name":"cpu","timestamp":1698277137584081794,
2023/10/25 23:39:34.306944 /home/runner/work/gnmic/gnmic/outputs/prometheus_output/prometheus_output/prometheus_output.go:435: [prometheus_output:prom-output] got event to store: {"name":"cpu","timestamp":1698277137584081794,
I just configured it to set the timestamp to send-time, whereas the default seems to be last-change-time.
Now I see every update coming in with an updated timestamp. Still, after 60s the metrics disappear until the next update.
This problem is still occurring with the latest release. 0.32.0 is the last working release.
[spolack@r1-lab1-de ~]$ /mnt/flash/gnmic version
version : 0.34.3
commit : d461fa9
date : 2023-11-14T02:13:11Z
gitURL : https://github.com/openconfig/gnmic
docs : https://gnmic.openconfig.net
Can you set expiration: -1s just to make sure that the metrics are missing because they are marked expired by gNMIc?
With expiration: -1s the metrics are continuously returned, but they are not updating.
Can you make sure the new notifications coming from the router have higher timestamps than the previous ones?
Yes. I was able to confirm that before.
Those metrics are being resent with the same timestamp:
2023/10/25 23:39:04.307260 ....... : {"name":"cpu","timestamp":1698277137584081794,
2023/10/25 23:39:19.306842 ....... : {"name":"cpu","timestamp":1698277137584081794,
2023/10/25 23:39:34.306944 ....... : {"name":"cpu","timestamp":1698277137584081794,
A metric is identified by the subscription name, the path, as well as the labels.
If the router is not able to send those notifications with an updated timestamp, you can use the override-ts processor to make gNMIc override the timestamp on reception (ref).
I think I solved this myself by specifically adding a cache: {} key under the prometheus output section, like so. Here are the docs that helped me:
outputs:
  prom-output:
    type: prometheus
    listen: :9804
    cache: {}
Once I added this, my curl showed the metric count holding stable, with something like this:
while true; do curl -s gnmic:9804/metrics | wc -l; sleep 1; done
I can confirm that adding a cache: {} fixes the issue. In addition, some metrics were expiring after 60s, but changing the timestamp to send-time fixed it for those.
What I'm wondering is why it did work flawlessly with 0.32.0.
@spolack I think I found what the issue is and fixed it in v0.35.1. Could you give it a try?
I think I am having this same issue, will try v0.35.1
UPDATE: Fixed for me in 0.35.1. Ty!
Closing this, feel free to reopen if you believe it's not fixed.
Hello,
first of all thank you for that amazing tool! Basically unlocks unlimited ways to mangle our metrics!
I'm still evaluating, but seem to have found the first bug. The metrics in the scrape endpoint are frequently dropping. It feels like a timeout, but I can confirm from the API endpoint and the debug log that metrics are still received at the sample-interval from the subscription.
Config
Metrics:
The following chart shows the problem.
While debugging, I suspected that the prometheus output's internal cache might fail to receive updates. To check whether the timestamps get updated with every interval, I tried enabling export-timestamps on the output. Instead of helping to debug this issue, it fixed the problem: now the metrics are permanently present and the timestamp is updated every 15s.
Best, Simon