tantra35 closed this issue 4 years ago.
Hi @tantra35! It looks like client stats are collected at the specified interval, but task resource usage stats collection is a tight loop around the collection here: stats_hook.go#L87-L112. Given the more detailed treatment of this setting in the telemetry metrics docs, I'm not sure whether that's intentional and just a documentation gap.
My colleague @notnoop has pointed out to me that we do pass the interval to the task driver's Stats() method (see here) so that the drivers can run the stats collection on that interval. The exact implementation is up to the driver, though. So either we're missing a spot where the interval from the config should be threaded through, or the specific task driver isn't implementing it correctly. Which task driver are you using here, @tantra35?
@tgross On our test stand, for simplicity, we use the exec driver.
@tgross Thanks for clarifying how the code works. After some debugging, I found that the value of interval passed to exec.(*Driver).TaskStats is 1s:
(dlv) break exec.(*Driver).TaskStats
(dlv) c
> github.com/hashicorp/nomad/drivers/exec.(*Driver).TaskStats() /opt/gopath/src/github.com/hashicorp/nomad/drivers/exec/driver.go:482 (hits goroutine(2011):1 total:3) (PC: 0x141bbe3)
Warning: debugging optimized function
(dlv) bt
0 0x000000000141bbe3 in github.com/hashicorp/nomad/drivers/exec.(*Driver).TaskStats
at /opt/gopath/src/github.com/hashicorp/nomad/drivers/exec/driver.go:482
1 0x0000000001311332 in github.com/hashicorp/nomad/client/allocrunner/taskrunner.(*DriverHandle).Stats
at /opt/gopath/src/github.com/hashicorp/nomad/client/allocrunner/taskrunner/driver_handle.go:49
2 0x0000000001311332 in github.com/hashicorp/nomad/client/allocrunner/taskrunner.(*LazyHandle).Stats
at /opt/gopath/src/github.com/hashicorp/nomad/client/allocrunner/taskrunner/lazy_handle.go:141
3 0x0000000001316e4d in github.com/hashicorp/nomad/client/allocrunner/taskrunner.(*statsHook).callStatsWithRetry
at /opt/gopath/src/github.com/hashicorp/nomad/client/allocrunner/taskrunner/stats_hook.go:126
4 0x0000000001316d65 in github.com/hashicorp/nomad/client/allocrunner/taskrunner.(*statsHook).collectResourceUsageStats
at /opt/gopath/src/github.com/hashicorp/nomad/client/allocrunner/taskrunner/stats_hook.go:92
5 0x00000000004625e1 in runtime.goexit
at /usr/local/go/src/runtime/asm_amd64.s:1337
(dlv) print interval
github.com/hashicorp/nomad/vendor/github.com/gorilla/websocket.writeWait (1000000000)
So the problem is in the code that passes the interval value (1000000000 ns is 1s), and it looks like none of the drivers honor collection_interval.
Thanks @tantra35! We'll look into this!
@tgross Ah, sorry, I just found that this is a misconfiguration in our provisioning, so the bug isn't really present.
@tgross After further investigation I must reopen this issue, because the problem actually exists, at least in the exec driver. The interval value is passed correctly (we drew the wrong conclusion because of a mistake in our provisioning: collection_interval was not set), but the other information given at the beginning of this issue is correct.
No problem. 😀 We'll take a look.
Repro from our testing for 0.10.4's upcoming release candidate.
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
Nomad version
Nomad v0.10.2 (40edb4d3dd7145d2c535160fa6f7f3eb0cb4b8f7+CHANGES)
Is it expected that Nomad doesn't honor collection_interval (https://www.nomadproject.io/docs/configuration/telemetry.html#collection_interval) when collecting statistics for allocations? For example, on a Nomad client we have the following telemetry configuration:
I expected allocation telemetry to be collected every 60 seconds, but that is not the case: allocation statistics are collected every 1 second. At localhost:8125 we run a custom aggregating statsd proxy that can intercept all metrics, and its logs show the following frequency. For example, the metric
nomad.client.allocs.memory.kernel_max_usage.vault_debug-00.test.5f9fb00c-93da-2a3c-80ba-21322f989a6a.vault_debug_task.default
is collected every 1 second.