Closed shishir-a412ed closed 4 years ago
Hi @shishir-a412ed!
I took a look at where this interval is defined and the loop is every 1s, but backs-off up to 5s. At that point the communication channel is established. See the stats_hook
. But then the driver can decide to send stats on that channel on an interval of its choosing. Take a look at the shared code for the Linux executor (used by exec
and java
task drivers) for an example. So I think it's a matter of making sure the stats channel isn't being torn down after stats are being sent?
@tgross
If you look at the description
of stats_hook
it looks like the 5-second backoff is for retrying until the channel is established.
My question is more on once the channel is established, driver will constantly flood the channel with task stats (every 1 second).
We have followed the same code as Linux executor which is used by exec
and java
task drivers. Now to corroborate my claim, I added a logger (print) statement before timer.Reset(interval)
and it is printing every 1 second.
Consider a use case where a custom task driver launches 100 tasks on a nomad client node. The driver will be fetching task stats for these 100 tasks every second and reporting it back to stats channels.
Is there an option to add an interval field in the plugin stanza of the nomad client HCL?
plugin "my plugin" {
config {
enabled = true
interval = 5
}
}
and then task driver can read this value interval = 5
and use it for timer.Reset(interval)
Ah yeah, I see what you're saying... that interval is being threaded all the way down from the stats hook configuration.
It's being added to the stats hook when the task runner hooks are initialized, which should be configured by the client.telemetry.collection_interval
field in the Nomad client configuration. So this is configurable on a per-client basis.
If you wanted to have it available on a per-plugin basis for a custom plugin, you could add the field in the plugin's configuration schema, and then have your the plugin's config override the interval it gets from the client configuration.
@tgross Maybe I am missing something trivial, but this is what I am doing:
1) Added "stats_interval": hclspec.NewAttr("stats_interval", "string", false),
Somewhere here: lxc_path
2) Added a new field StatsInterval string
codec:"stats_interval"` in my
Configstruct. Somewhere here: [
LXCPath`](https://github.com/hashicorp/nomad-driver-lxc/blob/master/lxc/driver.go#L131)
3) Tried to access StatsInterval
in StartTask
d.logger.Info(fmt.Sprintf("Stats interval: %s", d.config.StatsInterval))
Similar to lxcPath()
I was hoping that parsing would happen automatically, and I will be able to access StatsInterval
in (3). However, it's printing no value (empty) for me.
Any idea what's going on?
Setting the value here:
$ cat win_client.hcl
plugin "win_iis" {
config {
enabled = true
stats_interval = "55s"
}
}
@shishir-a412ed This looks correct, so I'm a bit surprised it's not working either. My hunch is a binary name mismatch - the plugin binary name must be win_iss
(or win_iss.exe
) for that plugin stanza to work.
For further debugging, I might suggest adding logging in driver.SetConfig
function, and/or setting a default value like:
"stats_interval": hclspec.NewDefault(
hclspec.NewAttr("stats_interval", "string", false),
hclspec.NewLiteral(`"55s"`),
)
If this is an open source project, we can dig into it a bit more as well.
@notnoop Thanks for taking a look. You are right! binary name mismatch was causing the problem. Changing the binary name to win_iis.exe
fixed it, and I am able to read the value now.
The plugin is not open source right now, but we do plan to open source it at some point.
Glad to hear it helped. I created https://github.com/hashicorp/nomad/issues/7807 to ease debugging or noticing this issue!
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
We are authoring a custom task driver for nomad using https://www.nomadproject.io/docs/internals/plugins/task-drivers/
When you launch a task using the plugin, the following happens.
1) A call to
StartTask
which is responsible for launching the task. 2) A call toTaskStats
which will return a channel, and the driver will constantly write stats to this channel. 3) A call toWaitTask
which will return a channel. The driver will write exit results to this channel when the task exits or close the channel if the context is canceled.I have a question regarding (2)
TaskStats
When (2) is called, a goroutine
handleStats
is launched which reports stats everyinterval
seconds.Right now the default value is
1 or 2 seconds
. Not sure which one is correct. My logs are showing1 second
however the value defined in the proto is2 seconds
https://github.com/hashicorp/nomad/blob/master/plugins/drivers/proto/driver.proto#L248Can this be configured? Using the nomad plugin stanza or some other option? Reporting task stats every second is very CPU intensive and not every driver would like to do it that often.
Since this is a driver level implementation, driver should at least have a way to configure this value.
A very simple solution would be to just override it (interval = 5) after you receive the parameter in
handleStats
. However, I wanted to check if there is a better (right) way to configure this value?