kpetremann / salt-exporter

Salt Prometheus exporter working out of the box without any configuration on Salt side. Comes with an event watcher TUI.
https://kpetremann.github.io/salt-exporter/
MIT License

Is it possible to get the job duration for states? #66

Closed. evanrich closed 4 months ago

evanrich commented 5 months ago

I'm not sure whether this is possible given how the exporter is implemented, but I wanted to ask. Your example lists metrics such as:

salt_expected_responses_total{function="cmd.run", state=""} 6
salt_expected_responses_total{function="state.sls",state="test"} 1

salt_function_responses_total{function="cmd.run",state="",success="true"} 6
salt_function_responses_total{function="state.sls",state="test",success="true"} 1

salt_function_status{minion="node1",function="state.highstate",state="highstate"} 1

salt_new_job_total{function="cmd.run",state="",success="false"} 3
salt_new_job_total{function="state.sls",state="test",success="false"} 1

salt_responses_total{minion="local",success="true"} 6
salt_responses_total{minion="node1",success="true"} 6

salt_scheduled_job_return_total{function="state.sls",minion="local",state="test",success="true"} 2

salt_health_last_heartbeat{minion="local"} 1703053536
salt_health_last_heartbeat{minion="node1"} 1703053536

salt_health_minions_total{} 2

The heartbeat metric is great for checking whether minions are alive, and the function status is great for telling whether a state was successful. For example,

salt_function_status{minion="node1",function="state.highstate",state="highstate"} 1

tells me whether node1 highstated successfully. Is it possible to also get how long this took? I know that Salt, when using something like `--returner=node-exporter`, returns the length of time each state took, but can your exporter, when run on the Salt master, also capture how long the state took to run? I can export metrics from each node using the --return option, but it would be helpful to get them from the master if possible.

Thanks!

kpetremann commented 5 months ago

Hi @evanrich, Salt does not expose the duration of a job. The duration has to be calculated from the timestamps of the request and response messages.

That would increase the complexity of the exporter too much: it would need to keep all job requests in memory to do the calculation when a response arrives, and run some sort of garbage collector to clean up timed-out requests.
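
For readers wondering what that bookkeeping would involve, here is a minimal sketch in Python. It is not how salt-exporter works (the exporter deliberately avoids this), and every name in it (`JobDurationTracker`, `on_request`, `on_response`, `collect_garbage`) is hypothetical. It assumes request and response events have already been parsed into a job ID, a minion ID, and a timestamp in seconds.

```python
from typing import Dict, Iterable, Optional, Tuple


class JobDurationTracker:
    """Hypothetical helper: pairs job requests with responses to derive durations."""

    def __init__(self, timeout_seconds: float = 300.0) -> None:
        # Assumption: a request with no response after this delay is "timed out".
        self.timeout_seconds = timeout_seconds
        # (jid, minion) -> timestamp of the request, kept until a response arrives.
        self._pending: Dict[Tuple[str, str], float] = {}

    def on_request(self, jid: str, minions: Iterable[str], timestamp: float) -> None:
        """Remember when a job was sent to each targeted minion."""
        for minion in minions:
            self._pending[(jid, minion)] = timestamp

    def on_response(self, jid: str, minion: str, timestamp: float) -> Optional[float]:
        """Return the duration in seconds, or None if the request is unknown."""
        started = self._pending.pop((jid, minion), None)
        if started is None:
            return None
        return timestamp - started

    def collect_garbage(self, now: float) -> int:
        """Drop requests older than the timeout; return how many were removed."""
        expired = [
            key for key, started in self._pending.items()
            if now - started > self.timeout_seconds
        ]
        for key in expired:
            del self._pending[key]
        return len(expired)

    def pending_count(self) -> int:
        """Number of requests still waiting for a response."""
        return len(self._pending)


# Usage with made-up values: one job sent to two minions, only one answers.
tracker = JobDurationTracker(timeout_seconds=60.0)
tracker.on_request("jid1", ["node1", "local"], timestamp=100.0)
print(tracker.on_response("jid1", "node1", timestamp=112.5))  # 12.5
print(tracker.collect_garbage(now=200.0))  # 1 (the request to "local" expired)
```

Even this small version has to hold one entry per targeted minion for every in-flight job and needs a periodic cleanup pass, which is the extra state and complexity described above.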

The best way is to use a returner, as you mentioned.