puppetlabs / puppet_operational_dashboards

Apache License 2.0
5 stars 22 forks source link

Add support for PCP connection count on puppetserver #74

Open bastelfreak opened 2 years ago

bastelfreak commented 2 years ago

Use Case

I'm currently debugging pcp-agent<>puppetserver connection problems with Puppet Engineers for a customer. I noticed that there is a metric for the amount of open PCP connections, but no related dashboard. Having that would help a lot to identify overloaded puppetservers / uneven loadbalancing of PCP connections. I used:

curl --cert $(puppet config print hostcert) --key $(puppet config print hostprivkey) --cacert $(puppet config print localcacert) https://$(hostname -f):8140/metrics/v2/read/default:name=puppetlabs.pcp.connections --silent | jq '.value[]'

Describe the Solution You Would Like

A dashboard that tracks a connection counter over time

Describe Alternatives You've Considered

Customer has other monitoring soltutions where we could implement this, but I think such a panel would be really helpful for other users as well.

Additional Context

I'm not 100% sure if default:name=puppetlabs.pcp.connections gives the amount of current open connections, but I think so. I didn't find any good documentation about the JMX/Jolokia metrics

m0dular commented 2 years ago

Thanks, we're currently working on adding more meaningful Orchestrator metrics in PE, so we'll consider this when we build out a dashboard for those. It may be worth making a dashboard with what is already available, such as this one and JVM metrics.

SimonHoenscheid commented 1 year ago

This may be part of #148

m0dular commented 1 year ago

This particular mbean, puppetlabs.pcp.connections, looks to be the only one not exposed by the /status api, so it's not currently available. We would have to add another set of config to do it, but since it would be a useful metric I'll leave this issue open.

SimonHoenscheid commented 1 year ago

Yes, and the metric seems to be only available on compilers. Did not check a standard installation.

m0dular commented 1 year ago

I think it's available on the primary via Orchestrator's api, so if you replaced 8140 with 8143 in the example it should work. That reminds me that we missed collecting PCP metrics on compilers in #148 because they come from Puppet server, but we do have a ticket for that.

tuxmea commented 1 year ago

Is this feature already implemented in the orchestration dashboard?

m0dular commented 1 year ago

We don't currently have puppetlabs.pcp.connections since it's not accessible via the /status api. We'd need to add a new puppet_operational_dashboards::telegraf::config type and have it POST to the /metrics/v2 api similar to how the PDB metrics do here.

For compiler PCP metrics, we need to do something like

One concern might be if the broker service metrics don't exist on OSP, I'm not sure if they do. If you define a path on a json_v2.object that doesn't exist, the whole collection will fail. We could work around that by adding the fields to included_keys instead, which will just skip them if they aren't available.

bastelfreak commented 1 year ago

The broker metrics don't exist on open source puppetserver. Broker and orchestrator are only bundled/provided in PE.

bastelfreak commented 1 year ago

PCP metrics are only collected within if $collection_method == 'all' { and not within elsif $collection_method == 'local' {

bastelfreak commented 1 year ago

Hi, I raised two PRs, #178 and #179 . I think at the moment the PCP metric collection doesn't work at all. The variable in the agent.pp is wrong, as mentioned in my comment above, also $puppet_operational_dashboards::include_pe_metrics isn't known in the agent.pp. I fixed this and reworked the tests to have a better coverage. in #179 I fixed the collection of PCP metrics when telegraf is collecting locally. I also added a unit tests to validate the setup at a PE customer I know.

bastelfreak commented 1 year ago

I also see some problems with https://github.com/puppetlabs/puppet_operational_dashboards/pull/168 and https://github.com/puppetlabs/puppet_operational_dashboards/pull/167. https://github.com/puppetlabs/puppet_operational_dashboards/pull/167/commits/3876b7905a62f89a12430ee8419774667a3349d5#diff-57ccecc0a9208b20f3e973538fdde86ccbb9a774ddeca8bca6f5737aa31ddfa6R440 added the PCP connection counter. That got deleted in https://github.com/puppetlabs/puppet_operational_dashboards/pull/168/files#diff-57ccecc0a9208b20f3e973538fdde86ccbb9a774ddeca8bca6f5737aa31ddfa6L440 and I think that was by accident?

I mentioned that already somewhere else: both PRs are quite noisy. They contain a lot of changes and no tests, they were hard to review. They were merged quite fast and I had no time to respond in time.

m0dular commented 1 year ago

Yes, removing the panels was my mistake. Testing and merging multiple features that change the dashboards is a bit error-prone because managing a large JSON file is difficult. Maybe we could use something like the public dashboards feature, but it's currently in preview and not enabled by default.

bastelfreak commented 1 year ago

@m0dular can you take a look at this please? The dashboard added in https://github.com/puppetlabs/puppet_operational_dashboards/pull/167 is still missing in the latest release :(

m0dular commented 1 year ago

Sorry about the delay; I opened #199 to add it back. I checked that all the config to capture the metric was still there, and the new dashboard version displays them correctly for me. The name of the mbean is default:name=puppetlabs.pcp.connections, so it may not have actually been correct in the previous version. You should be able to create a panel in your existing dashboard and look for the value just to confirm.

m0dular commented 1 year ago

@bastelfreak Does this update work for you?

bastelfreak commented 1 year ago

Hey. Sorry, was vacationing and will test on Monday.