newrelic / newrelic-java-agent

The New Relic Java agent
Apache License 2.0

Enhance Solr 7+ JMX metrics by including clustered Solr collections which include the collection name, shard name, and (replica) core #1812

Closed: obenkenobi closed this 2 months ago

obenkenobi commented 2 months ago

Overview

Allow the agent to capture clustered Solr cores via our JMX API.

Single (non-clustered) Solr instances will continue to function the same.

In clustered Solr, the JMX MBeans for cores have object names like solr:dom1=core,dom2=dummy_collection,dom3=shard2,dom4=replica_n2,category=CACHE,scope=searcher,name=documentCache, which the agent previously did not capture.
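The numbered dom keys in a name like this can be read with the standard javax.management.ObjectName class. The sketch below is a hypothetical helper (SolrDomKeys and domValues are illustrative names, not agent code) that collects dom1, dom2, ... in order. It assumes, based only on the example name above, that the numbered keys are contiguous starting at dom1.

```java
import java.util.ArrayList;
import java.util.List;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper for illustration only; not part of the New Relic agent.
final class SolrDomKeys {
    private SolrDomKeys() {}

    // Returns the values of dom1, dom2, ... in index order, stopping at the first missing key.
    // Assumption: the numbered keys are contiguous, as in the example object name.
    static List<String> domValues(ObjectName name) {
        List<String> values = new ArrayList<>();
        for (int i = 1; ; i++) {
            String value = name.getKeyProperty("dom" + i); // null if the key is absent
            if (value == null) {
                return values;
            }
            values.add(value);
        }
    }

    public static void main(String[] args) throws MalformedObjectNameException {
        ObjectName clustered = new ObjectName(
            "solr:dom1=core,dom2=dummy_collection,dom3=shard2,dom4=replica_n2,"
                + "category=CACHE,scope=searcher,name=documentCache");
        // Prints [core, dummy_collection, shard2, replica_n2]
        System.out.println(domValues(clustered));
    }
}
```

The collection, shard, and replica appear as dom2 through dom4, which is why the clustered metric names below are built from that range of keys.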

The agent now exports Solr values for MBeans with nested dom keys (i.e. clustered cores) in the following format.

For example: JMX/solr/dummy_collection.shard2.replica_n2/documentCache/%/

The Solr UI will recognize these cores in the format {collection name}.{shard}.{core name}.

New Iteration Syntax Internal To The Agent

This is done by adding a new syntax for iterating over MBean object name keys when formatting MBeans into agent metric names.

You may see it in any class that extends com.newrelic.agent.jmx.metrics.JmxFrameworkValues in the newrelic-agent module.

To see this in practice, suppose we query Solr's MBeans over JMX using the object name pattern:

solr:dom1=core,*,category=CACHE,scope=searcher,name=documentCache

which may return an MBean with an object name like:

solr:dom1=core,dom2=dummy_collection,dom3=shard2,dom4=replica_n2,category=CACHE,scope=searcher,name=documentCache
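Whether a returned name matches the query can be checked locally with ObjectName.apply, which is essentially the matching rule MBeanServer.queryNames applies to registered MBeans. This sketch writes the wildcard at the end of the key list, the conventional property-list pattern form; key order does not affect matching. SolrPatternCheck is a hypothetical helper, not agent code.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper for illustration only; not part of the New Relic agent.
final class SolrPatternCheck {
    private SolrPatternCheck() {}

    // True if concreteName matches the (possibly wildcarded) pattern.
    static boolean matches(String pattern, String concreteName) throws MalformedObjectNameException {
        // apply() ignores key order; a property-list pattern ("...,*") only requires
        // the listed key/value pairs to be present in the concrete name.
        return new ObjectName(pattern).apply(new ObjectName(concreteName));
    }

    public static void main(String[] args) throws MalformedObjectNameException {
        String pattern = "solr:dom1=core,category=CACHE,scope=searcher,name=documentCache,*";
        String clustered = "solr:dom1=core,dom2=dummy_collection,dom3=shard2,dom4=replica_n2,"
            + "category=CACHE,scope=searcher,name=documentCache";
        System.out.println(matches(pattern, clustered)); // true
    }
}
```

The extra dom2 through dom4 keys do not prevent a match, which is how a single pattern can pick up cores regardless of how deeply their names are nested.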

Given the metric name template JMX/solr/{for:dom[2::.]}/documentCache/%/, the agent expands it into the metric JMX/solr/dummy_collection.shard2.replica_n2/documentCache/%/.

The syntax also accepts a bounded range: JMX/solr/{for:dom[2:4:.]}/documentCache/%/ translates to JMX/solr/dummy_collection.shard2/documentCache/%/, since the end index 4 is excluded.

In general, the placeholder has the form {for:dom[start:end:delimiter]}. The for: prefix indicates that the numbered object name keys (dom1, dom2, and so on) are iterated, and [start:end] selects a range of them: start is the inclusive start index, and end is an optional exclusive end index. Writing [start::delimiter] leaves the end unbounded. This is similar to Python's slice syntax for sublists. The delimiter is the string that joins the iterated values; if left empty, it defaults to /.
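A minimal sketch of how such a placeholder could be expanded against an ObjectName. IteratedKeyTemplate, its regular expression, and the choice to stop at the first missing key are assumptions for illustration; the agent's actual implementation may differ. Indices are restricted to digits, and a delimiter containing "]" is not supported by this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical sketch of the {for:key[start:end:delimiter]} placeholder; not the agent's code.
final class IteratedKeyTemplate {
    // Groups: 1 = key prefix, 2 = start, 3 = optional exclusive end, 4 = optional delimiter.
    private static final Pattern FOR_PLACEHOLDER =
        Pattern.compile("\\{for:([A-Za-z_]+)\\[(\\d+):(\\d*)(?::([^\\]]*))?\\]\\}");

    private IteratedKeyTemplate() {}

    static String expand(String template, ObjectName name) {
        Matcher m = FOR_PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String prefix = m.group(1);
            int start = Integer.parseInt(m.group(2));
            // An empty end means unbounded: iterate until the first missing key.
            int end = m.group(3).isEmpty() ? Integer.MAX_VALUE : Integer.parseInt(m.group(3));
            // An empty or absent delimiter defaults to "/".
            String delimiter = (m.group(4) == null || m.group(4).isEmpty()) ? "/" : m.group(4);

            List<String> parts = new ArrayList<>();
            for (int i = start; i < end; i++) {
                String value = name.getKeyProperty(prefix + i);
                if (value == null) {
                    break; // assumption: stop at the first gap, even inside a bounded range
                }
                parts.add(value);
            }
            // quoteReplacement keeps '$' or '\' in key values from being treated as group refs.
            m.appendReplacement(out, Matcher.quoteReplacement(String.join(delimiter, parts)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) throws MalformedObjectNameException {
        ObjectName name = new ObjectName(
            "solr:dom1=core,dom2=dummy_collection,dom3=shard2,dom4=replica_n2,"
                + "category=CACHE,scope=searcher,name=documentCache");
        // Prints JMX/solr/dummy_collection.shard2.replica_n2/documentCache/%/
        System.out.println(expand("JMX/solr/{for:dom[2::.]}/documentCache/%/", name));
        // Prints JMX/solr/dummy_collection.shard2/documentCache/%/
        System.out.println(expand("JMX/solr/{for:dom[2:4:.]}/documentCache/%/", name));
    }
}
```

Text outside the placeholder, including the trailing %/, is copied through unchanged, so the same template works for names with more or fewer nested dom keys.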

For simplicity, the syntax currently supports only positive integers, but it could eventually be extended to accept other characters.

Since the syntax is new and may have unexpected side effects, we are adding a new, temporary, undocumented configuration option to disable it for Solr:

yaml:

# newrelic.yml snippet
common: &default_settings
  jmx:
    enable_iterated_objectname_Keys: true # default is true

system property:

-Dnewrelic.config.jmx.enable_iterated_objectname_Keys=true

environment variable:

NEW_RELIC_JMX_ENABLE_ITERATED_OBJECTNAME_KEYS=true

By default the configuration is true. To disable the syntax, set it to false.

Related GitHub Issue

https://github.com/newrelic/newrelic-java-agent/issues/1571

codecov-commenter commented 2 months ago

Codecov Report

Attention: Patch coverage is 85.71429% with 12 lines in your changes missing coverage. Please review.

Project coverage is 70.89%. Comparing base (9aee916) to head (53f531c). Report is 4 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| ...ain/java/com/newrelic/agent/jmx/create/JmxGet.java | 79.16% | 5 Missing and 5 partials :warning: |
| .../java/com/newrelic/agent/config/JmxConfigImpl.java | 0.00% | 1 Missing :warning: |
| ...c/main/java/com/newrelic/agent/jmx/JmxService.java | 0.00% | 1 Missing :warning: |
Additional details and impacted files

```diff
@@            Coverage Diff            @@
##               main    #1812   +/-   ##
=========================================
  Coverage     70.89%   70.89%
- Complexity     9956     9965    +9
=========================================
  Files           827      828    +1
  Lines         39880    39955   +75
  Branches       6043     6053   +10
=========================================
+ Hits          28272    28327   +55
- Misses         8887     8896    +9
- Partials       2721     2732   +11
```


obenkenobi commented 2 months ago

The Pekko AIT is failing because this branch does not include the Pekko instrumentation, which is in another PR. Once all other tests pass, this branch will be merged.