newrelic / newrelic-java-agent

The New Relic Java agent
Apache License 2.0

Solr 8 JMX metrics enhancement #78

Closed breedx-nr closed 3 years ago

breedx-nr commented 4 years ago

TL;DR

When Solr 8 is using cloud sharding, some of the JMX metrics are not reported. It would be great if they were!

About

If we look in the Solr7JmxValues.java class, we can see that the `updateHandler` is looking for this bean name:

solr:dom1=core,dom2=*,category=UPDATE,scope=updateHandler,name=*

but when Solr is configured for sharding, the names are dynamically generated to an arbitrary depth, and may look more like this:

solr:dom1=core,dom2=solr-inventory,dom3=shard1,dom4=replica_n1,category=UPDATE,scope=updateHandler,name=adds

For caching, the NR Solr 7 jmx support is looking for

solr:dom1=core,dom2=*,category=CACHE,scope=searcher,name=documentCache

but the Solr 8 name might look more like

solr:dom1=core,dom2=solr-inventory,dom3=shard1,dom4=replica_n1,category=CACHE,scope=searcher,name=documentCache

When the New Relic JMX component cannot find these beans, the data does not get reported and ends up showing up as zeros in the UI.
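The mismatch can be reproduced with the JDK's own `javax.management.ObjectName` matching rules, without a running Solr instance. The sketch below is only an illustration: `FIXED_DEPTH` is the pattern quoted above, while `ANY_DEPTH` is a hypothetical alternative (not taken from the agent) that uses a trailing `,*` so that extra keys such as `dom3` and `dom4` are tolerated. The class and method names are made up for this example.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class SolrPatternCheck {
    // Pattern quoted in this issue: fixed set of keys, wildcards only in values.
    public static final String FIXED_DEPTH =
            "solr:dom1=core,dom2=*,category=UPDATE,scope=updateHandler,name=*";

    // Hypothetical alternative (an assumption, not the agent's code): the
    // trailing ",*" makes this a property-list pattern, so additional keys
    // like dom3/dom4 do not prevent a match.
    public static final String ANY_DEPTH =
            "solr:dom1=core,category=UPDATE,scope=updateHandler,*";

    // Sharded bean name quoted in this issue.
    public static final String SHARDED_BEAN =
            "solr:dom1=core,dom2=solr-inventory,dom3=shard1,dom4=replica_n1,"
            + "category=UPDATE,scope=updateHandler,name=adds";

    // Returns true if the concrete bean name is selected by the pattern.
    public static boolean matches(String pattern, String beanName) {
        try {
            return new ObjectName(pattern).apply(new ObjectName(beanName));
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("fixed depth matches: " + matches(FIXED_DEPTH, SHARDED_BEAN));
        System.out.println("any depth matches:   " + matches(ANY_DEPTH, SHARDED_BEAN));
    }
}
```

The fixed-depth pattern does not select the sharded bean because a pattern without a trailing `,*` requires the key set to be identical, whereas the property-list pattern does select it.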

Feature Description

The agent should be enhanced so that it can find the sharded JMX beans. Rather than hard-coding a handful of fixed names, the agent should adapt to Solr 8 by listing or otherwise enumerating the bean names and matching the arbitrary-depth domain names, as shown above.

These beans should be queried/monitored by the agent and reported to New Relic for display in the NR1 UI.
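One way the enumeration could work is sketched below, using only standard JMX calls (`MBeanServer.queryNames` and `ObjectName.getKeyProperty`). Everything named here is hypothetical: the `solr:dom1=core,*` query, the `SolrBeanEnumerator` class, and the slash-separated label it builds are assumptions for illustration and are not the agent's actual query or metric naming scheme.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class SolrBeanEnumerator {
    // Hypothetical query: every Solr core bean, however many domN keys it has.
    public static final String CORE_BEANS = "solr:dom1=core,*";

    // Parses a bean name, converting the checked exception for convenience.
    public static ObjectName parse(String name) {
        try {
            return new ObjectName(name);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Collects dom1, dom2, ... in order until the next domN key is missing.
    public static List<String> domainPath(ObjectName name) {
        List<String> path = new ArrayList<>();
        for (int i = 1; ; i++) {
            String value = name.getKeyProperty("dom" + i);
            if (value == null) {
                break;
            }
            path.add(value);
        }
        return path;
    }

    // Builds an illustrative label (NOT the agent's real metric name).
    // Keys that a bean lacks, such as scope on INDEX beans, are skipped.
    public static String describe(ObjectName name) {
        List<String> parts = new ArrayList<>(domainPath(name));
        for (String key : new String[] {"category", "scope", "name"}) {
            String value = name.getKeyProperty(key);
            if (value != null) {
                parts.add(value);
            }
        }
        return String.join("/", parts);
    }

    public static void main(String[] args) {
        // Only finds beans when run inside a JVM that hosts Solr's MBeans.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        Set<ObjectName> beans = server.queryNames(parse(CORE_BEANS), null);
        for (ObjectName bean : beans) {
            System.out.println(describe(bean));
        }
    }
}
```

Walking `dom1`, `dom2`, and so on until a key is missing avoids assuming a fixed depth, so the same code would handle both a standalone core and a collection/shard/replica layout.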

Describe Alternatives

Solr uses dropwizard/codahale metrics internally, so the New Relic dropwizard reporter might be usable to get the same telemetry. Some experimentation/exploration would be required to verify that the same information can be obtained, and to work out how it might map to a cohesive user experience.

Additional context

It is unknown which version of Solr first supported cloud/shard bean names with arbitrary domain depth. It is likely that this will continue in future versions of dropwizard.

Priority

"Really Want". More than one customer has asked for this support.

kford-newrelic commented 3 years ago

Reviewed and determined to be permanently deferred.

rahmnathan commented 2 years ago

Hi @kford-newrelic. Does this mean Solr Cache monitoring will no longer be supported going forward?

mmulligan03 commented 2 years ago

@rahmnathan I had to create a custom NR extension for SolrCloud caches and updates.

Create a file called newrelic-solr-extension.yml in /opt/newrelic/extensions/

This is what I had in my file (not sure if it can be pared down at all). The metrics are all queryable via the Metrics table in NRQL:

name: SolrCloudCustom
version: 1.0
enabled: true
jmx:
  - object_name: solr:dom1=core,dom2=*,dom3=*,dom4=*,category=CACHE,scope=searcher,name=*
    metrics:
      - attributes: inserts, hits, size, ramBytesUsed, lookups, hitratio, evictions, warmupTime
        type: simple
  - object_name: solr:dom1=core,dom2=*,dom3=*,dom4=*,category=UPDATE,scope=updateHandler,name=*
    metrics:
      - attributes: Value
        type: simple
  - object_name: solr:dom1=node,category=UPDATE,scope=updateShardHandler,name=*
    metrics:
      - attributes: Count, Max, Mean, Min, StdDev, MeanRate, 50thPercentile, 95thPercentile, 98thPercentile, 99thPercentile, 999thPercentile, OneMinuteRate, FiveMinuteRate, FifteenMinuteRate
        type: simple
  - object_name: solr:dom1=core,dom2=*,dom3=*,dom4=*,category=INDEX,name=sizeInBytes
    metrics:
      - attributes: Value
        type: simple
  - object_name: solr:dom1=core,dom2=*,dom3=*,dom4=*,category=QUERY,scope=/select,name=*
    metrics:
      - attributes: Count, Max, Mean, Min, StdDev, MeanRate, 50thPercentile, 95thPercentile, 98thPercentile, 99thPercentile, 999thPercentile, OneMinuteRate, FiveMinuteRate, FifteenMinuteRate
        type: simple
  - object_name: solr:dom1=core,dom2=*,dom3=*,dom4=*,category=QUERY,scope=/get,name=*
    metrics:
      - attributes: Count, Max, Mean, Min, StdDev, MeanRate, 50thPercentile, 95thPercentile, 98thPercentile, 99thPercentile, 999thPercentile, OneMinuteRate, FiveMinuteRate, FifteenMinuteRate
        type: simple
  - object_name: solr:dom1=core,dom2=*,dom3=*,dom4=*,category=SEARCHER,scope=*,name=*
    metrics:
      - attributes: Count, Value
        type: simple

rahmnathan commented 2 years ago

Thanks @mmulligan03. Did this result in the 'Solr Caches' page being populated in NewRelic? Otherwise, could you share a query you're looking at to inspect this data?

I got this config file in place, but I haven't spent time with NewRelic's query language.

mmulligan03 commented 2 years ago

It doesn't fix the Solr Caches or Update page in APM, but you can query all the metrics collected in the Metrics table, like so:

SELECT average(newrelic.timeslice.value) FROM Metric WHERE appName = 'YOUR_SOLR_APP_NAME' AND newrelic.timeslice.value IS NOT NULL WITH METRIC_FORMAT 'JMX/solr/null/{collection}/{shard}/{replica}/CACHE/searcher/{cacheName}/core/hitratio' facet collection, cacheName SINCE 1 hour ago timeseries MAX

I reference this a lot when working with Metrics: https://docs.newrelic.com/docs/data-apis/understand-data/metric-data/query-apm-metric-timeslice-data-nrql

Something like this will tell you all the Solr metrics you have now:

SELECT uniques(metricTimesliceName) FROM Metric WHERE appName like 'YOUR_SOLR_APP_NAME' AND newrelic.timeslice.value IS NOT NULL and metricTimesliceName like 'JMX/solr/%'

mmulligan03 commented 2 years ago

Not sure why they haven't been able to fix their agent to use the alternate format when running in SolrCloud, but this worked for us.

rahmnathan commented 2 years ago

@mmulligan03 Really appreciate this! Using your configuration + query I've been able to get this stuff visualizing again, though obviously it's not as convenient as the built-in functionality that worked previously.

We're considering Graphite and/or Prometheus as Solr is supposed to support those tools as well, but you've been immensely helpful (and prompt!) getting me past this issue.

mmulligan03 commented 2 years ago

I struggled for a bit to figure out how to get it working so I'm glad I could spare you that!

kford-newrelic commented 2 years ago

@rahmnathan At the moment, we have a lot we want to accomplish for our agent roadmap and when we drew the line, Solr 8 didn't make the cut. That doesn't necessarily mean it's a "forever" thing, just for the current roadmap. Of course, if there's an enterprising engineer that wants to start with our existing instrumentation and create a PR with an update, that would be awesome!

@mmulligan03 really like your approach to crafting a custom instrumentation solution - we hope that others interested in Solr 8 can benefit from your hard work!