newrelic / nri-jmx

New Relic Infrastructure JMX Integration
Apache License 2.0
6 stars 16 forks source link

SDK update #47

Closed varas closed 4 years ago

varas commented 4 years ago

Description of the changes

SDK update.

TODO

PR Review Checklist

Author

Reviewer

camdencheek commented 4 years ago

Did a couple of quick smoke tests, and it looks like when multiple beans are defined in collection definition (see example below), the collection fails with [ERR] Failed to complete collection: cannot query: java.lang:type=Memory, error: read |0: file already closed Config file:

# Standard JVM Metrics

collect:
    - domain: java.lang
      event_type: JVMSample
      beans:
          - query: type=GarbageCollector,name=*
            attributes:
                - CollectionCount
                - CollectionTime
          - query: type=Memory
            attributes:
                - HeapMemoryUsage.Committed
                - HeapMemoryUsage.Init
                - HeapMemoryUsage.Max
                - HeapMemoryUsage.Used
                - NonHeapMemoryUsage.Committed
                - NonHeapMemoryUsage.Init
                - NonHeapMemoryUsage.Max
                - NonHeapMemoryUsage.Used

Additionally, the collection time significantly increases compared to the previous version.

~/go/src/github.com/newrelic/nri-jmx sdk_update*
❯ time bin/nr-jmx --collection_files /Users/ccheek/go/src/github.com/newrelic/nri-jmx/jvm-metrics.yml.sample --jmx_host tomcat-cent7-1 --jmx_port 9000 --jmx_user admin --jmx_pass admin --jmx_uri_path jmxrmi > /dev/null
bin/nr-jmx --collection_files  --jmx_host tomcat-cent7-1 --jmx_port 9000       0.01s user 0.01s system 0% cpu 11.131 total
~/go/src/github.com/newrelic/nri-jmx master*
❯ time bin/nr-jmx --collection_files /Users/ccheek/go/src/github.com/newrelic/nri-jmx/jvm-metrics.yml.sample --jmx_host tomcat-cent7-1 --jmx_port 9000 --jmx_user admin --jmx_pass admin --jmx_uri_path jmxrmi > /dev/null
bin/nr-jmx --collection_files  --jmx_host tomcat-cent7-1 --jmx_port 9000       0.01s user 0.01s system 1% cpu 1.167 total
camdencheek commented 4 years ago

It appears the significantly increased time is due to the change where we don't exit until we time out. If I increase the timeout from 10 seconds to 20 seconds, the execution time of the NRJMX command increases by approximately 10 secons.

camdencheek commented 4 years ago

Alright, here’s what I’m seeing I’m seeing after a little digging 1) Query has been changed so that it does not return until it times out. This is rather problematic since timeouts default to 10 seconds, and there are often multiple queries. You mentioned in the comments that this new behavior was so that the SDK can support multi-line responses — I’m not sure I understand the connection here. Does nrjmx now send back multiple lines for a single query? 2) Multiple queries against the same nrjmx instance no longer work. This appears to be because all queries are timing out, which closes the connection. If we fix the timeout issue, I believe this will also be fixed.

varas commented 4 years ago

Return on severe errors was missing we added it https://github.com/newrelic/nri-jmx/pull/47/commits/76e3d57fd41614505fa4a92d5c3bef630aaee444#diff-bbc8660333f1998a8cf44b84b3f4d8ddR274

Could you check it or provide a way to test it?

varas commented 4 years ago

Restored single nrjmx stdout line retrieval https://github.com/newrelic/infra-integrations-sdk/pull/202

You shouldn't have an error case where timeout is reached besides long queries.

camdencheek commented 4 years ago

Did some more testing. In positive news, the error messages are great! Huge improvement. However, I'm still unable to run multiple queries. Getting a timeout for all but the first query. Looking into it now, and will post results when I get anything

camdencheek commented 4 years ago

Pretty sure I found the issue. PR with fix here: https://github.com/newrelic/infra-integrations-sdk/pull/203