Closed varas closed 4 years ago
Did a couple of quick smoke tests, and it looks like when multiple beans are defined in collection definition (see example below), the collection fails with [ERR] Failed to complete collection: cannot query: java.lang:type=Memory, error: read |0: file already closed
Config file:
# Standard JVM Metrics
collect:
- domain: java.lang
event_type: JVMSample
beans:
- query: type=GarbageCollector,name=*
attributes:
- CollectionCount
- CollectionTime
- query: type=Memory
attributes:
- HeapMemoryUsage.Committed
- HeapMemoryUsage.Init
- HeapMemoryUsage.Max
- HeapMemoryUsage.Used
- NonHeapMemoryUsage.Committed
- NonHeapMemoryUsage.Init
- NonHeapMemoryUsage.Max
- NonHeapMemoryUsage.Used
Additionally, the collection time significantly increases compared to the previous version.
~/go/src/github.com/newrelic/nri-jmx sdk_update*
❯ time bin/nr-jmx --collection_files /Users/ccheek/go/src/github.com/newrelic/nri-jmx/jvm-metrics.yml.sample --jmx_host tomcat-cent7-1 --jmx_port 9000 --jmx_user admin --jmx_pass admin --jmx_uri_path jmxrmi > /dev/null
bin/nr-jmx --collection_files --jmx_host tomcat-cent7-1 --jmx_port 9000 0.01s user 0.01s system 0% cpu 11.131 total
~/go/src/github.com/newrelic/nri-jmx master*
❯ time bin/nr-jmx --collection_files /Users/ccheek/go/src/github.com/newrelic/nri-jmx/jvm-metrics.yml.sample --jmx_host tomcat-cent7-1 --jmx_port 9000 --jmx_user admin --jmx_pass admin --jmx_uri_path jmxrmi > /dev/null
bin/nr-jmx --collection_files --jmx_host tomcat-cent7-1 --jmx_port 9000 0.01s user 0.01s system 1% cpu 1.167 total
It appears the significantly increased time is due to the change where we don't exit until we time out. If I increase the timeout from 10 seconds to 20 seconds, the execution time of the NRJMX command increases by approximately 10 secons.
Alright, here’s what I’m seeing I’m seeing after a little digging 1) Query has been changed so that it does not return until it times out. This is rather problematic since timeouts default to 10 seconds, and there are often multiple queries. You mentioned in the comments that this new behavior was so that the SDK can support multi-line responses — I’m not sure I understand the connection here. Does nrjmx now send back multiple lines for a single query? 2) Multiple queries against the same nrjmx instance no longer work. This appears to be because all queries are timing out, which closes the connection. If we fix the timeout issue, I believe this will also be fixed.
Return on severe errors was missing we added it https://github.com/newrelic/nri-jmx/pull/47/commits/76e3d57fd41614505fa4a92d5c3bef630aaee444#diff-bbc8660333f1998a8cf44b84b3f4d8ddR274
Could you check it or provide a way to test it?
Restored single nrjmx
stdout line retrieval https://github.com/newrelic/infra-integrations-sdk/pull/202
You shouldn't have an error case where timeout is reached besides long queries.
Did some more testing. In positive news, the error messages are great! Huge improvement. However, I'm still unable to run multiple queries. Getting a timeout for all but the first query. Looking into it now, and will post results when I get anything
Pretty sure I found the issue. PR with fix here: https://github.com/newrelic/infra-integrations-sdk/pull/203
Description of the changes
SDK update.
TODO
PR Review Checklist
Author
Reviewer