KGOH closed this issue 4 months ago.
Hi @KGOH thanks for letting us know about this issue. A few clarifying questions:
Have you noticed this occurring more than once? If yes, approximately how often does it occur? Any repeatable pattern that you can discern?
@kford-newrelic this happens constantly, approximately every 15 minutes, but only for one specific transaction in one specific Java app. Many metrics inside this transaction are broken, such as the database-access ones. I suspect the transaction may be asynchronous, since I see the lettuce library in the call stack.
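For context, here is a minimal sketch of the kind of lettuce cluster async usage that would put `RedisAdvancedClusterAsyncCommandsImpl/dispatch` on a call stack. This is not the actual application code; the host, keys, and values are placeholders.

```java
import io.lettuce.core.RedisFuture;
import io.lettuce.core.cluster.RedisClusterClient;
import io.lettuce.core.cluster.api.StatefulRedisClusterConnection;
import io.lettuce.core.cluster.api.async.RedisAdvancedClusterAsyncCommands;

public class LettuceAsyncExample {
    public static void main(String[] args) {
        // Placeholder URI; the real application's cluster topology is unknown.
        RedisClusterClient client = RedisClusterClient.create("redis://localhost:6379");
        StatefulRedisClusterConnection<String, String> connection = client.connect();

        // The async command interface dispatches commands on Netty I/O threads,
        // so the work that completes the futures runs outside the request thread.
        RedisAdvancedClusterAsyncCommands<String, String> async = connection.async();
        RedisFuture<Boolean> hset = async.hset("some-hash", "field", "value");
        RedisFuture<Boolean> expire = async.pexpire("some-hash", 60_000);

        // Blocking here only for the sake of the example.
        hset.toCompletableFuture().join();
        expire.toCompletableFuture().join();

        connection.close();
        client.shutdown();
    }
}
```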
Any chance you can create a repro app from that transaction that you can share with us?
If we can get a repro, we might stand a chance of figuring this out...
@kford-newrelic unfortunately, I don't know how to reproduce this :(
Is there anything unique about the problem transaction? Does it make external calls or use the Process API in any way? Does it use any sort of async framework? Would it be possible to get some source code or pseudo-code?
We're obviously looking for any pointers toward the underlying cause, since we've been unable to repro this ourselves.
@jtduffy here are the transaction metrics I managed to extract. I erased sensitive information, but this should still convey the overall idea of the technologies used:
| :name | :avg_calls | :avg_time |
|---------------------------------------------------------------------------------------------------------+------------+-----------|
| Datastore/operation/Redis/PEXPIRE | 5.214 | -30.283 |
| Datastore/operation/Redis/RPUSH | 29.951 | -17.632 |
| Datastore/operation/Redis/HGET | 6.600 | -10.324 |
| Java/other | 23.758 | -8.868 |
| Datastore/statement/Postgres/.../select | 19.414 | -6.124 |
| Datastore/operation/Redis/HEXISTS | 1.196 | -2.473 |
| Datastore/operation/Redis/GET | 0.000 | -2.529 |
| Datastore/statement/MongoDB/...Event/insert | 0.399 | -1.712 |
| Datastore/operation/Redis/HSCAN | 0.399 | -1.712 |
| Java/io.lettuce.core.cluster.RedisAdvancedClusterAsyncCommandsImpl/dispatch | 18.740 | 46.605 |
| Java/sun.net.www.protocol.http.HttpURLConnection/getResponseCode | 13.184 | 42.346 |
| Java/sun.net.www.protocol.http.HttpURLConnection/getOutputStream | 12.194 | 35.980 |
| Java/org.apache.cxf.binding.soap.interceptor.SoapOutInterceptor/handleMessage | 14.187 | 34.856 |
| Java/sun.net.www.protocol.http.HttpURLConnection/getInputStream | 10.981 | 33.421 |
| Java/com.zaxxer.hikari.HikariDataSource/getConnection | 9.171 | 28.940 |
| Datastore/operation/Redis/HSET | 34.804 | 22.405 |
| Datastore/operation/Redis/DEL | 33.911 | 9.934 |
| Java/io.lettuce.core.RedisAsyncCommandsImpl/dispatch | 0.787 | 6.850 |
| Datastore/statement/Postgres/.../select | 0.598 | 3.483 |
| Datastore/operation/Redis/LRANGE | 2.570 | 2.980 |
| Java/org.apache.cxf.interceptor.StaxOutEndingInterceptor/handleFault | 0.094 | 2.127 |
| Datastore/statement/MongoDB/...Event/insert | 0.000 | 1.712 |
| External/.../HttpURLConnection/getResponseCode | 17.495 | 1.428 |
| Datastore/operation/Redis/HDEL | 1.001 | 0.883 |
| Datastore/operation/Redis/LINDEX | 8.631 | 0.710 |
| Java/org.apache.cxf.binding.soap.interceptor.SoapOutInterceptor$SoapOutEndingInterceptor/handleFault | 0.024 | 0.364 |
| Datastore/statement/Postgres/.../select | 19.419 | 0.253 |
| Filter/org.springframework.security.web.context.request.async.WebAsyncManagerIntegrationFilter/doFilter | 0.000 | 0.000 |
| Java/org.apache.cxf.interceptor.MessageSenderInterceptor$MessageSenderEndingInterceptor/handleFault | 0.374 | 0.000 |
| Filter/org.springframework.security.web.authentication.www.BasicAuthenticationFilter/doFilter | 0.000 | 0.000 |
| Filter/org.springframework.boot.web.servlet.DelegatingFilterProxyRegistrationBean$1/doFilter | 0.000 | 0.000 |
| Filter/org.springframework.boot.web.servlet.filter.OrderedCharacterEncodingFilter/doFilter | 0.000 | 0.000 |
| Filter/org.springframework.boot.actuate.metrics.web.servlet.WebMvcMetricsFilter/doFilter | 0.000 | 0.000 |
| Java/org.apache.cxf.binding.soap.interceptor.SoapHeaderOutFilterInterceptor/handleFault | 0.299 | 0.000 |
| Java/org.apache.cxf.binding.soap.interceptor.SoapPreProtocolOutInterceptor/handleFault | 0.000 | 0.000 |
| Filter/org.springframework.boot.web.servlet.filter.OrderedFormContentFilter/doFilter | 0.000 | 0.000 |
| Filter/org.springframework.security.web.session.DisableEncodeUrlFilter/doFilter | 0.000 | 0.000 |
| Java/sun.net.www.protocol.https.DelegateHttpsURLConnection/getResponseMessage | 0.000 | 0.000 |
| Java/org.apache.cxf.jaxws.interceptors.WrapperClassOutInterceptor/handleFault | 0.075 | 0.000 |
| Java/org.apache.cxf.binding.soap.interceptor.SoapOutInterceptor/handleFault | 0.002 | 0.016 |
| Filter/org.springframework.security.web.header.HeaderWriterFilter/doFilter | 0.000 | 0.000 |
| Java/sun.net.www.protocol.https.DelegateHttpsURLConnection/getInputStream | 0.000 | 0.000 |
| Java/org.glassfish.jersey.server.ServerRuntime$Responder/processResponse | 0.000 | 0.000 |
| Java/org.apache.cxf.jaxws.interceptors.HolderOutInterceptor/handleFault | 0.005 | 0.000 |
| Java/org.apache.cxf.wsdl.interceptors.BareOutInterceptor/handleFault | 0.006 | 0.068 |
| Java/org.apache.cxf.jaxws.interceptors.SwAOutInterceptor/handleFault | 0.019 | 0.000 |
| Java/org.apache.cxf.interceptor.MessageSenderInterceptor/handleFault | 0.000 | 0.000 |
| Java/org.apache.cxf.interceptor.AttachmentOutInterceptor/handleFault | 0.000 | 0.001 |
| Filter/org.springframework.web.filter.RequestContextFilter/doFilter | 0.000 | 0.000 |
| Java/org.apache.cxf.ws.policy.PolicyOutInterceptor/handleFault | 0.001 | 0.000 |
| Java/org.apache.cxf.interceptor.StaxOutInterceptor/handleFault | 0.001 | 0.008 |
| Servlet/org.glassfish.jersey.servlet.ServletContainer/service | 0.000 | 0.000 |
| Java/javax.servlet.ServletRequestListener/requestInitialized | 0.000 | 0.000 |
| Filter/org.apache.tomcat.websocket.server.WsFilter/doFilter | 0.000 | 0.000 |
| Java/org.glassfish.jersey.server.ApplicationHandler/handle | 0.000 | 0.000 |
| Filter/org.springframework.web.filter.CorsFilter/doFilter | 0.000 | 0.000 |
| Java/com...LoggingOutputInterceptor/handleFault | 0.000 | 0.000 |
| External/.../HttpURLConnection/getResponseCode | 0.000 | 0.000 |
| Filter/com...RequestIdentifierFilter/doFilter | 0.000 | 0.000 |
| Datastore/statement/MongoDB/...Event/insert | 0.399 | 0.000 |
| Datastore/statement/Postgres/.../select | 0.598 | 0.003 |
| Datastore/statement/Postgres/.../select | 0.598 | 0.001 |
| Datastore/operation/Redis/PEXPIREAT | 0.000 | 0.000 |
| External/.../CommonsHttp/execute | 0.000 | 0.000 |
| External/.../CommonsHttp/execute | 0.000 | 0.000 |
| Datastore/operation/Redis/INCRBY | 0.000 | 0.000 |
| Datastore/operation/Redis/SETEX | 0.000 | 0.000 |
| Datastore/operation/Redis/HKEYS | 0.000 | 0.000 |
| Java/com...RestImpl/search... | 0.000 | 0.000 |
| Java/com...Service/search... | 0.000 | 0.000 |
| Java/com...RestClient/get... | 0.000 | 0.000 |
| External/...Client/POST | 0.000 | 0.000 |
For a counterexample, here's another transaction that also uses Redis and lettuce but never produces negative values:
| :name | :avg_calls | :avg_time |
|-------|------------|-----------|
| External/.../CommonsHttp/execute | 3.66 | 174.81 ms |
| Java/com...RestImpl/get... | 1.00 | 49.51 ms |
| Datastore/operation/Redis/PEXPIRE | 39.87 | 20.45 ms |
| Java/org.glassfish.jersey.server.ApplicationHandler/handle | 0.80 | 5.95 ms |
| Java/other | 63.98 | 3.21 ms |
| Datastore/operation/Redis/HGET | 7.00 | 2.95 ms |
| Datastore/operation/Redis/HSET | 1.95 | 1.02 ms |
| Java/io.lettuce.core.RedisAsyncCommandsImpl/dispatch | 0.00 | 28.26 µs |
| Java/io.lettuce.core.cluster.RedisAdvancedClusterAsyncCommandsImpl/dispatch | 0.02 | 21.83 µs |
| Java/org.apache.http.impl.client.InternalHttpClient/execute | 0.00 | 7.64 µs |
| Java/javax.servlet.ServletRequestListener/requestInitialized | 0.00 | 5.63 µs |
| Java/org.glassfish.jersey.server.ServerRuntime$Responder/processResponse | 0.00 | 2.46 µs |
| Filter/com...RequestIdentifierFilter/doFilter | 0.00 | 1.63 µs |
| Filter/org.springframework.security.web.authentication.www.BasicAuthenticationFilter/doFilter | 0.00 | 684.94 ns |
| Servlet/org.glassfish.jersey.servlet.ServletContainer/service | 0.00 | 440.40 ns |
| Filter/org.springframework.boot.web.servlet.filter.OrderedCharacterEncodingFilter/doFilter | 0.00 | 110.96 ns |
| Filter/org.springframework.web.filter.RequestContextFilter/doFilter | 0.00 | 103.87 ns |
| Filter/org.springframework.web.filter.CorsFilter/doFilter | 0.00 | 41.10 ns |
| Filter/org.springframework.security.web.context.request.async.WebAsyncManagerIntegrationFilter/doFilter | 0.00 | 28.14 ns |
@jtduffy Is this info helpful?
It's more data, but unfortunately I don't think it gets us closer to a solid repro app. This has been a sporadic issue for years and we've never been able to track it down. It just seems like there's no common denominator across applications when it does pop up. Thanks again for the report and the information you provided.
@KGOH This is probably a long shot, but if the same transaction is still regularly generating these negative/large metric values every 15 minutes, can you go into the Metric Explorer and check whether there are any metrics prefixed with `Supportability/API/LegacyAsync/`?
If there are, try adding the `legacy_async_api_skip_suspend: true` setting to your agent config YAML, indented 2 spaces under the pre-existing `common` stanza:

    common: &default_settings
      legacy_async_api_skip_suspend: true
I believe the agent should detect this YAML change and apply it dynamically without requiring a JVM restart; picking up the change usually takes only a few minutes. After 15-30 minutes have passed, do you still see the negative/large metric values associated with the problematic transaction?
Again, this is a long shot, but based on some of the metrics for the problematic transaction I'm wondering if the async servlet/Jetty continuations APIs are being used under the hood and if this might be related to our legacy async API that traces such activity.
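To make that suspicion more concrete, here is an illustrative sketch only (not the New Relic agent's actual implementation) of how an exclusive-time calculation of the form "parent total time minus the sum of child segment durations" can go negative when asynchronous children overlap each other. All class and segment names below are hypothetical.

```java
// Illustrative model only -- NOT the New Relic agent's actual implementation.
// Shows how "exclusive time = total time - sum(child durations)" can go negative
// when asynchronous children overlap the parent or each other.
public class ExclusiveTimeUnderflow {

    record Segment(String name, long startMs, long endMs) {
        long duration() { return endMs - startMs; }
    }

    public static void main(String[] args) {
        // Parent transaction segment: 0..100 ms.
        Segment parent = new Segment("WebTransaction/problematic", 0, 100);

        // Two async Redis children dispatched from the same parent. Because they
        // run concurrently on I/O threads, their wall-clock durations overlap.
        Segment child1 = new Segment("Datastore/operation/Redis/HSET", 10, 90);
        Segment child2 = new Segment("Datastore/operation/Redis/RPUSH", 15, 95);

        long childTotal = child1.duration() + child2.duration(); // 80 + 80 = 160 ms
        long naiveExclusive = parent.duration() - childTotal;    // 100 - 160 = -60 ms

        System.out.println("Parent total time:      " + parent.duration() + " ms");
        System.out.println("Sum of child durations: " + childTotal + " ms");
        System.out.println("Naive exclusive time:   " + naiveExclusive + " ms (negative!)");
    }
}
```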
@KGOH apologies but we don't have enough information to investigate this further. If you later identify a pattern or other details that might be helpful for us to reproduce this issue, do let us know.
Description
I receive stats with bad data from a single Java app, always within the same transaction, and as a result the whole system's metrics are affected. I haven't encountered such behavior on other Java apps running the same agent.
`callCount` and min/max call time may be incorrect too, though they don't look like it in this example.
Expected Behavior
I expect the following to hold (sketched as code below):
- non-negative `totalCallTime` and `totalExclusiveTime`
- `sumOfSquares <= maxCallTime*maxCallTime*callCount`
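As an illustration of those two invariants only, here is a small Java sketch; the `StatsData` record and `isSane` helper are hypothetical names rather than part of the agent or its API, and the sample numbers are made up to mirror the shape of the data reported above.

```java
// Hypothetical holder for the metric fields mentioned in this issue; not an agent API.
public class MetricDataInvariants {

    record StatsData(long callCount, double totalCallTime, double totalExclusiveTime,
                     double minCallTime, double maxCallTime, double sumOfSquares) {}

    // Returns true when the expected invariants hold.
    static boolean isSane(StatsData s) {
        boolean nonNegativeTotals = s.totalCallTime() >= 0 && s.totalExclusiveTime() >= 0;
        boolean sumOfSquaresBounded =
                s.sumOfSquares() <= s.maxCallTime() * s.maxCallTime() * s.callCount();
        return nonNegativeTotals && sumOfSquaresBounded;
    }

    public static void main(String[] args) {
        // Made-up values: negative totals should be flagged, sane values should pass.
        StatsData broken  = new StatsData(30, -17.632, -17.632, 0.0, 1.2, 5.0);
        StatsData healthy = new StatsData(40,  20.45,   20.45,  0.1, 1.2, 30.0);

        System.out.println("broken  passes invariants: " + isSane(broken));  // false
        System.out.println("healthy passes invariants: " + isSane(healthy)); // true
    }
}
```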
Steps to Reproduce
I have no idea how to reproduce this. It's just random behavior from a single app.
Your Environment
Java agent version 8.11.0, OpenJDK 17.0.8
Additional context
Looks like my problem is related to this issue: https://github.com/newrelic/newrelic-java-agent/issues/709 and this PR: https://github.com/newrelic/newrelic-java-agent/pull/1685/files