pinpoint-apm / pinpoint

APM (Application Performance Management) tool for large-scale distributed systems.
https://pinpoint-apm.gitbook.io/
Apache License 2.0

Methods in certain traces are given out as - API-METADATA-NOT-FOUND #6816

Closed varun-krishna closed 4 years ago

varun-krishna commented 4 years ago

Version - pinpoint-collector-2.0.2

We are using the gRPC ports. Certain methods in traces are shown as API-METADATA-NOT-FOUND.

@koo-taejin @emeroad We are aiming to get Pinpoint into production in a week's time. It would really help if you could shed some light on this issue.

emeroad commented 4 years ago

I recommend updating to 2.0.2. A data retry failure bug was fixed in 2.0.2.

API-META-DATA uses port 9991. Check the status of port 9991.

varun-krishna commented 4 years ago

We are using version 2.0.2. The agent is running on Google Kubernetes Engine. We are using a TCP load balancer exposing ports 9991, 9992 and 9993. The logs show the following:

(c.n.p.p.s.g.MetadataGrpcDataSender ) Error. request=apiId: 48 apiInfo: "com.google.gson.Gson.toJson(java.lang.Object src, java.lang.Appendable writer)" line: 657, cause=UNAVAILABLE: io exception
io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2020-05-29 22:11:22 INFO Error. request=apiId: 49 apiInfo: "com.google.gson.Gson.toJson(java.lang.Object src, java.lang.reflect.Type typeOfSrc, java.lang.Appendable writer)" line: 682, cause=UNAVAILABLE: io exception
io.grpc.StatusRuntimeException: UNAVAILABLE: io exception.

Please find the screenshot below:

Screenshot 2020-06-01 at 11 08 31 AM

We are moving to prod with this setup, but if we cannot solve this problem we might have to look at other options. We have spent a lot of effort on this problem. It would be great if you could help us, @emeroad @koo-taejin. A clear picture of the status would really help us.

dinesh4747 commented 4 years ago

I had the same problem and finally switched from gRPC to Thrift with all ports on TCP (9994/9995/9996), but I am still seeing a similar exception, shown below.

/app/pinpoint-agent-2.0.2/logs # netstat -atulnp | grep 999
tcp 0 0 10.45.180.48:49998 10.45.67.209:15004 ESTABLISHED -
tcp 0 0 10.45.180.48:59990 10.45.197.233:15004 ESTABLISHED -
tcp 0 0 10.45.180.48:56032 10.148.6.169:9994 ESTABLISHED -
tcp 0 0 10.45.180.48:59996 10.45.153.33:15004 ESTABLISHED -
tcp 0 0 10.45.180.48:45742 10.148.6.169:9996 ESTABLISHED -
tcp 0 0 10.45.180.48:44162 10.148.6.169:9995 ESTABLISHED -
tcp 0 0 ::ffff:10.45.180.48:44160 ::ffff:10.148.6.169:9995 ESTABLISHED 20/java
tcp 0 0 ::ffff:10.45.180.48:56030 ::ffff:10.148.6.169:9994 ESTABLISHED 20/java
tcp 0 0 ::ffff:10.45.180.48:45740 ::ffff:10.148.6.169:9996 ESTABLISHED 20/java

Agent logs (the application runs in a container on Google Kubernetes Engine/GKE, and the collector is deployed on Google Compute Engine/GCE as a stateful VM for stability reasons):

2020-06-01 19:59:45 WARN DefaultPinpointClientHandler@2755d705 exceptionCaught() occurred. state:NO
java.net.ConnectException: Connection refused: /10.148.6.169:9996
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

2020-06-01 19:59:45 INFO DefaultPinpointClientHandler@3301500b channelClosed() started.
2020-06-01 19:59:45 WARN tcp connect fail. remote:10.148.6.169/9994 try reconnect, retryCount:2
2020-06-01 19:59:45 INFO DefaultPinpointClientHandler@3301500b stateTo() completed. Socket state ch
2020-06-01 19:59:45 WARN change background tcp connect mode remote:10.148.6.169/9994
2020-06-01 19:59:45 INFO PinpointClientHandshaker@7561db12 handshakeAbort() started.
2020-06-01 19:59:45 INFO PinpointClientHandshaker@7561db12 unexpected state
2020-06-01 19:59:45 INFO Pinpoint-TcpDataSender(Default)-Executor(15-0) started.
2020-06-01 19:59:45 INFO new BasicTraceSampler()
2020-06-01 19:59:45 INFO request fail. request:com.navercorp.pinpoint.profiler.metadata.ApiMetaData
com.navercorp.pinpoint.rpc.PinpointSocketException: reconnecting...
    at com.navercorp.pinpoint.rpc.client.ReconnectStateClientHandler.newReconnectException(ReconnectStateClientHandler.java:85)
    at com.navercorp.pinpoint.rpc.client.ReconnectStateClientHandler.reconnectFailureFuture(ReconnectStateClientHandler.java:71)
    at com.navercorp.pinpoint.rpc.client.ReconnectStateClientHandler.request(ReconnectStateClientHandler.java:90)
    at com.navercorp.pinpoint.rpc.client.DefaultPinpointClient.request(DefaultPinpointClient.java:121)
    at com.navercorp.pinpoint.profiler.sender.TcpDataSender.doRequest(TcpDataSender.java:351)
    at com.navercorp.pinpoint.profiler.sender.TcpDataSender.doRequest(TcpDataSender.java:284)
    at com.navercorp.pinpoint.profiler.sender.TcpDataSender.doRequest(TcpDataSender.java:240)
    at com.navercorp.pinpoint.profiler.sender.TcpDataSender.sendPacket(TcpDataSender.java:211)
    at com.navercorp.pinpoint.profiler.sender.TcpDataSender$1.execute(TcpDataSender.java:118)
    at com.navercorp.pinpoint.profiler.sender.AsyncQueueingExecutor.doExecute(AsyncQueueingExecutor.java:164)
    at com.navercorp.pinpoint.profiler.sender.AsyncQueueingExecutor.doExecute(AsyncQueueingExecutor.java:94)
    at com.navercorp.pinpoint.profiler.sender.AsyncQueueingExecutor.run(AsyncQueueingExecutor.java:77)
    at java.lang.Thread.run(Thread.java:748)

2020-06-01 19:59:55 WARN discard retry message(RetryMessage{retryCount=3, maxRetryCount=3, bytes=11
2020-06-01 19:59:55 INFO request fail. request:RetryMessage{retryCount=3, maxRetryCount=3, bytes=18
com.navercorp.pinpoint.rpc.PinpointSocketException: reconnecting...
    at com.navercorp.pinpoint.rpc.client.ReconnectStateClientHandler.newReconnectException(ReconnectStateClientHandler.java:85)
    at com.navercorp.pinpoint.rpc.client.ReconnectStateClientHandler.reconnectFailureFuture(ReconnectStateClientHandler.java:71)
    at com.navercorp.pinpoint.rpc.client.ReconnectStateClientHandler.request(ReconnectStateClientHandler.java:90)
    at com.navercorp.pinpoint.rpc.client.DefaultPinpointClient.request(DefaultPinpointClient.java:121)
    at com.navercorp.pinpoint.profiler.sender.TcpDataSender.doRequest(TcpDataSender.java:351)
    at com.navercorp.pinpoint.profiler.sender.TcpDataSender.doRequest(TcpDataSender.java:318)
    at com.navercorp.pinpoint.profiler.sender.TcpDataSender.access$500(TcpDataSender.java:55)
    at com.navercorp.pinpoint.profiler.sender.TcpDataSender$4.run(TcpDataSender.java:343)
    at org.jboss.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:556)
    at org.jboss.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:632)
    at org.jboss.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:369)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at java.lang.Thread.run(Thread.java:748)
2020-06-01 19:59:55 WARN discard retry message(RetryMessage{retryCount=3, maxRetryCount=3, bytes=18
2020-06-01 19:59:55 INFO request fail. request:RetryMessage{retryCount=3, maxRetryCount=3, bytes=55
com.navercorp.pinpoint.rpc.PinpointSocketException: reconnecting...
    at com.navercorp.pinpoint.rpc.client.ReconnectStateClientHandler.newReconnectException(ReconnectStateClientHandler.java:85)
    at com.navercorp.pinpoint.rpc.client.ReconnectStateClientHandler.reconnectFailureFuture(ReconnectStateClientHandler.java:71)
    at com.navercorp.pinpoint.rpc.client.ReconnectStateClientHandler.request(ReconnectStateClientHandler.java:90)
    at com.navercorp.pinpoint.rpc.client.DefaultPinpointClient.request(DefaultPinpointClient.java:121)
    at com.navercorp.pinpoint.profiler.sender.TcpDataSender.doRequest(TcpDataSender.java:351)
    at com.navercorp.pinpoint.profiler.sender.TcpDataSender.doRequest(TcpDataSender.java:318)
    at com.navercorp.pinpoint.profiler.sender.TcpDataSender.access$500(TcpDataSender.java:55)
    at com.navercorp.pinpoint.profiler.sender.TcpDataSender$4.run(TcpDataSender.java:343)
    at org.jboss.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:556)
    at org.jboss.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:632)
    at org.jboss.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:369)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at java.lang.Thread.run(Thread.java:748)
2020-06-01 19:59:55 WARN discard retry message(RetryMessage{retryCount=3, maxRetryCount=3, bytes=55
2020-06-01 19:59:55 INFO request fail. request:RetryMessage{retryCount=3, maxRetryCount=3, bytes=16
com.navercorp.pinpoint.rpc.PinpointSocketException: reconnecting...

dinesh4747 commented 4 years ago

@yjqg6666

yjqg6666 commented 4 years ago

DefaultPinpointClientHandler@2755d705 exceptionCaught() occurred. state:NO java.net.ConnectException: Connection refused: /10.148.6.169:9996

dinesh4747 commented 4 years ago

We have observed that the Pinpoint agent reports "Connection refused" during startup and then reconnects within about 6 seconds.

Environment details

How the agent ID is generated for all the pods:

Pinpoint agent unique config

AGENT_ID_PREFIX="app"
AGENT_ID_SUFFIX="$(echo $HOSTNAME | rev | cut -d'-' -f-2 | rev)"
AGENT_ID="$AGENT_ID_PREFIX-$AGENT_ID_SUFFIX"
echo "Agent ID for the container is ${AGENT_ID}"
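To make the effect of the `rev | cut | rev` pipeline concrete, here is a small sketch that wraps the same logic in a function so it can be tried against a sample hostname. The function name `agent_id_for_host` and the sample pod name are made up for illustration; only the pipeline itself comes from the config above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the pipeline from the config above:
# keep the last two dash-separated fields of the hostname and
# prepend a fixed prefix (defaults to "app", as in the original).
agent_id_for_host() {
  local host="$1" prefix="${2:-app}"
  local suffix
  # Reverse the string, take the first two '-' fields, reverse back.
  suffix="$(echo "$host" | rev | cut -d'-' -f-2 | rev)"
  echo "${prefix}-${suffix}"
}

# Example with an invented pod name of the usual
# <deployment>-<replicaset-hash>-<pod-suffix> shape:
#   agent_id_for_host "myapp-7d9f8b6c5-x2k4q"
```

With this scheme the ID is built from the ReplicaSet hash and the pod suffix, so pods of the same deployment get distinct agent IDs as long as those two fields differ.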

Observation/Summary:

  1. For one of our applications we spawn close to 45 pods as part of a ReplicaSet. In this case, 25 pods show the correct call stack/span, all in ideal condition, but the remaining 20 pods show "API_META_DATA_NOT_FOUND" across the call stack.

  2. On the problematic pods with the metadata issue, the connection was refused for almost 6 seconds; after that, the agent connected fine on all the TCP ports (9994/9995/9996).

  3. This also happens during auto-scaling: new pods created by auto-scaling show the same "API_META_DATA_NOT_FOUND".

  4. Looking at netstat, everything looks connected after this periodic exception:

     /app/pinpoint-agent-2.0.2/logs # netstat -atunlp | grep 9994
     tcp 0 0 10.45.80.202:51748 10.148.6.2:9994 ESTABLISHED -
     tcp 0 0 ::ffff:10.45.80.202:51746 ::ffff:10.148.6.2:9994 ESTABLISHED 20/java
     /app/pinpoint-agent-2.0.2/logs # netstat -atunlp | grep 9995
     tcp 0 0 10.45.80.202:53954 10.148.6.2:9995 ESTABLISHED -
     tcp 0 0 ::ffff:10.45.80.202:53952 ::ffff:10.148.6.2:9995 ESTABLISHED 20/java
     /app/pinpoint-agent-2.0.2/logs # netstat -atunlp | grep 9996
     tcp 0 0 10.45.80.202:38854 10.148.6.2:9996 ESTABLISHED -
     tcp 0 0 ::ffff:10.45.80.202:38852 ::ffff:10.148.6.2:9996 ESTABLISHED 20/java

  5. However, when this connection-refused failure occurs during agent startup, the API metadata sync is lost. Even after the connection is later established successfully, the agent is not able to resend/retry the metadata for the traces. This appears to be the biggest concern.
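Since the observations above point at the first few seconds after startup, one thing that could be tried is delaying the application start until the collector port accepts connections. This is only a sketch of a possible workaround; nothing in this thread confirms that it resolves the issue. The function name `wait_for_port` and its default attempt count and delay are invented here, and the probe relies on bash's `/dev/tcp` pseudo-device and the coreutils `timeout` command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll host:port until a TCP connection succeeds,
# giving up after a fixed number of attempts.
#   $1 host, $2 port, $3 max attempts (default 30), $4 delay in seconds (default 1)
wait_for_port() {
  local host="$1" port="$2" attempts="${3:-30}" delay="${4:-1}"
  local i
  for (( i = 1; i <= attempts; i++ )); do
    # Opening /dev/tcp/<host>/<port> makes bash attempt a TCP connect;
    # timeout caps each attempt at 3 seconds.
    if timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Possible use in a container entrypoint, before launching the JVM:
#   wait_for_port 10.148.6.2 9994 || echo "collector not reachable yet" >&2
```

The check only proves that a TCP connection can be opened from inside the pod; it says nothing about whether the agent's own retry logic behaves differently afterwards.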

Below are sample logs from the pods that show the metadata issue. The timestamps show that the agent initially failed to connect (port 9995 in this case), kept retrying, and finally connected successfully after about 6 seconds.

2020-06-02 17:30:12 WARN DefaultPinpointClientHandler@545f80bf exceptionCaught() occurred. state:NONE. Caused:Connection refused: /10.148.6.2:9995
java.net.ConnectException: Connection refused: /10.148.6.2:9995
2020-06-02 17:30:12 WARN tcp connect fail. remote:10.148.6.2/9995 try reconnect, retryCount:0
2020-06-02 17:30:12 WARN DefaultPinpointClientHandler@22fa55b2 exceptionCaught() occurred. state:NONE. Caused:Connection refused: /10.148.6.2:9995
java.net.ConnectException: Connection refused: /10.148.6.2:9995
2020-06-02 17:30:12 WARN tcp connect fail. remote:10.148.6.2/9995 try reconnect, retryCount:1
2020-06-02 17:30:12 WARN DefaultPinpointClientHandler@6594402a exceptionCaught() occurred. state:NONE. Caused:Connection refused: /10.148.6.2:9995
java.net.ConnectException: Connection refused: /10.148.6.2:9995
2020-06-02 17:30:12 WARN tcp connect fail. remote:10.148.6.2/9995 try reconnect, retryCount:2
2020-06-02 17:30:12 WARN change background tcp connect mode remote:10.148.6.2/9995
2020-06-02 17:30:15 WARN try reconnect. connectAddress:DnsSocketAddressProvider{host='10.148.6.2', port=9995}
2020-06-02 17:30:15 INFO DefaultPinpointClientHandler@70d31ca8 exceptionCaught() occurred. state:BEING_CONNECT, caused:Connection refused: /10.148.6.2:9995.
2020-06-02 17:30:18 WARN try reconnect. connectAddress:DnsSocketAddressProvider{host='10.148.6.2', port=9995}
2020-06-02 17:30:18 INFO reconnect success DnsSocketAddressProvider{host='10.148.6.2', port=9995}, [id: 0xca4c25b4, /10.45.80.202:53952 => /10.148.6.2:9995]
2020-06-02 17:30:18 WARN reconnectClientHandler:DefaultPinpointClientHandler@68473645{channel=[id: 0xca4c25b4, /10.45.80.202:53952 => /10.148.6.2:9995]state=SocketState(NONE->BEING_CONNECT)}
2020-06-02 17:30:18 INFO DefaultPinpointClientHandler@68473645 channelConnected() started. channel:[id: 0xca4c25b4, /10.45.80.202:53952 => /10.148.6.2:9995]
2020-06-02 17:30:18 INFO PinpointClientHandshaker@31779e2b handshakeStart() started. channel:[id: 0xca4c25b4, /10.45.80.202:53952 => /10.148.6.2:9995]
2020-06-02 17:30:18 INFO PinpointClientHandshaker@31779e2b do handshake(1/2147483647). channel:[id: 0xca4c25b4, /10.45.80.202:53952 => /10.148.6.2:9995].
2020-06-02 17:30:18 INFO PinpointClientHandshaker@31779e2b handshakeStart() completed. channel:[id: 0xca4c25b4, /10.45.80.202:53952 => /10.148.6.2:9995], data:{socketId=5}
2020-06-02 17:30:18 INFO [id: 0xca4c25b4, /10.45.80.202:53952 => /10.148.6.2:9995] handleHandshakePacket() completed. code:SIMPLEX_COMMUNICATION
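The gap between the first refusal and the successful reconnect can be read off the timestamps by hand, but with 20 affected pods it may be easier to script. The sketch below assumes each log entry starts with a `YYYY-MM-DD HH:MM:SS` timestamp, as in the excerpt above, and that GNU `date` is available for the `-d` option. The function name `reconnect_gap_seconds` is made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of seconds between the first
# "Connection refused" entry and the first "reconnect success" entry
# in the given agent log file. Returns 1 if either entry is missing.
reconnect_gap_seconds() {
  local log_file="$1" refused_ts success_ts
  local ts_re='^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} '
  # Only lines that begin with a timestamp are considered, so bare
  # "java.net.ConnectException: ..." lines are skipped.
  refused_ts=$(grep -E -m1 "${ts_re}.*Connection refused" "$log_file" | cut -c1-19)
  success_ts=$(grep -E -m1 "${ts_re}.*reconnect success" "$log_file" | cut -c1-19)
  [ -n "$refused_ts" ] && [ -n "$success_ts" ] || return 1
  # Convert both timestamps to epoch seconds (GNU date assumed) and subtract.
  echo $(( $(date -u -d "$success_ts" +%s) - $(date -u -d "$refused_ts" +%s) ))
}

# Example (the log path is a placeholder):
#   reconnect_gap_seconds /path/to/agent.log
```

For the excerpt above, the first refusal is logged at 17:30:12 and the reconnect success at 17:30:18, which is the roughly 6-second window described in the observations.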

Kindly let me know whether it falls under

We are planning to move this product to production by Friday, and this is becoming a severe impediment to going live. I would be very grateful if someone could look closely at the behaviour above and tell us whether we are missing something on our end.

@emeroad @Xylus @koo-taejin @jaehong-kim @yjqg6666 @RoySRose

By the way, wholehearted thanks and special kudos to @yjqg6666; we are humbled by your extensive help in this regard.

Thanks in anticipation, Dinesh

varun-krishna commented 4 years ago

@koo-taejin @emeroad @jaehong-kim @yjqg6666

I am seeing an exception in the collector; it looks like it has something to do with this issue. Can you please shed some light on it?

Screenshot 2020-06-03 at 11 06 02 AM

dinesh4747 commented 4 years ago

I collected the netstat output on all 3 collector instances (netstat -tnlp | grep 999).

Everything looks clean; all ports are listening:

tcp6 0 0 :::9991 :::* LISTEN 3630/java
tcp6 0 0 :::9992 :::* LISTEN 3630/java
tcp6 0 0 :::9993 :::* LISTEN 3630/java
tcp6 0 0 :::9994 :::* LISTEN 3630/java
tcp6 0 0 :::9995 :::* LISTEN 3630/java
tcp6 0 0 :::9996 :::* LISTEN 3630/java

varun-krishna commented 4 years ago

Additionally, here is the pinpoint config we are using. It would really help if you could confirm whether the config is fine.

Screenshot 2020-06-03 at 11 55 00 AM

dinesh4747 commented 4 years ago

There were some inconsistencies around the cluster setting, so we disabled it (cluster.enable=false). This stopped the exceptions on the collector, but the agent still throws the "reconnecting" exceptions below, and "API_META_DATA_NOT_FOUND" still persists.

2020-06-03 10:39:56 INFO request fail. request:com.navercorp.pinpoint.profiler.metadata.ApiMetaData
com.navercorp.pinpoint.rpc.PinpointSocketException: reconnecting...
    at com.navercorp.pinpoint.rpc.client.ReconnectStateClientHandler.newReconnectException(ReconnectStateClientHandler.java:85)
    at com.navercorp.pinpoint.rpc.client.ReconnectStateClientHandler.reconnectFailureFuture(ReconnectStateClientHandler.java:71)
    at com.navercorp.pinpoint.rpc.client.ReconnectStateClientHandler.request(ReconnectStateClientHandler.java:90)
    at com.navercorp.pinpoint.rpc.client.DefaultPinpointClient.request(DefaultPinpointClient.java:121)
    at com.navercorp.pinpoint.profiler.sender.TcpDataSender.doRequest(TcpDataSender.java:351)
    at com.navercorp.pinpoint.profiler.sender.TcpDataSender.doRequest(TcpDataSender.java:284)
    at com.navercorp.pinpoint.profiler.sender.TcpDataSender.doRequest(TcpDataSender.java:240)
    at com.navercorp.pinpoint.profiler.sender.TcpDataSender.sendPacket(TcpDataSender.java:211)
    at com.navercorp.pinpoint.profiler.sender.TcpDataSender$1.execute(TcpDataSender.java:118)
    at com.navercorp.pinpoint.profiler.sender.AsyncQueueingExecutor.doExecute(AsyncQueueingExecutor.java:164)
    at com.navercorp.pinpoint.profiler.sender.AsyncQueueingExecutor.doExecute(AsyncQueueingExecutor.java:94)
    at com.navercorp.pinpoint.profiler.sender.AsyncQueueingExecutor.run(AsyncQueueingExecutor.java:77)
    at java.lang.Thread.run(Thread.java:748)
2020-06-03 10:39:56 INFO request fail. request:com.navercorp.pinpoint.profiler.metadata.ApiMetaData
com.navercorp.pinpoint.rpc.PinpointSocketException: reconnecting...
    at com.navercorp.pinpoint.rpc.client.ReconnectStateClientHandler.newReconnectException(ReconnectStateClientHandler.java:85)
    at com.navercorp.pinpoint.rpc.client.ReconnectStateClientHandler.reconnectFailureFuture(ReconnectStateClientHandler.java:71)
    at com.navercorp.pinpoint.rpc.client.ReconnectStateClientHandler.request(ReconnectStateClientHandler.java:90)
    at com.navercorp.pinpoint.rpc.client.DefaultPinpointClient.request(DefaultPinpointClient.java:121)
    at com.navercorp.pinpoint.profiler.sender.TcpDataSender.doRequest(TcpDataSender.java:351)
    at com.navercorp.pinpoint.profiler.sender.TcpDataSender.doRequest(TcpDataSender.java:284)
    at com.navercorp.pinpoint.profiler.sender.TcpDataSender.doRequest(TcpDataSender.java:240)
    at com.navercorp.pinpoint.profiler.sender.TcpDataSender.sendPacket(TcpDataSender.java:211)
    at com.navercorp.pinpoint.profiler.sender.TcpDataSender$1.execute(TcpDataSender.java:118)
    at com.navercorp.pinpoint.profiler.sender.DefaultAsyncQueueingExecutorListener.execute(DefaultAsyncQueueingExecutorListener.java:42
    at com.navercorp.pinpoint.profiler.sender.AsyncQueueingExecutor.doExecute(AsyncQueueingExecutor.java:160)
    at com.navercorp.pinpoint.profiler.sender.AsyncQueueingExecutor.doExecute(AsyncQueueingExecutor.java:87)
    at com.navercorp.pinpoint.profiler.sender.AsyncQueueingExecutor.run(AsyncQueueingExecutor.java:77)
    at java.lang.Thread.run(Thread.java:748)

dinesh4747 commented 4 years ago

Netstat output

netstat -atunlp | grep 999

====================================================================
tcp 0 0 10.45.10.160:44758 10.148.6.2:9994 ESTABLISHED -
tcp 0 0 10.45.10.160:36334 10.148.6.2:9996 ESTABLISHED -
tcp 0 0 10.45.10.160:33392 10.148.6.2:9995 ESTABLISHED -
tcp 0 0 ::ffff:10.45.10.160:44756 ::ffff:10.148.6.2:9994 ESTABLISHED 20/java
tcp 0 0 ::ffff:10.45.10.160:36332 ::ffff:10.148.6.2:9996 ESTABLISHED 20/java
tcp 0 0 ::ffff:10.45.10.160:33390 ::ffff:10.148.6.2:9995 ESTABLISHED 20/java

====================================================================

varun-krishna commented 4 years ago

@yjqg6666 We are getting a connection refused exception in the application logs:


2020-06-03 10:48:33 WARN DefaultPinpointClientHandler@2755d705 exceptionCaught() occurred. state:NONE. Caused:Connection refused: /10.148.6.2:9996
java.net.ConnectException: Connection refused: /10.148.6.2:9996
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2020-06-03 10:48:33 WARN tcp connect fail. remote:10.148.6.2/9996 try reconnect, retryCount:0
2020-06-03 10:48:33 WARN DefaultPinpointClientHandler@287f94b1 exceptionCaught() occurred. state:NONE. Caused:Connection refused: /10.148.6.2:9996
java.net.ConnectException: Connection refused: /10.148.6.2:9996
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

======================================================================

2020-06-03 10:48:43 WARN discard retry message(RetryMessage{retryCount=3, maxRetryCount=3, bytes=166, messageDescription='ApiMetaData'}).
2020-06-03 10:48:43 WARN request failed.
com.navercorp.pinpoint.rpc.PinpointSocketException: reconnecting...
    at com.navercorp.pinpoint.rpc.client.ReconnectStateClientHandler.newReconnectException(ReconnectStateClientHandler.java:85)
    at com.navercorp.pinpoint.rpc.client.ReconnectStateClientHandler.reconnectFailureFuture(ReconnectStateClientHandler.java:71)
    at com.navercorp.pinpoint.rpc.client.ReconnectStateClientHandler.request(ReconnectStateClientHandler.java:90)
    at com.navercorp.pinpoint.rpc.client.DefaultPinpointClient.request(DefaultPinpointClient.java:121)
    at com.navercorp.pinpoint.profiler.sender.TcpDataSender.doRequest(TcpDataSender.java:351)
    at com.navercorp.pinpoint.profiler.sender.TcpDataSender.doRequest(TcpDataSender.java:237)
    at com.navercorp.pinpoint.profiler.sender.TcpDataSender.sendPacket(TcpDataSender.java:211)
    at com.navercorp.pinpoint.profiler.sender.TcpDataSender$1.execute(TcpDataSender.java:118)
    at com.navercorp.pinpoint.profiler.sender.AsyncQueueingExecutor.doExecute(AsyncQueueingExecutor.java:164)
    at com.navercorp.pinpoint.profiler.sender.AsyncQueueingExecutor.doExecute(AsyncQueueingExecutor.java:94)
    at com.navercorp.pinpoint.profiler.sender.AsyncQueueingExecutor.run(AsyncQueueingExecutor.java:77)
    at java.lang.Thread.run(Thread.java:748)

yjqg6666 commented 4 years ago

Please run telnet 10.148.6.2 9996 on your application host to check connectivity.
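If telnet is not installed in the application container, a similar check can be scripted with bash alone. The following is a minimal sketch, assuming bash's `/dev/tcp` pseudo-device and the coreutils `timeout` command are available in the image; the function names `port_open` and `check_collector_ports` are invented for this example. It tests TCP reachability only.

```shell
#!/usr/bin/env bash
# Returns 0 if a TCP connection to host:port can be opened within 3 seconds.
port_open() {
  local host="$1" port="$2"
  # Redirecting to /dev/tcp/<host>/<port> makes bash attempt a TCP connect.
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Prints one status line per port for the given host.
#   $1 host, remaining arguments: ports to probe
check_collector_ports() {
  local host="$1"
  shift
  local port
  for port in "$@"; do
    if port_open "$host" "$port"; then
      echo "$host:$port open"
    else
      echo "$host:$port unreachable"
    fi
  done
}

# Example, run from inside the application pod:
#   check_collector_ports 10.148.6.2 9994 9995 9996
```

Running it right at container start and again a few seconds later would show whether the refusal is limited to the startup window described earlier in the thread.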

stale[bot] commented 4 years ago

This issue/proposal has been automatically marked as stale because it hasn't had any recent activity. It will automatically be closed if no further activity occurs for 20 days. If you think this should still be open, or the problem still persists, just pop a reply in the comments and one of the maintainers will (try!) to follow up. Thank you for your interest and contribution to the Pinpoint Community.