sofastack / sofa-rpc

SOFARPC is a high-performance, high-extensibility, production-level Java RPC framework.
https://www.sofastack.tech/sofa-rpc/docs/Home
Apache License 2.0
3.81k stars 1.17k forks

GC overhead limit exceeded #296

Closed sohoku closed 5 years ago

sohoku commented 6 years ago

Describe the bug

Sending 4000 requests within 10 s made the service unreachable: the server side ran into GC problems and the client timed out.

==================== Server GC log ================

    20180906 13:48:14.669 [main-SendThread(172.17.5.200:2181)] o.a.z.ClientCnxn -Client session timed out, have not heard from server in 26907ms for sessionid 0x165a87f43bc002c, closing socket connection and attempting reconnect
    20180906 13:48:18.297 [main-EventThread] o.a.c.f.s.ConnectionStateManager -State change: SUSPENDED
    20180906 13:48:20.549 [main-SendThread(172.17.5.200:2181)] o.a.z.ClientCnxn -Session 0x165a87f43bc002c for server null, unexpected error, closing socket connection and attempting reconnect
    java.lang.OutOfMemoryError: GC overhead limit exceeded
    20180906 13:48:22.908 [main-SendThread(172.17.5.200:2181)] o.a.z.ClientCnxn -Opening socket connection to server 172.17.5.200/172.17.5.200:2181. Will not attempt to authenticate using SASL (unknown error)
    20180906 13:48:22.909 [main-SendThread(172.17.5.200:2181)] o.a.z.ClientCnxn -Socket connection established to 172.17.5.200/172.17.5.200:2181, initiating session
    20180906 13:48:23.137 [main-SendThread(172.17.5.200:2181)] o.a.z.ClientCnxnSocket -Connected to an old server; r-o mode will be unavailable
    20180906 13:48:23.137 [main-SendThread(172.17.5.200:2181)] o.a.z.ClientCnxn -Session establishment complete on server 172.17.5.200/172.17.5.200:2181, sessionid = 0x165a87f43bc002c, negotiated timeout = 40000
    20180906 13:48:23.137 [main-EventThread] o.a.c.f.s.ConnectionStateManager -State change: RECONNECTED
    20180906 13:48:24.288 [Curator-Framework-0] o.a.c.f.i.CuratorFrameworkImpl -Background exception was not retry-able or retry gave up
    java.lang.OutOfMemoryError: GC overhead limit exceeded

=================== Client log =================

    com.alipay.sofa.rpc.core.exception.SofaTimeOutException: com.alipay.remoting.rpc.exception.InvokeTimeoutException: Rpc invocation timeout[responseCommand TIMEOUT]! the address is 192.168.1.12:12201
        at com.alipay.sofa.rpc.transport.bolt.BoltClientTransport.convertToRpcException(BoltClientTransport.java:341)
        at com.alipay.sofa.rpc.transport.bolt.BoltClientTransport.syncSend(BoltClientTransport.java:265)
        at com.alipay.sofa.rpc.client.AbstractCluster.doSendMsg(AbstractCluster.java:509)
        at com.alipay.sofa.rpc.client.AbstractCluster.sendMsg(AbstractCluster.java:480)
        at com.alipay.sofa.rpc.filter.ConsumerInvoker.invoke(ConsumerInvoker.java:60)
        at com.alipay.sofa.rpc.filter.sofatracer.ConsumerTracerFilter.invoke(ConsumerTracerFilter.java:66)
        at com.alipay.sofa.rpc.filter.FilterInvoker.invoke(FilterInvoker.java:96)
        at com.alipay.sofa.rpc.filter.RpcReferenceContextFilter.invoke(RpcReferenceContextFilter.java:80)
        at com.alipay.sofa.rpc.filter.FilterInvoker.invoke(FilterInvoker.java:96)
        at com.alipay.sofa.rpc.filter.ConsumerExceptionFilter.invoke(ConsumerExceptionFilter.java:37)
        at com.alipay.sofa.rpc.filter.FilterInvoker.invoke(FilterInvoker.java:96)
        at com.alipay.sofa.rpc.filter.FilterChain.invoke(FilterChain.java:302)
        at com.alipay.sofa.rpc.client.AbstractCluster.filterChain(AbstractCluster.java:473)
        at com.alipay.sofa.rpc.client.FailoverCluster.doInvoke(FailoverCluster.java:66)
        at com.alipay.sofa.rpc.client.AbstractCluster.invoke(AbstractCluster.java:285)
        at com.alipay.sofa.rpc.client.ClientProxyInvoker.invoke(ClientProxyInvoker.java:83)

Expected behavior

Actual behavior

Steps to reproduce

Minimal yet complete reproducer code (or GitHub URL to code)

Environment

tks!

ujjboy commented 6 years ago

java.lang.OutOfMemoryError: GC overhead limit exceeded

You can analyze the heap contents with jmap. If this is not a memory leak, it is usually because the heap is too small; try increasing it.
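Before resizing the heap, it can help to confirm which limit the server JVM is actually running with. The following is a minimal sketch using the standard `java.lang.management` API; the class name `HeapReport` is hypothetical and not part of SOFARPC.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: prints the heap figures the running JVM reports.
class HeapReport {
    private static final long MB = 1024L * 1024L;

    // Formats used, committed and max heap sizes in whole megabytes.
    static String describe(MemoryUsage heap) {
        // getMax() returns -1 when the maximum is undefined.
        String max = heap.getMax() < 0 ? "undefined" : (heap.getMax() / MB) + " MB";
        return "used=" + (heap.getUsed() / MB) + " MB, committed="
                + (heap.getCommitted() / MB) + " MB, max=" + max;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.println(describe(heap));
    }
}
```

Running it with the same `-Xmx` flag as the service shows the limit the JVM reports for that setting, which may be slightly below the configured value depending on the collector.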

sohoku commented 6 years ago

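Thank you for the reply. This is running on a desktop machine in the development environment (8 GB RAM, i5 CPU). I have now set the heap size to 512 MB and the problem still occurs. During the load test, CPU and memory usage show no noticeable increase. The jmap dump is 684 MB; the first half is mostly normal objects, while the second half contains a large amount of garbled characters like the following:

----------- ^@õ6æØ^@^@^@^A^@^@^@^@à^Z<82>p^@^@^@^D^@^@^@ö!^@^@^@^@õ6æè^@^@^@^A^@^@^@^@à^ZµÈ^@^@^@^\^@^@^@^@^@^@^@^@^@^@^@^@õ6æX^@^@^@^@õ6æØ^@^@^@ö!^@^@^@^@õ6ç^H^@^@^@^A^@^@^@^@ål¸X^@^@^@8^@^@^@^@õ5ò ^@^@^@^@ õ6ç0^@^@^@^@õ6çH^@^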
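One possible explanation for the unreadable section, not confirmed in this thread, is that the file is a binary HPROF heap dump (for example one produced with `jmap -dump:format=b,file=...`), which is not meant to be read in a text viewer. Binary HPROF files begin with an ASCII header such as `JAVA PROFILE 1.0.2`, so a quick check is possible. The sketch below assumes that header convention; the class name `HprofHeaderCheck` and the file path argument are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: checks whether a file starts with the HPROF magic string.
class HprofHeaderCheck {
    static final String MAGIC_PREFIX = "JAVA PROFILE ";

    // True if the given leading bytes start with the HPROF header prefix.
    static boolean looksLikeHprof(byte[] head) {
        return new String(head, StandardCharsets.US_ASCII).startsWith(MAGIC_PREFIX);
    }

    // Reads just enough bytes from the file to test the header.
    static boolean looksLikeHprof(Path file) throws IOException {
        byte[] head = new byte[MAGIC_PREFIX.length()];
        int filled = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (filled < head.length) {
                int n = in.read(head, filled, head.length - filled);
                if (n < 0) {
                    return false; // file is shorter than the header
                }
                filled += n;
            }
        }
        return looksLikeHprof(head);
    }

    public static void main(String[] args) throws IOException {
        Path dump = Paths.get(args[0]); // path to the dump file (assumption)
        System.out.println(looksLikeHprof(dump)
                ? "Binary HPROF dump: open it with a heap analyzer, not a text viewer"
                : "Not an HPROF file");
    }
}
```

If the check passes, a heap analysis tool such as Eclipse MAT or VisualVM is the usual way to inspect the dump.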

leizhiyuan commented 6 years ago

The jmap output contains characters like these?

sohoku commented 6 years ago

Yes, and there are a lot of them. I viewed the file in CRT with UTF-8 encoding and confirmed that the first half of the file is normal text such as object names, followed by a large amount of garbled characters. The relevant configuration is attached as well:

    com:
      alipay:
        sofa:
          rpc:
            registry-address: zookeeper://192.168.1.12:2181
            bolt-port: 12201
            bolt-thread-pool-core-size: 30
            bolt-thread-pool-max-size: 1000
            bolt-thread-pool-queue-size: 2000
            bolt-accepts-size: 1000
            virtual-host: 0.0.0.0
            # virtual-port: 12221
            aft-time-window: 20
            aft-least-window-count: 30
            aft-least-window-exception-rate-multiple: 6
            aft-regulation-effective: false
            aft-degrade-effective: false
            aft-weight-degrade-rate: 0.5
            aft-weight-recover-rate: 1.2
            aft-degrade-least-weight: 1
            aft-degrade-max-ip-count: 2

NeGnail commented 6 years ago

@sohoku
aft-regulation-effective: false
aft-degrade-effective: false

If these two settings are false, this feature does not take effect. That said, your problem is probably unrelated to this feature.

sohoku commented 6 years ago

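@NeGnail Right, I changed these two settings to false afterwards. Once a call had failed, the client could no longer reach the server at all, so I tried removing the degradation policy to see whether calls could be forced through. In practice it turned out to have nothing to do with the problem.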