sermant-io / Sermant

CNCF sandbox project, a Cloud-Native Proxyless Service Mesh based on Java Bytecode Enhancement Technology
https://sermant.io/
Apache License 2.0

Backend using Nacos causes OutOfMemoryError #1613

Closed AYue-94 closed 1 month ago

AYue-94 commented 2 months ago

What happened?

When the backend loses its connection to Nacos, an OOM occurs after a period of time:

Exception in thread "com.alibaba.nacos.client.Worker" java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:719)
    at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
    at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1025)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Exception in thread "com.alibaba.nacos.client.Worker" java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:719)
    at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
    at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1025)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
jstack 6718 | grep "com.alibaba.nacos.client.Worker" | wc -l
    4029
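
The same count can also be taken from inside the JVM, which may help when writing a regression test for the leak. The following is a minimal sketch, not code from Sermant or Nacos: the class name `WorkerThreadCount` and the helper `countThreads` are hypothetical, and the demo starts placeholder threads that only borrow the thread name seen in the jstack output above.

```java
import java.util.concurrent.CountDownLatch;

public class WorkerThreadCount {

    /** Counts live threads whose name starts with the given prefix (hypothetical helper). */
    static long countThreads(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().startsWith(prefix))
                .count();
    }

    public static void main(String[] args) {
        // Thread name taken from the stack trace in this issue.
        String prefix = "com.alibaba.nacos.client.Worker";
        CountDownLatch release = new CountDownLatch(1);

        // Placeholder threads that park until released; they stand in for leaked workers.
        for (int i = 0; i < 3; i++) {
            Thread t = new Thread(() -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, prefix);
            t.setDaemon(true);
            t.start();
        }

        System.out.println("live worker threads: " + countThreads(prefix));
        release.countDown(); // let the placeholder threads finish
    }
}
```

A count that keeps growing between two samples, as in the jstack output above, points to threads that are created but never stopped.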

How can we reproduce it (as minimally and precisely as possible)?

  1. Do not start Nacos.
  2. Configure the backend to use Nacos:
    dynamic.config.enable=true
    dynamic.config.namespace=sermant
    dynamic.config.timeout=30000
    dynamic.config.serverAddress=127.0.0.1:8848
    dynamic.config.dynamicConfigType=NACOS
    dynamic.config.connectTimeout=3000
    dynamic.config.enableAuth=false
    dynamic.config.userName=
    dynamic.config.password=
    dynamic.config.secretKey=
  3. Start the backend.

Anything else we need to know?

No response

Sermant version

2.0.0

OS version

MacOS

AYue-94 commented 2 months ago

We don't need to manage reconnection ourselves when using the Nacos client; the client handles reconnection itself. The same applies to the agent. Nacos versions after 2.2.1 (https://github.com/alibaba/nacos/pull/9639) add health check logic, and with it our own reconnection causes a memory/thread leak.
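
To illustrate the kind of leak described here, the sketch below shows how an outer reconnect loop that builds a new client on every attempt, without shutting down the previous one, accumulates worker threads. This is only an illustration under stated assumptions: `FakeClient`, `simulateReconnects`, and `countThreads` are hypothetical names, and `FakeClient` is a stand-in that owns a single-thread pool, not the real Nacos client or Sermant's actual reconnect code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ReconnectLeakSketch {

    /** Hypothetical stand-in for a client that owns one worker thread. */
    static final class FakeClient {
        private final ExecutorService worker;
        private volatile Thread workerThread;

        FakeClient(String threadName) {
            worker = Executors.newFixedThreadPool(1, r -> {
                Thread t = new Thread(r, threadName);
                t.setDaemon(true); // lets the demo JVM exit despite leaked threads
                workerThread = t;
                return t;
            });
            worker.submit(() -> { }); // forces the worker thread to start
        }

        void shutdown() {
            worker.shutdownNow();
            try {
                workerThread.join(1000); // wait for the worker thread to exit
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Counts live threads whose name starts with the given prefix. */
    static long countThreads(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().startsWith(prefix))
                .count();
    }

    /** Replaces the client `attempts` times and returns how many worker threads remain alive. */
    static long simulateReconnects(int attempts, boolean shutdownOld, String threadName) {
        FakeClient current = null;
        for (int i = 0; i < attempts; i++) {
            if (shutdownOld && current != null) {
                current.shutdown();
            }
            // Without the shutdown above, the old client's idle worker thread stays alive.
            current = new FakeClient(threadName);
        }
        return countThreads(threadName);
    }

    public static void main(String[] args) {
        System.out.println("without shutdown: " + simulateReconnects(5, false, "leaky-worker"));
        System.out.println("with shutdown:    " + simulateReconnects(5, true, "clean-worker"));
    }
}
```

In this sketch the count without shutdown grows with the number of attempts, while the count with shutdown stays at one. Removing the outer reconnect loop altogether and relying on the client's own reconnection avoids the problem without needing the shutdown step.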

AYue-94 commented 2 months ago

I will try to fix it by removing the Nacos reconnect logic.

lilai23 commented 2 months ago

How many times does the Nacos client attempt to reconnect? Is it configurable?

AYue-94 commented 2 months ago

The number of reconnection attempts for the Nacos client cannot be configured; it retries indefinitely. See https://github.com/alibaba/nacos/blob/2.2.1/common/src/main/java/com/alibaba/nacos/common/remote/client/RpcClient.java#L280