nacos-group / nacos-examples

Nacos Examples
Apache License 2.0

Nacos 1.4.1 on Kubernetes: service drops out of the registry after running for a while #61

Open bingju328 opened 3 years ago


Problem description

  1. No concurrent load
  2. No large file transfers
  3. The network is sometimes unstable

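    Current behavior

    After a day or two the connection drops and the Java service has to be restarted manually. The Nacos server keeps running normally the whole time and logs nothing.

    Expected behavior

    Make this configurable: even after a disconnect, the client should re-register automatically once the network recovers, so the application does not have to be restarted every time.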
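The expected behavior amounts to a periodic check-and-re-register loop. Below is a minimal sketch under stated assumptions: `Registrar` is a hypothetical interface standing in for the real registry client (it is not a Nacos API), and `ReRegistrationWatchdog` is likewise an illustrative name. The one detail worth copying is general `ScheduledExecutorService` behavior: if a single run of a task scheduled with `scheduleWithFixedDelay` throws, all later runs are silently suppressed, so the task body catches every exception and simply tries again next period.

```java
import java.net.UnknownHostException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class ReRegistrationWatchdog {

    /** HYPOTHETICAL stand-in for the registry client; not a Nacos API. */
    interface Registrar {
        boolean isRegistered() throws Exception; // is this instance visible in the registry?
        void register() throws Exception;        // (re-)register this instance
    }

    private final Registrar registrar;

    ReRegistrationWatchdog(Registrar registrar) {
        this.registrar = registrar;
    }

    /** Runs one check; returns true if it re-registered. Never throws. */
    boolean checkOnce() {
        try {
            if (!registrar.isRegistered()) {
                registrar.register();
                return true;
            }
        } catch (Exception e) {
            // Must not escape: an exception thrown from a task scheduled with
            // scheduleWithFixedDelay suppresses every later run of that task.
            System.err.println("registry check failed, retrying next period: " + e);
        }
        return false;
    }

    /** Schedules checkOnce() every periodSeconds; shut the executor down on exit. */
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleWithFixedDelay(this::checkOnce, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return ses;
    }

    public static void main(String[] args) {
        // Simulated registry: the first two checks fail as if DNS were down,
        // then the instance is found missing and is re-registered.
        int[] calls = {0};
        boolean[] registered = {false};
        ReRegistrationWatchdog watchdog = new ReRegistrationWatchdog(new Registrar() {
            @Override
            public boolean isRegistered() throws Exception {
                if (++calls[0] <= 2) {
                    throw new UnknownHostException("nacos.gulimall.svc.cluster.local");
                }
                return registered[0];
            }

            @Override
            public void register() {
                registered[0] = true;
            }
        });
        for (int i = 0; i < 4; i++) {
            System.out.println("check " + i + ": re-registered=" + watchdog.checkOnce());
        }
    }
}
```

In the simulated run in `main`, the first two checks fail (standing in for a DNS outage), the third finds the instance missing and re-registers it, and the fourth sees it registered and does nothing.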

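After startup the service registers successfully, but after a while the Java application disconnects and the service no longer appears in the registry. It only re-registers after the Java application is restarted. The Java application logs the following error: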

[mogu-admin:10.244.3.112:8080] [,] 2021-06-05 04:10:48.780 ERROR 1 [com.alibaba.nacos.client.Worker.longPolling.fixed-nacos.gulimall.svc.cluster.local_8848] com.alibaba.nacos.client.config.impl.ClientWorker [fixed-nacos.gulimall.svc.cluster.local_8848] [check-update] get changed dataId exception

java.net.UnknownHostException: nacos.gulimall.svc.cluster.local
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
    at sun.net.www.http.HttpClient.New(HttpClient.java:339)
    at sun.net.www.http.HttpClient.New(HttpClient.java:357)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1220)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1156)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:984)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1334)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1309)
    at com.alibaba.nacos.common.http.client.request.JdkHttpClientRequest.execute(JdkHttpClientRequest.java:106)
    at com.alibaba.nacos.common.http.client.InterceptingHttpClientRequest.execute(InterceptingHttpClientRequest.java:53)
    at com.alibaba.nacos.common.http.client.NacosRestTemplate.execute(NacosRestTemplate.java:482)
    at com.alibaba.nacos.common.http.client.NacosRestTemplate.postForm(NacosRestTemplate.java:407)
    at com.alibaba.nacos.client.config.http.ServerHttpAgent.httpPost(ServerHttpAgent.java:155)
    at com.alibaba.nacos.client.config.http.MetricsHttpAgent.httpPost(MetricsHttpAgent.java:68)
    at com.alibaba.nacos.client.config.impl.ClientWorker.checkUpdateConfigStr(ClientWorker.java:441)
    at com.alibaba.nacos.client.config.impl.ClientWorker.checkUpdateDataIds(ClientWorker.java:408)
    at com.alibaba.nacos.client.config.impl.ClientWorker$LongPollingRunnable.run(ClientWorker.java:596)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
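The `UnknownHostException` is raised while resolving `nacos.gulimall.svc.cluster.local`, before any request reaches Nacos, which is consistent with the server logging nothing. Below is a minimal sketch for checking resolution from inside the application's JVM and for bounding how long the JVM caches lookups, using the standard `networkaddress.cache.ttl` and `networkaddress.cache.negative.ttl` security properties. `DnsCheck` and its methods are hypothetical names, and whether the JVM's DNS cache plays any part in this particular failure is an assumption, not something the logs establish.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.Security;

class DnsCheck {

    /**
     * Bounds the JVM-wide DNS cache: successful lookups are cached for
     * ttlSeconds, failed ones for negativeTtlSeconds. The JDK reads these
     * security properties once, so call this before the first name lookup.
     */
    static void configureDnsCache(int ttlSeconds, int negativeTtlSeconds) {
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(ttlSeconds));
        Security.setProperty("networkaddress.cache.negative.ttl", Integer.toString(negativeTtlSeconds));
    }

    /** Resolves host and returns its first address as text, or null if resolution fails. */
    static String resolveOrNull(String host) {
        try {
            return InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        configureDnsCache(30, 5); // example values, not recommendations
        String host = args.length > 0 ? args[0] : "nacos.gulimall.svc.cluster.local";
        String address = resolveOrNull(host);
        System.out.println(address != null
                ? host + " -> " + address
                : host + " did not resolve (UnknownHostException)");
    }
}
```

The same two properties can also be set in the JDK's `java.security` file instead of in code, which avoids the "before the first lookup" ordering concern.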

Environment

Spring configuration

spring:
  application:
    name: mogu_web
  cloud:
    nacos:
      discovery:
        server-addr: ${nacos_config_discovery}
        heart-beat-timeout: 30000
        #        server-addr: nacos.gulimall.svc.cluster.local:8848
      config:
        server-addr: ${nacos_config}
        timeout: 30000
        #        server-addr: nacos.gulimall.svc.cluster.local:8848

Other configuration

<!-- Kubernetes 1.15 -->
<!-- Spring Boot -->
<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.2.2.RELEASE</version>
    <relativePath/>
</parent>
<!-- Spring Cloud Alibaba -->
<parent>
    <groupId>com.alibaba.cloud</groupId>
    <artifactId>spring-cloud-alibaba-starters</artifactId>
    <version>2.2.4.RELEASE</version>
</parent>
<!-- Nacos client -->
<dependency>
    <groupId>com.alibaba.nacos</groupId>
    <artifactId>nacos-client</artifactId>
    <version>1.4.1</version>
    <scope>compile</scope>
</dependency>

Troubleshooting

  1. Checked the Nacos server logs first: nothing was logged.
  2. CoreDNS logs from that time:
    2021-06-04T23:23:39.938Z [ERROR] plugin/errors: 2 maya-apiserver-576587998d-7vpsj. AAAA: read udp 10.244.1.191:56724->192.168.8.1:53: i/o timeout
    2021-06-04T23:25:39.971Z [ERROR] plugin/errors: 2 maya-apiserver-576587998d-7vpsj. AAAA: read udp 10.244.1.191:57593->192.168.8.1:53: i/o timeout
  3. Then queried the cluster DNS directly with dig; resolution works normally:
    
    root@hp:/# dig -t A nacos.gulimall.svc.cluster.local. @10.244.0.176
    ; <<>> DiG 9.11.3-1ubuntu1.15-Ubuntu <<>> -t A nacos.gulimall.svc.cluster.local. @10.244.0.176
    ;; global options: +cmd
    ;; Got answer:
    ;; WARNING: .local is reserved for Multicast DNS
    ;; You are currently testing what happens when an mDNS query is leaked to DNS
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49827
    ;; flags: qr rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
    ;; WARNING: recursion requested but not available

    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 4096
    ; COOKIE: 8a69c8768d24fd61 (echoed)
    ;; QUESTION SECTION:
    ;nacos.gulimall.svc.cluster.local. IN A

    ;; ANSWER SECTION:
    nacos.gulimall.svc.cluster.local. 24 IN A 10.97.100.187

    ;; Query time: 0 msec
    ;; SERVER: 10.244.0.176#53(10.244.0.176)
    ;; WHEN: Sat Jun 05 12:42:51 CST 2021
    ;; MSG SIZE rcvd: 121
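So the record resolves when queried with dig, while CoreDNS logged i/o timeouts against its upstream resolver (192.168.8.1:53). That pattern is consistent with intermittent DNS failures rather than a missing record, matching the unstable network noted in the problem description. For lookups an application performs itself (a startup check, or a probe like the one sketched here), a bounded retry with exponential backoff can ride out short blips. Below is a minimal sketch; `RetryingResolver`, its `Lookup` hook, and the delay values are hypothetical choices, and this does not change how the Nacos client itself resolves its server address.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class RetryingResolver {

    /** Lookup hook (illustrative); InetAddress::getByName fits it. */
    interface Lookup {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    static final long MAX_DELAY_MS = 10_000; // cap on the backoff delay (arbitrary example)

    /**
     * Tries up to maxAttempts lookups, sleeping initialDelayMs after the first
     * failure and doubling the sleep after each further failure (capped).
     * Rethrows the last UnknownHostException if every attempt fails.
     */
    static InetAddress resolveWithBackoff(Lookup lookup, String host, int maxAttempts, long initialDelayMs)
            throws UnknownHostException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        UnknownHostException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return lookup.resolve(host);
            } catch (UnknownHostException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMs);
                    delayMs = Math.min(delayMs * 2, MAX_DELAY_MS);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws InterruptedException {
        String host = args.length > 0 ? args[0] : "nacos.gulimall.svc.cluster.local";
        try {
            // InetAddress caches failed lookups for networkaddress.cache.negative.ttl
            // seconds (10 by default), so early retries may return the cached failure.
            InetAddress address = resolveWithBackoff(InetAddress::getByName, host, 5, 1_000);
            System.out.println(host + " -> " + address.getHostAddress());
        } catch (UnknownHostException e) {
            System.out.println(host + " still unresolved after retries: " + e.getMessage());
        }
    }
}
```

The injectable `Lookup` exists only so the retry logic can be exercised without a real DNS server; in normal use `InetAddress::getByName` is passed, as in `main`.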