moby / libnetwork

networking for containers
Apache License 2.0
2.16k stars 882 forks source link

[resolver] invalid concurrent query count is misleading #2546

Open xenuser opened 4 years ago

xenuser commented 4 years ago

During runtime, we sometimes see the following error message in journalctl:

dockerd: time="2020-04-16T13:25:59.896906044+02:00" level=error msg="[resolver] invalid concurrent query count"

This leads to the assumption, that something with an internal resolver is wrong and there is a DNS issue. In fact, when checking the affected node and all containers running on that node when this error occurs, we can not detect any DNS issue at all.

Tested by: Running DNS queries inside the container and on the host directly.

I tried understanding the code where this error message is coming from: https://github.com/moby/libnetwork/blob/master/resolver.go

For me, it looks like as soon as a query was processed, a counter for concurrent DNS queries is decreased by one:

func (r *resolver) forwardQueryEnd() {
    r.queryLock.Lock()
    defer r.queryLock.Unlock()

    if r.count == 0 {
        logrus.Error("[resolver] invalid concurrent query count")
    } else {
        r.count--
    }
}

And if the concurrent query count is already 0, the counter can not be decreased any more, hence an error is thrown. In my understanding, this has no effect on the function of the resolver, the queries still work. So throwing an error that there is an invalid concurrent query count is misleading. It leads to the assumption that something is not working right, although it just means that there are no more queries left to be processed.

Can someone please confirm that I am not totally wrong here? If I'm right, I would like to suggest that the error message is adapted slightly, helping sys admins to judge the "error" better. Or maybe switch the level from "error" to "info" or not log something at all.

docker info:


Client:
 Debug Mode: false

Server:
 Containers: 29
  Running: 27
  Paused: 0
  Stopped: 2
 Images: 845
 Server Version: 19.03.5
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local local-persist
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID:
  Is Manager: true

  Orchestration:
   Task History Retention Limit: 5

  Autolock Managers: false

 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc version: 425e105d5a03fabd737a126ad93d62a9eeede87f
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 3.10.0-1062.9.1.el7.x86_64
 Operating System: Red Hat Enterprise Linux Server 7.7 (Maipo)
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 31.26GiB
 Debug Mode: false

 Experimental: true
stefanmonkey commented 2 years ago

Same problem +1

system log output

dockerd: time="2022-07-31T19:27:53.443961512+08:00" level=error msg="[resolver] invalid concurrent query count"

and in docker-compose environment, harbor core error log:

Jul 31 22:27:00 192.168.208.1 core[18652]: 2022-07-31T14:27:00Z [WARNING] [/common/utils/oidc/helper.go:63]: Failed to refresh configuration, error: failed to load OIDC setting: dial tcp: lookup <pgDomain>: Temporary failure in name resolution
Jul 31 22:27:04 192.168.208.1 core[18652]: 2022/07/31 14:27:04 #033[1;31m[E] [server.go:2774] dial tcp: lookup <redisDomain>: Temporary failure in name resolution#033[0m
Jul 31 22:27:10 192.168.208.1 core[18652]: 2022-07-31T14:27:10Z [WARNING] [/common/utils/oidc/helper.go:63]: Failed to refresh configuration, error: failed to load OIDC setting: dial tcp: lookup <pgDomain>: Temporary failure in name resolution
Jul 31 22:27:14 192.168.208.1 core[18652]: 2022/07/31 14:27:14 #033[1;31m[E] [server.go:2774] dial tcp: lookup <redisDomain>: Temporary failure in name resolution#033[0m
Jul 31 22:27:20 192.168.208.1 core[18652]: 2022-07-31T14:27:20Z [WARNING] [/common/utils/oidc/helper.go:63]: Failed to refresh configuration, error: failed to load OIDC setting: dial tcp: lookup <pgDomain>: Temporary failure in name resolution

docker info output:

# docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.5.1-docker)

Server:
 Containers: 11
  Running: 10
  Paused: 0
  Stopped: 1
 Images: 39
 Server Version: 19.03.5
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: systemd
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc version: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 3.10.0-957.21.3.el7.x86_64
 Operating System: CentOS Linux 7 (Core)
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 7.638GiB
 Name: ctnr.a254-48-61.prod.ali-bj-c
 ID: L7DR:JNFK:XVH6:CIK3:2VJC:GBVO:RLZY:OWKD:XQX7:O5NS:PV4X:XVTA
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Registry Mirrors:
  https://<xxxxxx.mirror.aliyuncs.com>/
 Live Restore Enabled: false

After restart dockerd service and restart docker-compose harbor-service-list, resolver error disappear