moby / moby

The Moby Project - a collaborative project for the container ecosystem to assemble container-based systems
https://mobyproject.org/
Apache License 2.0

[fix included] dockerd[753]: http: panic serving @: runtime error: invalid memory address or nil pointer dereference #43034

Open CRCinAU opened 2 years ago

CRCinAU commented 2 years ago

Description

Describe the results you received:

Nov 19 02:01:03  systemd[1]: Started Start docker-compose on boot.
Nov 19 02:01:04  docker-compose[3379]: Creating network "docker_default" with the default driver
Nov 19 02:01:04  dockerd[753]: http: panic serving @: runtime error: invalid memory address or nil pointer dereference
                                      goroutine 794 [running]:
                                      net/http.(*conn).serve.func1(0x4000bff2c0)
                                              /usr/local/go/src/net/http/server.go:1804 +0x108
                                      panic(0x5590289c40, 0x559158a3e0)
                                              /usr/local/go/src/runtime/panic.go:971 +0x3f4
                                      github.com/docker/docker/vendor/github.com/vishvananda/netlink.parseAddr(0x4000276864, 0x40, 0x40, 0x0, 0x40015a46ec, 0x4, 0x280, 0x0, 0x0, 0x4000276878, ...)
                                              /go/src/github.com/docker/docker/vendor/github.com/vishvananda/netlink/addr_linux.go:274 +0x174
                                      github.com/docker/docker/vendor/github.com/vishvananda/netlink.(*Handle).AddrList(0x40008b8260, 0x55906788f0, 0x4000e25d40, 0x2, 0x0, 0x0, 0x400013c8d0, 0x0, 0x4000de87e8)
                                              /go/src/github.com/docker/docker/vendor/github.com/vishvananda/netlink/addr_linux.go:199 +0x1a0
                                      github.com/docker/docker/vendor/github.com/docker/libnetwork/drivers/bridge.(*bridgeInterface).addresses(0x40006471a0, 0x0, 0x5590647cd8, 0x4000de0a30, 0x31, 0x40010c7140, 0x0, 0x40015a45f0, 0x4000de86b8)
                                              /go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/drivers/bridge/interface.go:57 +0x44
                                      github.com/docker/docker/vendor/github.com/docker/libnetwork/drivers/bridge.setupBridgeIPv4(0x4000c7eea0, 0x40006471a0, 0x0, 0x0)
                                              /go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/drivers/bridge/setup_ipv4.go:31 +0xac
                                      github.com/docker/docker/vendor/github.com/docker/libnetwork/drivers/bridge.(*bridgeSetup).apply(0x4000de8928, 0x4000eb0c60, 0x4)
                                              /go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/drivers/bridge/setup.go:17 +0x74
                                      github.com/docker/docker/vendor/github.com/docker/libnetwork/drivers/bridge.(*driver).createNetwork(0x40003f2280, 0x4000c7eea0, 0x0, 0x0)
                                              /go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/drivers/bridge/bridge.go:809 +0x6a8
                                      github.com/docker/docker/vendor/github.com/docker/libnetwork/drivers/bridge.(*driver).CreateNetwork(0x40003f2280, 0x40010c6f40, 0x40, 0x4000eeb7d0, 0x55906775e0, 0x4000961180, 0x4000eebef0, 0x1, 0x1, 0x5591667850, ...)
                                              /go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/drivers/bridge/bridge.go:648 +0x3fc
                                      github.com/docker/docker/vendor/github.com/docker/libnetwork.(*controller).addNetwork(0x400090e100, 0x4000961180, 0x0, 0x40003f2280)
                                              /go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/controller.go:1011 +0x100
                                      github.com/docker/docker/vendor/github.com/docker/libnetwork.(*controller).NewNetwork(0x400090e100, 0x558fcb2091, 0x6, 0x40015a4100, 0xe, 0x40010c6f40, 0x40, 0x4000eeb6e0, 0x6, 0x6, ...)
                                              /go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/controller.go:828 +0x92c
                                      github.com/docker/docker/daemon.(*Daemon).createNetwork(0x4000cb2000, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x100, 0x0, ...)
                                              /go/src/github.com/docker/docker/daemon/network.go:365 +0x454
                                      github.com/docker/docker/daemon.(*Daemon).CreateNetwork(0x4000cb2000, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x100, 0x0, ...)
                                              /go/src/github.com/docker/docker/daemon/network.go:286 +0x54
                                      github.com/docker/docker/api/server/router/network.(*networkRouter).postNetworkCreate(0x4000141580, 0x559069a4b8, 0x4000eea840, 0x559068b530, 0x4000489420, 0x4000b03400, 0x4000eea720, 0x55901c1660, 0x1)
                                              /go/src/github.com/docker/docker/api/server/router/network/network_routes.go:229 +0x2a8
                                      github.com/docker/docker/api/server/middleware.ExperimentalMiddleware.WrapHandler.func1(0x559069a4b8, 0x4000eea840, 0x559068b530, 0x4000489420, 0x4000b03400, 0x4000eea720, 0x559069a4b8, 0x4000eea840)
                                              /go/src/github.com/docker/docker/api/server/middleware/experimental.go:26 +0x140
                                      github.com/docker/docker/api/server/middleware.VersionMiddleware.WrapHandler.func1(0x559069a4b8, 0x4000eea810, 0x559068b530, 0x4000489420, 0x4000b03400, 0x4000eea720, 0x7f8439eb00, 0x40)
                                              /go/src/github.com/docker/docker/api/server/middleware/version.go:62 +0x4cc
                                      github.com/docker/docker/pkg/authorization.(*Middleware).WrapHandler.func1(0x559069a4b8, 0x4000eea810, 0x559068b530, 0x4000489420, 0x4000b03400, 0x4000eea720, 0x559069a4b8, 0x4000eea810)
                                              /go/src/github.com/docker/docker/pkg/authorization/middleware.go:59 +0x5e4
                                      github.com/docker/docker/api/server.(*Server).makeHTTPHandler.func1(0x559068b530, 0x4000489420, 0x4000b03300)
                                              /go/src/github.com/docker/docker/api/server/server.go:141 +0x1bc
                                      net/http.HandlerFunc.ServeHTTP(0x400000f158, 0x559068b530, 0x4000489420, 0x4000b03300)
                                              /usr/local/go/src/net/http/server.go:2049 +0x40
                                      github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).ServeHTTP(0x4000dc8e40, 0x559068b530, 0x4000489420, 0x4000b03100)
                                              /go/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.go:210 +0x9c
                                      net/http.serverHandler.ServeHTTP(0x40002782a0, 0x559068b530, 0x4000489420, 0x4000b03100)
                                              /usr/local/go/src/net/http/server.go:2867 +0xbc
                                      net/http.(*conn).serve(0x4000bff2c0, 0x559069a4b8, 0x4000814540)
                                              /usr/local/go/src/net/http/server.go:1932 +0x71c
                                      created by net/http.(*Server).Serve
                                              /usr/local/go/src/net/http/server.go:2993 +0x308
Nov 19 02:01:04  docker-compose[3379]: Couldn't connect to Docker daemon at http+docker://localunixsocket - is it running?
Nov 19 02:01:04  docker-compose[3379]: If it's at a non-standard location, specify the URL with the DOCKER_HOST environment variable.
Nov 19 02:01:04  systemd-udevd[3393]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Nov 19 02:01:04  NetworkManager[551]: <info>  [1637287264.6040] manager: (br-f060ca033c96): new Bridge device (/org/freedesktop/NetworkManager/Devices/34)
Nov 19 02:01:04  NetworkManager[551]: <info>  [1637287264.6199] devices added (path: /sys/devices/virtual/net/br-f060ca033c96, iface: br-f060ca033c96)
Nov 19 02:01:04  NetworkManager[551]: <info>  [1637287264.6199] device added (path: /sys/devices/virtual/net/br-f060ca033c96, iface: br-f060ca033c96): no ifupdown configuration found.
Nov 19 02:01:04  systemd[1]: docker-compose.service: Main process exited, code=exited, status=1/FAILURE
Nov 19 02:01:04  systemd[1]: docker-compose.service: Failed with result 'exit-code'.

Describe the results you expected:

Additional information you deem important (e.g. issue happens only occasionally): This is an ongoing issue, and the only way we can get things running again is to reboot the machine. Running systemctl restart docker just makes dockerd crash again.

Docker is installed on arm via:

wget -qO - https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
     $(lsb_release -cs) stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null
apt-get update
apt-get install docker-ce docker-ce-cli containerd.io docker-compose

Output of docker version:

Client: Docker Engine - Community
 Version:           20.10.10
 API version:       1.41
 Go version:        go1.16.9
 Git commit:        b485636
 Built:             Mon Oct 25 07:42:07 2021
 OS/Arch:           linux/arm64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.10
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.9
  Git commit:       e2f740d
  Built:            Mon Oct 25 07:40:42 2021
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.4.11
  GitCommit:        5b46e404f6b9f661a205e28d59c982d3634148f8
 runc:
  Version:          1.0.2
  GitCommit:        v1.0.2-0-g52b36a2
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Output of docker info:

# docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.6.3-docker)

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 4
 Server Version: 20.10.10
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: journald
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 5b46e404f6b9f661a205e28d59c982d3634148f8
 runc version: v1.0.2-0-g52b36a2
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.4.194-g50f0a60241aa-dirty
 Operating System: Ubuntu 18.04.6 LTS
 OSType: linux
 Architecture: aarch64
 CPUs: 6
 Total Memory: 1.912GiB
 Name: faceway
 ID: 3U46:J6HX:SMDX:7QBL:URFY:56YQ:NRGD:4HY3:AY5Z:BXVR:RRL6:COKH
 Docker Root Dir: /userdata/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No kernel memory TCP limit support

Additional environment details (AWS, VirtualBox, physical, etc.):

CRCinAU commented 2 years ago

Looking further into this, it seems NetworkManager was taking control of docker0 and its associated interfaces, causing Docker to crash.

The fix is to add the following to `/etc/NetworkManager/NetworkManager.conf`:

[keyfile]
unmanaged-devices=interface-name:docker*;interface-name:br-*;interface-name:vmnet*;interface-name:vboxnet*;interface-name:veth*

While this resolves the problem with docker crashing, docker really should handle this situation gracefully instead of crashing.

EDIT: This diagnosis is incorrect, due to a number of confounding factors in testing. See the comment below for the full problem.

CRCinAU commented 2 years ago

It seems that changing the NetworkManager config doesn't actually fix this issue. The key to reproducing it seems to be:

1) Start a set of containers via docker-compose
2) Stop a set of containers via docker-compose
3) Restart docker via: systemctl restart docker

From that point on, dockerd will always crash.

Running dockerd -D gives the following:

# dockerd -D
INFO[2021-11-19T06:22:05.568776286Z] Starting up                                  
DEBU[2021-11-19T06:22:05.569679285Z] Listener created for HTTP on unix (/var/run/docker.sock) 
INFO[2021-11-19T06:22:05.570408035Z] detected 127.0.0.53 nameserver, assuming systemd-resolved, so using resolv.conf: /run/systemd/resolve/resolv.conf 
DEBU[2021-11-19T06:22:05.571308118Z] Golang's threads limit set to 13950          
INFO[2021-11-19T06:22:05.572328951Z] parsed scheme: "unix"                         module=grpc
INFO[2021-11-19T06:22:05.572403034Z] scheme "unix" not registered, fallback to default scheme  module=grpc
INFO[2021-11-19T06:22:05.572507743Z] ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock  <nil> 0 <nil>}] <nil> <nil>}  module=grpc
INFO[2021-11-19T06:22:05.572560243Z] ClientConn switching balancer to "pick_first"  module=grpc
DEBU[2021-11-19T06:22:05.572404784Z] metrics API listening on /var/run/docker/metrics.sock 
INFO[2021-11-19T06:22:05.576421324Z] parsed scheme: "unix"                         module=grpc
INFO[2021-11-19T06:22:05.576517283Z] scheme "unix" not registered, fallback to default scheme  module=grpc
INFO[2021-11-19T06:22:05.576597199Z] ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock  <nil> 0 <nil>}] <nil> <nil>}  module=grpc
INFO[2021-11-19T06:22:05.576639783Z] ClientConn switching balancer to "pick_first"  module=grpc
DEBU[2021-11-19T06:22:05.578702282Z] Using default logging driver journald        
DEBU[2021-11-19T06:22:05.578851907Z] [graphdriver] priority list: [btrfs zfs overlay2 fuse-overlayfs aufs overlay devicemapper vfs] 
DEBU[2021-11-19T06:22:05.579263906Z] processing event stream                       module=libcontainerd namespace=plugins.moby
DEBU[2021-11-19T06:22:05.589925319Z] backingFs=extfs, projectQuotaSupported=false, indexOff="", userxattr=""  storage-driver=overlay2
INFO[2021-11-19T06:22:05.589989777Z] [graphdriver] using prior storage driver: overlay2 
DEBU[2021-11-19T06:22:05.590016319Z] Initialized graph driver overlay2            
DEBU[2021-11-19T06:22:05.590264527Z] No quota support for local volumes in /userdata/docker/volumes: Filesystem does not support, or has not enabled quotas 
DEBU[2021-11-19T06:22:05.594526650Z] Max Concurrent Downloads: 3                  
DEBU[2021-11-19T06:22:05.594574192Z] Max Concurrent Uploads: 5                    
DEBU[2021-11-19T06:22:05.594589942Z] Max Download Attempts: 5                     
INFO[2021-11-19T06:22:05.594636900Z] Loading containers: start.                   
DEBU[2021-11-19T06:22:05.594764942Z] Option Experimental: false                   
DEBU[2021-11-19T06:22:05.594793233Z] Option DefaultDriver: bridge                 
DEBU[2021-11-19T06:22:05.594825900Z] Option DefaultNetwork: bridge                
DEBU[2021-11-19T06:22:05.594841650Z] Network Control Plane MTU: 1500              
DEBU[2021-11-19T06:22:05.595069025Z] processing event stream                       module=libcontainerd namespace=moby
DEBU[2021-11-19T06:22:05.607310270Z] /sbin/iptables, [--wait -t filter -C FORWARD -j DOCKER-ISOLATION] 
DEBU[2021-11-19T06:22:05.609321019Z] /sbin/iptables, [--wait -t nat -D PREROUTING -m addrtype --dst-type LOCAL -j DOCKER] 
DEBU[2021-11-19T06:22:05.611598351Z] /sbin/iptables, [--wait -t nat -D OUTPUT -m addrtype --dst-type LOCAL ! --dst 127.0.0.0/8 -j DOCKER] 
DEBU[2021-11-19T06:22:05.613968142Z] /sbin/iptables, [--wait -t nat -D OUTPUT -m addrtype --dst-type LOCAL -j DOCKER] 
DEBU[2021-11-19T06:22:05.616226933Z] /sbin/iptables, [--wait -t nat -D PREROUTING] 
DEBU[2021-11-19T06:22:05.618132682Z] /sbin/iptables, [--wait -t nat -D OUTPUT]    
DEBU[2021-11-19T06:22:05.620018306Z] /sbin/iptables, [--wait -t nat -F DOCKER]    
DEBU[2021-11-19T06:22:05.621818180Z] /sbin/iptables, [--wait -t nat -X DOCKER]    
DEBU[2021-11-19T06:22:05.623571971Z] /sbin/iptables, [--wait -t filter -F DOCKER] 
DEBU[2021-11-19T06:22:05.625395179Z] /sbin/iptables, [--wait -t filter -X DOCKER] 
DEBU[2021-11-19T06:22:05.627186303Z] /sbin/iptables, [--wait -t filter -F DOCKER-ISOLATION-STAGE-1] 
DEBU[2021-11-19T06:22:05.628969261Z] /sbin/iptables, [--wait -t filter -X DOCKER-ISOLATION-STAGE-1] 
DEBU[2021-11-19T06:22:05.630782260Z] /sbin/iptables, [--wait -t filter -F DOCKER-ISOLATION-STAGE-2] 
DEBU[2021-11-19T06:22:05.632570176Z] /sbin/iptables, [--wait -t filter -X DOCKER-ISOLATION-STAGE-2] 
DEBU[2021-11-19T06:22:05.634374133Z] /sbin/iptables, [--wait -t filter -F DOCKER-ISOLATION] 
DEBU[2021-11-19T06:22:05.636136674Z] /sbin/iptables, [--wait -t filter -X DOCKER-ISOLATION] 
DEBU[2021-11-19T06:22:05.637961923Z] /sbin/iptables, [--wait -t nat -n -L DOCKER] 
DEBU[2021-11-19T06:22:05.639802048Z] /sbin/iptables, [--wait -t nat -N DOCKER]    
DEBU[2021-11-19T06:22:05.641635755Z] /sbin/iptables, [--wait -t filter -n -L DOCKER] 
DEBU[2021-11-19T06:22:05.643506796Z] /sbin/iptables, [--wait -t filter -n -L DOCKER-ISOLATION-STAGE-1] 
DEBU[2021-11-19T06:22:05.645337587Z] /sbin/iptables, [--wait -t filter -n -L DOCKER-ISOLATION-STAGE-2] 
DEBU[2021-11-19T06:22:05.647138503Z] /sbin/iptables, [--wait -t filter -N DOCKER-ISOLATION-STAGE-2] 
DEBU[2021-11-19T06:22:05.649196210Z] /sbin/iptables, [--wait -t filter -C DOCKER-ISOLATION-STAGE-1 -j RETURN] 
DEBU[2021-11-19T06:22:05.651249543Z] /sbin/iptables, [--wait -A DOCKER-ISOLATION-STAGE-1 -j RETURN] 
DEBU[2021-11-19T06:22:05.653251542Z] /sbin/iptables, [--wait -t filter -C DOCKER-ISOLATION-STAGE-2 -j RETURN] 
DEBU[2021-11-19T06:22:05.655404333Z] /sbin/iptables, [--wait -A DOCKER-ISOLATION-STAGE-2 -j RETURN] 
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x55912cbd84]

goroutine 1 [running]:
github.com/docker/docker/vendor/github.com/vishvananda/netlink.parseAddr(0x4000b840b4, 0x40, 0x40, 0x0, 0x400073adf4, 0x4, 0x280, 0x0, 0x0, 0x4000b840c8, ...)
    /go/src/github.com/docker/docker/vendor/github.com/vishvananda/netlink/addr_linux.go:274 +0x174
github.com/docker/docker/vendor/github.com/vishvananda/netlink.(*Handle).AddrList(0x4000735440, 0x5592984250, 0x40007497a0, 0x2, 0x40007497a0, 0x0, 0x0, 0x4000735300, 0x1)
    /go/src/github.com/docker/docker/vendor/github.com/vishvananda/netlink/addr_linux.go:199 +0x1a0
github.com/docker/docker/vendor/github.com/docker/libnetwork/netutils.ElectInterfaceAddresses(0x5591fbfd14, 0x7, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/netutils/utils_linux.go:81 +0xe4
github.com/docker/docker/daemon.initBridgeDriver(0x55929e1188, 0x4000866100, 0x40004b6b00, 0x0, 0x0)
    /go/src/github.com/docker/docker/daemon/daemon_unix.go:948 +0x2bc
github.com/docker/docker/daemon.(*Daemon).initNetworkController(0x40004f01e0, 0x40004b6b00, 0x4000a95860, 0x0, 0x0, 0x0, 0x0)
    /go/src/github.com/docker/docker/daemon/daemon_unix.go:891 +0x2ec
github.com/docker/docker/daemon.(*Daemon).restore(0x40004f01e0, 0x4000170580, 0x400015c000)
    /go/src/github.com/docker/docker/daemon/daemon.go:490 +0x3d8
github.com/docker/docker/daemon.NewDaemon(0x55929a5db0, 0x4000170580, 0x40004b6b00, 0x40004a5e90, 0x0, 0x0, 0x0)
    /go/src/github.com/docker/docker/daemon/daemon.go:1150 +0x20d8
main.(*DaemonCli).start(0x40004a5260, 0x40000a4780, 0x0, 0x0)
    /go/src/github.com/docker/docker/cmd/dockerd/daemon.go:195 +0x588
main.runDaemon(...)
    /go/src/github.com/docker/docker/cmd/dockerd/docker_unix.go:13
main.newDaemonCommand.func1(0x400023eb00, 0x40003f0e80, 0x0, 0x1, 0x0, 0x0)
    /go/src/github.com/docker/docker/cmd/dockerd/docker.go:34 +0x78
github.com/docker/docker/vendor/github.com/spf13/cobra.(*Command).execute(0x400023eb00, 0x40001c2010, 0x1, 0x1, 0x400023eb00, 0x40001c2010)
    /go/src/github.com/docker/docker/vendor/github.com/spf13/cobra/command.go:850 +0x320
github.com/docker/docker/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0x400023eb00, 0x0, 0x0, 0x7)
    /go/src/github.com/docker/docker/vendor/github.com/spf13/cobra/command.go:958 +0x258
github.com/docker/docker/vendor/github.com/spf13/cobra.(*Command).Execute(...)
    /go/src/github.com/docker/docker/vendor/github.com/spf13/cobra/command.go:895
main.main()
    /go/src/github.com/docker/docker/cmd/dockerd/docker.go:97 +0x188

CRCinAU commented 2 years ago

This seems to be related to: https://github.com/vishvananda/netlink/issues/664

We have a 4G uplink via an mPCIe LTE card which brings up a ppp0 interface managed by NetworkManager.

We can hit this error by doing:

root@faceway:~# systemctl restart docker
(works fine)
root@faceway:~# systemctl restart docker
(works fine)
root@faceway:~# systemctl restart docker
(works fine)
root@faceway:~# nmcli c u 4G
Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/3)
(ppp connection is now active)
root@faceway:~# systemctl restart docker
Job for docker.service failed because the control process exited with error code.
See "systemctl status docker.service" and "journalctl -xe" for details.
(dockerd crashes)

thaJeztah commented 2 years ago

Thanks for reporting; also looks related / similar to https://github.com/docker/for-linux/issues/1281

CRCinAU commented 2 years ago

> Thanks for reporting; also looks related / similar to docker/for-linux#1281

Yes, this does look like the same issue.

CRCinAU commented 2 years ago

@thaJeztah - Out of interest, how long does it normally take for a fix like this to go through the process and become available in an updated community package?

Just trying to get a grasp of the process / procedures so we can plan how to work around or fix the effect this issue has on our usage...

akerouanton commented 2 years ago

It looks like the issue has been fixed in vishvananda/netlink#665 but no new release has been made since then.

akerouanton commented 2 years ago

@CRCinAU I tried to set up a pptp connection on a VM to get a real ppp interface, but had no luck: I can't reproduce this issue. As I'm not familiar with this type of interface, I'd probably have to set up an nlmon interface (which requires compiling the appropriate kernel module) and debug how pptpd/pppd create the ppp interface, so that I can hopefully create my own dummy interface that reproduces the original bug.

Unfortunately, I believe these steps are required as the linked netlink PR states:

> It was discovered that this does resolve a potential panic but there is other elements in the code-base that assume IFA_ADDRESS will be present. Maybe a larger fix to remove that assumption is necessary?

So, I have two questions for you:

  1. Could you provide the output of ip link show and ip addr show please?
  2. Would you be able to compile Docker by yourself to test a fix, if I provide the instructions for doing it?

CRCinAU commented 2 years ago

@akerouanton - It may be possible for me to do a test build - however we use Ubuntu 18.04 on an embedded arm system - and I'm not sure whether that complicates things.

Ideally, if I can test it, I'd like to get it pushed via the docker.com site to at least give us the option to install updated packages from a repo instead of trying to build / package / maintain it myself...

I'll have to do a bit of hardware mangling - my test unit had the LTE modem removed due to this crash, so I'll have to install it and ensure it works again (even if it does crash docker) to be able to grab other info...

romainreignier commented 1 year ago

Thanks for your investigation, @CRCinAU. I have faced the same issue on a similar setup with a USB 4G modem (ARM64, Ubuntu 18.04). It seems that this PR: https://github.com/moby/moby/pull/43718 updated the netlink version to one that includes the proposed fix: https://github.com/vishvananda/netlink/pull/665. But the crash still occurs with docker version 20.10.18 (latest version as of today).

romainreignier commented 1 year ago

Actually, I have just figured out that the PR https://github.com/moby/moby/pull/43718 is not included in the 20.10 branch but only on master and 22.06 branches.

romainreignier commented 1 year ago

I have tried a simple app with this code:

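package main

import (
    "fmt"

    "github.com/vishvananda/netlink"
)

func main() {
    // Record which netlink commit this binary was built against.
    fmt.Println("Version netlink: 8715fe718dfdf487a919acb6df7da109346bbfd6")
    // A nil link lists the IPv4 addresses of all interfaces; this is the
    // call that appears in the dockerd stack traces.
    addrs, err := netlink.AddrList(nil, netlink.FAMILY_V4)
    if err != nil {
        fmt.Println(err)
        return
    }
    // Print each address that was parsed successfully.
    for _, v := range addrs {
        fmt.Println("-", v)
    }
}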

It does not crash, whereas the same code built against the latest released version of netlink (1.1.0) crashes the same way Docker does.

CRCinAU commented 1 year ago

Yeah - I'm a bit disappointed overall that this still doesn't seem to have been addressed within the last year - especially as it's a fatal error, meaning docker breaks completely.