moby / moby

The Moby Project - a collaborative project for the container ecosystem to assemble container-based systems
https://mobyproject.org/
Apache License 2.0

Containers with removed/re-inserted USB devices do not stay in restart loop and are removed #46608

Open aldas opened 1 year ago

aldas commented 1 year ago

Description

I was directed here by https://github.com/docker/compose/issues/11075


I have a docker compose file that runs a service that reads a USB serial device (a cheap USB-to-serial GPS on a sailing boat). When that device, "/dev/ttyUSB0", is unplugged, the container is removed and does not recover when the USB device is replugged and /dev/ttyUSB0 reappears.

It is essential that this container is not removed from the running list and instead stays in a restart loop, which would enable recovery and zero-maintenance operation after reboots or disconnects.

The compose file is:

version: "3.7"
services:
 backend_nmea0183serialscrape:
   # unnecessary things omitted
   restart: always
   devices:
     - /dev/ttyUSB0:/dev/ttyUSB0:rw
   device_cgroup_rules:
     # https://stackoverflow.com/a/62758958/2514290
     - "c 188:* rmw"
     - "c 166:* rmw"
     - "c 4:* rmw"
   group_add:
     - dialout
   volumes:
     - /run/udev:/run/udev:ro

Reproduce

  1. Run the container.
  2. Remove the USB device.
  3. Expect the container to keep restarting in a loop.
  4. Insert the USB device.
  5. Expect the container to recover and run.

Expected behavior

No response

docker version

Client: Docker Engine - Community
Version:           24.0.2
API version:       1.43
Go version:        go1.20.4
Git commit:        cb74dfc
Built:             Thu May 25 21:51:00 2023
OS/Arch:           linux/amd64
Context:           default

Server: Docker Engine - Community
Engine:
 Version:          24.0.2
 API version:      1.43 (minimum version 1.12)
 Go version:       go1.20.4
 Git commit:       659604f
 Built:            Thu May 25 21:51:00 2023
 OS/Arch:          linux/amd64
 Experimental:     false
containerd:
 Version:          1.6.21
 GitCommit:        3dce8eb055cbb6872793272b4f20ed16117344f8
runc:
 Version:          1.1.7
 GitCommit:        v1.1.7-0-g860f061
docker-init:
 Version:          0.19.0
 GitCommit:        de40ad0

docker info

Client: Docker Engine - Community
Version:    24.0.2
Context:    default
Debug Mode: false

Server:
Containers: 11
 Running: 11
 Paused: 0
 Stopped: 0
Images: 16
Server Version: 24.0.2
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Using metacopy: false
 Native Overlay Diff: true
 userxattr: false
Logging Driver: journald
Cgroup Driver: systemd
Cgroup Version: 2
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 3dce8eb055cbb6872793272b4f20ed16117344f8
runc version: v1.1.7-0-g860f061
init version: de40ad0
Security Options:
 apparmor
 seccomp
  Profile: builtin
 cgroupns
Kernel Version: 5.15.0-78-generic
Operating System: Ubuntu 22.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.604GiB
Name: 90231
ID: 331cee29-e1f8-49e0-9021-b0ba782c3214
Docker Root Dir: /var/lib/docker
Debug Mode: false
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Additional Info

No response

thaJeztah commented 1 year ago

When that device "/dev/ttyUSB0" is unplugged the container will be removed

Do you have any daemon logs from when that happens? I'm mostly curious about the part where you mention that the container is removed (I'd expect the container to still be there, but possibly as "failing to start")

aldas commented 1 year ago

I am probably using the wrong terminology here. By "removed" I mean that it does not show up in the "docker ps" list, as it fails to start at all (it does not even reach the state where the entrypoint is executed).
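
To tell a container that was really deleted from one that merely failed to start, note that plain `docker ps` lists only running containers, while `docker ps -a` lists containers in every state. A minimal Python sketch of that check, assuming the docker CLI is on PATH; the container name `usb_reader` is a hypothetical placeholder (compose normally prefixes names with the project name):

```python
import subprocess


def parse_ps_output(output: str) -> dict:
    """Map container name -> status from tab-separated `docker ps` lines."""
    statuses = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        name, _, status = line.partition("\t")
        statuses[name] = status
    return statuses


def container_statuses() -> dict:
    # -a includes exited/created containers, which plain `docker ps` hides.
    result = subprocess.run(
        ["docker", "ps", "-a", "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True, text=True, check=True,
    )
    return parse_ps_output(result.stdout)


if __name__ == "__main__":
    # "usb_reader" is a hypothetical container name used for illustration.
    print(container_statuses().get("usb_reader", "not present in docker ps -a"))
```

If the name still appears here with an "Exited" or "Created" status, the container was not deleted; it is only absent from the running list.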

aldas commented 1 year ago

Here is an example with some logs.

  1. Insert the USB device into my PC. This is a cheap USB-to-UART device that appears as /dev/ttyUSB0.

    [55270.336258] usb 1-14: new full-speed USB device number 6 using xhci_hcd
    [55270.519244] usb 1-14: New USB device found, idVendor=10c4, idProduct=ea60, bcdDevice= 1.00
    [55270.519260] usb 1-14: New USB device strings: Mfr=1, Product=2, SerialNumber=3
    [55270.519268] usb 1-14: Product: CP2102 USB to UART Bridge Controller
    [55270.519275] usb 1-14: Manufacturer: Silicon Labs
    [55270.519281] usb 1-14: SerialNumber: 0001
    [55270.566086] usbcore: registered new interface driver usbserial_generic
    [55270.566127] usbserial: USB Serial support registered for generic
    [55270.571010] usbcore: registered new interface driver cp210x
    [55270.571053] usbserial: USB Serial support registered for cp210x
    [55270.571149] cp210x 1-14:1.0: cp210x converter detected
    [55270.573515] usb 1-14: cp210x converter now attached to ttyUSB0
  2. Create a docker compose file with the USB device mapped into the container. In this small example, to simulate the container restarting, I use ping, which runs for 10 seconds and then exits, forcing the container to restart.

    version: "3.7"
    services:
      usb_reader:
        image: alpine:latest
        command: ['ping', '-c', '10', 'google.com']
        restart: always
        devices:
          - /dev/ttyUSB0:/dev/ttyUSB0:rw
        device_cgroup_rules:
          # https://stackoverflow.com/a/62758958/2514290
          - "c 188:* rmw"
          - "c 166:* rmw"
          - "c 4:* rmw"
        group_add:
          - dialout
        volumes:
          - /run/udev:/run/udev:ro
  3. Start the container: sudo docker-compose up -d

  4. Check with sudo docker stats that the container runs, exits, and gets restarted.

  5. Unplug the USB device from the PC.

  6. The container disappears from the list and does not seem to be "restarted".

The last lines in journald are:

okt   19 22:55:35 aldas 8331c0336ed3[1947]: PING google.com (216.58.209.206): 56 data bytes
okt   19 22:55:35 aldas 8331c0336ed3[1947]: 64 bytes from 216.58.209.206: seq=0 ttl=56 time=4.782 ms
okt   19 22:55:36 aldas 8331c0336ed3[1947]: 64 bytes from 216.58.209.206: seq=1 ttl=56 time=4.599 ms
okt   19 22:55:37 aldas 8331c0336ed3[1947]: 64 bytes from 216.58.209.206: seq=2 ttl=56 time=4.466 ms
okt   19 22:55:38 aldas 8331c0336ed3[1947]: 64 bytes from 216.58.209.206: seq=3 ttl=56 time=4.705 ms
okt   19 22:55:39 aldas 8331c0336ed3[1947]: 64 bytes from 216.58.209.206: seq=4 ttl=56 time=4.515 ms
okt   19 22:55:40 aldas 8331c0336ed3[1947]: 64 bytes from 216.58.209.206: seq=5 ttl=56 time=4.181 ms
okt   19 22:55:41 aldas 8331c0336ed3[1947]: 64 bytes from 216.58.209.206: seq=6 ttl=56 time=4.732 ms
okt   19 22:55:42 aldas 8331c0336ed3[1947]: 64 bytes from 216.58.209.206: seq=7 ttl=56 time=4.386 ms
okt   19 22:55:43 aldas 8331c0336ed3[1947]: 64 bytes from 216.58.209.206: seq=8 ttl=56 time=4.804 ms
okt   19 22:55:44 aldas 8331c0336ed3[1947]: 64 bytes from 216.58.209.206: seq=9 ttl=56 time=4.593 ms
okt   19 22:55:44 aldas 8331c0336ed3[1947]: 
okt   19 22:55:44 aldas 8331c0336ed3[1947]: --- google.com ping statistics ---
okt   19 22:55:44 aldas 8331c0336ed3[1947]: 10 packets transmitted, 10 packets received, 0% packet loss
okt   19 22:55:44 aldas 8331c0336ed3[1947]: round-trip min/avg/max = 4.181/4.576/4.804 ms
okt   19 22:55:45 aldas dockerd[1947]: time="2023-10-19T22:55:45.001507083+03:00" level=info msg="ignoring event" container=8331c0336ed3ddc8c0d048cf14c81dc619534c74bc59ec6c02bfd1a8a9a79050 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
okt   19 22:56:45 aldas dockerd[1947]: time="2023-10-19T22:56:45.076426861+03:00" level=error msg="8331c0336ed3ddc8c0d048cf14c81dc619534c74bc59ec6c02bfd1a8a9a79050 cleanup: failed to delete container from containerd: container \"8331c0336ed3ddc8c0d048cf14c81dc619534c74bc59ec6c02bfd1a8a9a79050\" in namespace \"moby\": not found"
okt   19 22:56:45 aldas dockerd[1947]: time="2023-10-19T22:56:45.196103564+03:00" level=error msg="restartmanger wait error: error gathering device information while adding custom device \"/dev/ttyUSB0\": no such file or directory"


Maybe there is a built-in workaround for this situation. It seems a little silly to build a "watchdog" process or cron job that checks for this situation and then triggers the container to run again. It would be nice if the engine could keep trying to recover by attempting to start the container until the device reappears.
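
For reference, the kind of "watchdog" mentioned above could look roughly like the following. This is only a sketch of the workaround, not engine behavior: it assumes the docker CLI is on PATH and that the container still exists in a stopped state. The container name `usb_reader` and the 5-second polling interval are hypothetical choices.

```python
import os
import subprocess
import time

DEVICE = "/dev/ttyUSB0"    # device path from the compose file
CONTAINER = "usb_reader"   # hypothetical container name; adjust to yours
POLL_SECONDS = 5           # arbitrary polling interval


def should_start(device_present: bool, container_running: bool) -> bool:
    """Start only when the device is back and the container is not running."""
    return device_present and not container_running


def is_running(name: str) -> bool:
    # `docker inspect -f '{{.State.Running}}'` prints "true" or "false";
    # a non-zero exit code means the container does not exist.
    result = subprocess.run(
        ["docker", "inspect", "-f", "{{.State.Running}}", name],
        capture_output=True, text=True,
    )
    return result.returncode == 0 and result.stdout.strip() == "true"


def watch() -> None:
    while True:
        if should_start(os.path.exists(DEVICE), is_running(CONTAINER)):
            # `docker start` re-runs the existing, stopped container.
            subprocess.run(["docker", "start", CONTAINER])
        time.sleep(POLL_SECONDS)


if __name__ == "__main__":
    watch()
```

Keeping the decision in `should_start` separate from the docker calls makes the logic easy to check without a daemon or a physical device.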

aldas commented 1 year ago

I found https://github.com/moby/moby/issues/35359, but suggestions like mounting the whole of /dev/bus/usb are not particularly safe, as I would like to expose only this device to the container and nothing else.

aldas commented 1 year ago

@thaJeztah Is the current behavior working as intended?