hashicorp / nomad-driver-podman

A nomad task driver plugin for sandboxing workloads in podman containers
https://developer.hashicorp.com/nomad/plugins/drivers/podman
Mozilla Public License 2.0

Jobs are not correctly restarted on Podman 4.6.0 #277

Closed jonasdemoor closed 10 months ago

jonasdemoor commented 11 months ago

Description

We're running Nomad 1.5.3 with Podman 4.6.0, compiled from source, on Debian 11 (bullseye). I've attached more details about our environment below.

This is what the state of Nomad and Podman looks like after starting the job:

$ nomad job status
ID            Type    Priority  Status   Submit Date
csi-nfs-node  system  50        running  2023-08-04T12:28:24+02:00

$ podman ps
CONTAINER ID  IMAGE                                         COMMAND               CREATED        STATUS        PORTS       NAMES
452b8b3dc9d0  registry.k8s.io/sig-storage/nfsplugin:v4.1.0  -v=5 --nodeid=nom...  6 seconds ago  Up 6 seconds              plugin-928c60c6-2c62-5e13-f10d-054ac74f16f4

$ journalctl -u podman
Aug 04 12:28:54 <redacted> podman[70819]: @ - - [04/Aug/2023:12:28:54 +0200] "GET /v1.0.0/libpod/containers/452b8b3dc9d04cfb4a41ee11f3590f46c75ce00d60fef9826340e6e577637ca3/stats?stream=false HTTP/1.1" 200 1112 "" "Go-http-client/1.1"
Aug 04 12:28:55 <redacted> podman[70819]: @ - - [04/Aug/2023:12:28:55 +0200] "GET /v1.0.0/libpod/containers/452b8b3dc9d04cfb4a41ee11f3590f46c75ce00d60fef9826340e6e577637ca3/stats?stream=false HTTP/1.1" 200 1112 "" "Go-http-client/1.1"

After stopping the job, Podman correctly stops the container, but Nomad doesn't GC/remove it. Even though the job is dead according to Nomad, the driver keeps trying to get stats for that container.

$ nomad job stop csi-nfs-node
==> 2023-08-04T12:30:15+02:00: Monitoring evaluation "21d85be1"
    2023-08-04T12:30:15+02:00: Evaluation triggered by job "csi-nfs-node"
    2023-08-04T12:30:16+02:00: Evaluation status changed: "pending" -> "complete"
==> 2023-08-04T12:30:16+02:00: Evaluation "21d85be1" finished with status "complete"
$ nomad job status           
ID            Type    Priority  Status          Submit Date
csi-nfs-node  system  50        dead (stopped)  2023-08-04T12:28:24+02:00

$ podman ps -a
CONTAINER ID  IMAGE                                         COMMAND               CREATED        STATUS                     PORTS       NAMES
452b8b3dc9d0  registry.k8s.io/sig-storage/nfsplugin:v4.1.0  -v=5 --nodeid=nom...  2 minutes ago  Exited (2) 32 seconds ago              plugin-928c60c6-2c62-5e13-f10d-054ac74f16f4

$ journalctl -u podman
Aug 04 12:30:55 <redacted> podman[70819]: time="2023-08-04T12:30:55+02:00" level=info msg="Request Failed(Internal Server Error): container is stopped"
Aug 04 12:30:55 <redacted> podman[70819]: http: superfluous response.WriteHeader call from github.com/gorilla/handlers.(*responseLogger).WriteHeader (handlers.go:61)
Aug 04 12:30:55 <redacted> podman[70819]: @ - - [04/Aug/2023:12:30:55 +0200] "GET /v1.0.0/libpod/containers/452b8b3dc9d04cfb4a41ee11f3590f46c75ce00d60fef9826340e6e577637ca3/stats?stream=false HTTP/1.1" 500 81 "" "Go-http-client/1.1"

When I start the Nomad job again, Nomad thinks everything is fine and reports the job as running, but the old Podman container is still there in a stopped state. By default, the old container should have been removed when the job was stopped, and Podman should have started a new one when the job was started again.

$ nomad job status 
ID            Type    Priority  Status   Submit Date
csi-nfs-node  system  50        running  2023-08-04T12:32:58+02:00

$ podman ps -a
CONTAINER ID  IMAGE                                         COMMAND               CREATED        STATUS                    PORTS       NAMES
452b8b3dc9d0  registry.k8s.io/sig-storage/nfsplugin:v4.1.0  -v=5 --nodeid=nom...  5 minutes ago  Exited (2) 3 minutes ago              plugin-928c60c6-2c62-5e13-f10d-054ac74f16f4

$ journalctl -u podman
Aug 04 12:33:59 <redacted> podman[70819]: @ - - [04/Aug/2023:12:33:59 +0200] "GET /v1.0.0/libpod/containers/452b8b3dc9d04cfb4a41ee11f3590f46c75ce00d60fef9826340e6e577637ca3/stats?stream=false HTTP/1.1" 500 81 "" "Go-http-client/1.1"
Aug 04 12:34:00 <redacted> podman[70819]: time="2023-08-04T12:34:00+02:00" level=info msg="Request Failed(Internal Server Error): container is stopped"
Aug 04 12:34:00 <redacted> podman[70819]: http: superfluous response.WriteHeader call from github.com/gorilla/handlers.(*responseLogger).WriteHeader (handlers.go:61)

It takes a manual podman rm <stopped_container_id> to get Nomad back into a correct state. After that, Podman starts a new container and the job runs fine.

Downgrading to Podman 4.5.1 solves the issue for us. With that version, the containers are successfully stopped and deleted, as illustrated by the journalctl output below:

Aug 04 12:47:04 <redacted> podman[73147]: @ - - [04/Aug/2023:12:47:04 +0200] "POST /v1.0.0/libpod/containers/create HTTP/1.1" 201 88 "" "Go-http-client/1.1"
Aug 04 12:47:04 <redacted> podman[73176]: [INFO  netavark::firewall] Using iptables firewall driver
Aug 04 12:47:04 <redacted> podman[73176]: [INFO  netavark::network::netlink] Adding route (dest: 0.0.0.0/0 ,gw: 10.250.0.1, metric 100)
Aug 04 12:47:04 <redacted> podman[73147]: time="2023-08-04T12:47:04+02:00" level=info msg="Running conmon under slice nomad.slice and unitName libpod-conmon-3fc62d4459a09a46b11fd21c6555fde6347fa691d23a55390e4487c3db40b4e7.scope"
Aug 04 12:47:04 <redacted> podman[73147]: time="2023-08-04T12:47:04+02:00" level=info msg="Got Conmon PID as 73220"
Aug 04 12:47:04 <redacted> podman[73147]: 2023-08-04 12:47:04.940996242 +0200 CEST m=+2.194329935 container init 3fc62d4459a09a46b11fd21c6555fde6347fa691d23a55390e4487c3db40b4e7 (image=registry.k8s.io/sig-storage/nfsplugin:v4.1.0, name=plugin-12fa1a0f-94d1-aca5-111b-a5cdc99ec49d, revision=v4.1.0)
Aug 04 12:47:04 <redacted> podman[73147]: 2023-08-04 12:47:04.945554312 +0200 CEST m=+2.198888005 container start 3fc62d4459a09a46b11fd21c6555fde6347fa691d23a55390e4487c3db40b4e7 (image=registry.k8s.io/sig-storage/nfsplugin:v4.1.0, name=plugin-12fa1a0f-94d1-aca5-111b-a5cdc99ec49d, revision=v4.1.0)
Aug 04 12:47:04 <redacted> podman[73147]: @ - - [04/Aug/2023:12:47:04 +0200] "POST /v1.0.0/libpod/containers/3fc62d4459a09a46b11fd21c6555fde6347fa691d23a55390e4487c3db40b4e7/start HTTP/1.1" 204 0 "" "Go-http-client/1.1"
Aug 04 12:47:04 <redacted> podman[73147]: @ - - [04/Aug/2023:12:47:04 +0200] "POST /v1.0.0/libpod/containers/3fc62d4459a09a46b11fd21c6555fde6347fa691d23a55390e4487c3db40b4e7/wait?condition=running&condition=exited HTTP/1.1" 200 3 "" "Go-http-client/1.1"
Aug 04 12:47:04 <redacted> podman[73147]: @ - - [04/Aug/2023:12:47:04 +0200] "GET /v1.0.0/libpod/containers/3fc62d4459a09a46b11fd21c6555fde6347fa691d23a55390e4487c3db40b4e7/json HTTP/1.1" 200 9418 "" "Go-http-client/1.1"
Aug 04 12:47:04 <redacted> podman[73147]: @ - - [04/Aug/2023:12:47:04 +0200] "GET /v1.0.0/libpod/containers/3fc62d4459a09a46b11fd21c6555fde6347fa691d23a55390e4487c3db40b4e7/stats?stream=false HTTP/1.1" 200 1109 "" "Go-http-client/1.1"
(...)
Aug 04 12:47:13 <redacted> podman[73147]: @ - - [04/Aug/2023:12:47:13 +0200] "GET /v1.0.0/libpod/containers/3fc62d4459a09a46b11fd21c6555fde6347fa691d23a55390e4487c3db40b4e7/stats?stream=false HTTP/1.1" 200 1096 "" "Go-http-client/1.1"
Aug 04 12:47:14 <redacted> podman[73147]: 2023-08-04 12:47:14.188771585 +0200 CEST m=+11.442105289 container died 3fc62d4459a09a46b11fd21c6555fde6347fa691d23a55390e4487c3db40b4e7 (image=registry.k8s.io/sig-storage/nfsplugin:v4.1.0, name=plugin-12fa1a0f-94d1-aca5-111b-a5cdc99ec49d, revision=v4.1.0)
Aug 04 12:47:14 <redacted> podman[73240]: [INFO  netavark::firewall] Using iptables firewall driver
Aug 04 12:47:14 <redacted> podman[73240]: [INFO  netavark::network::bridge] removing bridge podman0
Aug 04 12:47:14 <redacted> podman[73147]: 2023-08-04 12:47:14.316331562 +0200 CEST m=+11.569665261 container cleanup 3fc62d4459a09a46b11fd21c6555fde6347fa691d23a55390e4487c3db40b4e7 (image=registry.k8s.io/sig-storage/nfsplugin:v4.1.0, name=plugin-12fa1a0f-94d1-aca5-111b-a5cdc99ec49d, revision=v4.1.0)
Aug 04 12:47:14 <redacted> podman[73147]: @ - - [04/Aug/2023:12:47:14 +0200] "POST /v1.0.0/libpod/containers/3fc62d4459a09a46b11fd21c6555fde6347fa691d23a55390e4487c3db40b4e7/stop?timeout=5&ignore=true HTTP/1.1" 204 0 "" "Go-http-client/1.1"
Aug 04 12:47:14 <redacted> podman[73147]: time="2023-08-04T12:47:14+02:00" level=error msg="Unable to get cgroup path of container: cannot get cgroup path unless container 3fc62d4459a09a46b11fd21c6555fde6347fa691d23a55390e4487c3db40b4e7 is running: container is stopped"
Aug 04 12:47:14 <redacted> podman[73147]: @ - - [04/Aug/2023:12:47:14 +0200] "GET /v1.0.0/libpod/containers/3fc62d4459a09a46b11fd21c6555fde6347fa691d23a55390e4487c3db40b4e7/stats?stream=false HTTP/1.1" 200 0 "" "Go-http-client/1.1"
Aug 04 12:47:14 <redacted> podman[73147]: @ - - [04/Aug/2023:12:47:14 +0200] "GET /v1.0.0/libpod/containers/3fc62d4459a09a46b11fd21c6555fde6347fa691d23a55390e4487c3db40b4e7/json HTTP/1.1" 200 9054 "" "Go-http-client/1.1"
Aug 04 12:47:15 <redacted> podman[73147]: 2023-08-04 12:47:15.963654811 +0200 CEST m=+13.216988506 container remove 3fc62d4459a09a46b11fd21c6555fde6347fa691d23a55390e4487c3db40b4e7 (image=registry.k8s.io/sig-storage/nfsplugin:v4.1.0, name=plugin-12fa1a0f-94d1-aca5-111b-a5cdc99ec49d, revision=v4.1.0)
Aug 04 12:47:15 <redacted> podman[73147]: @ - - [04/Aug/2023:12:47:15 +0200] "DELETE /v1.0.0/libpod/containers/3fc62d4459a09a46b11fd21c6555fde6347fa691d23a55390e4487c3db40b4e7?force=true&v=true HTTP/1.1" 200 76 "" "Go-http-client/1.1"

After some digging, I found that the stats API behaviour changed between Podman 4.5.1 and 4.6.0: it used to return HTTP 200 with an empty body for stopped containers, but now it returns HTTP 200 with an error message in the body:

$ curl -v --unix-socket /var/run/podman/podman.sock http://localhost/v1.0.0/libpod/containers/902c4d2a9b050ad614f5e2d5651d42e37ca5d0ca8c2914e1877279dc2dd542e9/stats?stream=false
*   Trying /var/run/podman/podman.sock:0...
* Connected to localhost (/run/podman/podman.sock) port 80 (#0)
> GET /v1.0.0/libpod/containers/902c4d2a9b050ad614f5e2d5651d42e37ca5d0ca8c2914e1877279dc2dd542e9/stats?stream=false HTTP/1.1
> Host: localhost
> User-Agent: curl/7.74.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Api-Version: 1.41
< Libpod-Api-Version: 4.6.0
< Server: Libpod/4.6.0 (linux)
< X-Reference-Id: 0xc0000cd190
< Date: Thu, 03 Aug 2023 10:22:32 GMT
< Transfer-Encoding: chunked
< 
{"cause":"container is stopped","message":"container is stopped","response":500}

I think the upstream Podman PR that introduced those changes is this one: https://github.com/containers/podman/pull/15867. What I think is happening is that the Nomad driver assumes the container is still running fine: it doesn't handle the new stats API response and only takes the old API behaviour into account: https://github.com/hashicorp/nomad-driver-podman/blob/main/api/container_stats.go#L46

I tried poking around in the api/container_stats.go code to add a new case for this new Podman behaviour, but my Go skills are unfortunately not very good. I'd still like to assist in any way I can, so feel free to let me know if I can provide additional information and/or testing.
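
To make the idea concrete, this is roughly the kind of check I had in mind, as a standalone sketch rather than a patch against the driver. The statsErrorEnvelope type, the checkStatsBody helper and the ErrContainerNotRunning sentinel are names I made up for this example, not the driver's actual identifiers; the helper just decodes the JSON body shown above and maps both the old (empty body) and the new ("container is stopped") responses to a single "not running" error instead of treating them as valid stats.

package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// statsErrorEnvelope mirrors the error body that Podman 4.6.0 returns from
// /libpod/containers/{id}/stats for a stopped container, e.g.
// {"cause":"container is stopped","message":"container is stopped","response":500}
type statsErrorEnvelope struct {
	Cause    string `json:"cause"`
	Message  string `json:"message"`
	Response int    `json:"response"`
}

// ErrContainerNotRunning stands in for whatever sentinel error the driver
// already uses for a container that is no longer running.
var ErrContainerNotRunning = errors.New("container is not running")

// checkStatsBody inspects a stats response body that came back with HTTP 200.
// Podman up to 4.5.1 returned an empty body for stopped containers; 4.6.0
// returns the error envelope above instead, so both cases are mapped to
// ErrContainerNotRunning here.
func checkStatsBody(body []byte) error {
	if strings.TrimSpace(string(body)) == "" {
		return ErrContainerNotRunning // pre-4.6.0 behaviour
	}

	var envelope statsErrorEnvelope
	if err := json.Unmarshal(body, &envelope); err == nil && envelope.Cause == "container is stopped" {
		return ErrContainerNotRunning // 4.6.0 behaviour
	}

	// Anything else is assumed to be a real stats payload.
	return nil
}

func main() {
	newStyle := []byte(`{"cause":"container is stopped","message":"container is stopped","response":500}`)
	fmt.Println(checkStatsBody(newStyle)) // container is not running
	fmt.Println(checkStatsBody(nil))      // container is not running
}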

Environment

inflatador commented 11 months ago

Thanks for the bug report! I'm running into the same issue on Fedora 38. I upgraded to podman 4.6.0-1.fc38.x86_64 on August 1st, and that's when I started seeing the issue.

sushain97 commented 11 months ago

I also ran into this! Thanks for taking the time to investigate and dropping a clear explanation of the issue @jonasdemoor. I've got a small fix that works for me. PR incoming.

Procsiab commented 11 months ago

I can report that I am experiencing the same problem in the following environment:

Moreover, on a job version update, the previous allocation's container(s) cannot be stopped, and the new allocation stays pending forever.

For this specific instance, this is what I see in the status log of the allocation that will not stop:

Aug 07, '23 10:15:37 +0000 | Killing | Sent interrupt. Waiting 5s before force killing
Aug 07, '23 10:13:13 +0000 | Started | Task started by client

Procsiab commented 11 months ago

I compiled the nomad-driver-podman plugin from the latest commit on @sushain97's PR branch, and I can report that it fixes the issue I described in my previous message.