quay / mirror-registry

A standalone registry used to mirror images for OpenShift installations.
Apache License 2.0

Pods are running but registry is unresponsive at some point after installation #144

Open SalaryTheft opened 6 months ago

SalaryTheft commented 6 months ago

All the pods are running, but at some point after installation the registry server stops responding (curl https://localhost:8443 gets no response).

I have to restart the pods, or even reboot the host, to get it working again.

All the pods are running:

[root@bastion ~]# podman ps -a
CONTAINER ID  IMAGE                                                    COMMAND         CREATED       STATUS       PORTS                   NAMES
db266da38b9c  registry.access.redhat.com/ubi8/pause:8.7-6              infinity        13 hours ago  Up 13 hours  0.0.0.0:8443->8443/tcp  5e70ee01733b-infra
767d8f665354  registry.redhat.io/rhel8/redis-6:1-92.1669834635         run-redis       13 hours ago  Up 13 hours  0.0.0.0:8443->8443/tcp  quay-redis
73b03983db2f  registry.redhat.io/rhel8/postgresql-10:1-203.1669834630  run-postgresql  13 hours ago  Up 13 hours  0.0.0.0:8443->8443/tcp  quay-postgres
41c21e84bb3e  registry.redhat.io/quay/quay-rhel8:v3.8.14               registry        13 hours ago  Up 13 hours  0.0.0.0:8443->8443/tcp  quay-app

New log lines keep coming in, so the containers are running fine... I guess?

[root@bastion ~]# podman logs --tail=10 -f quay-app
exportactionlogsworker stdout | 2024-03-26 00:28:00,067 [52] [INFO] [apscheduler.executors.default] Running job "QueueWorker.poll_queue (trigger: interval[0:01:00], next run at: 2024-03-26 00:29:00 UTC)" (scheduled at 2024-03-26 00:28:00.067443+00:00)
exportactionlogsworker stdout | 2024-03-26 00:28:00,071 [52] [INFO] [apscheduler.executors.default] Job "QueueWorker.poll_queue (trigger: interval[0:01:00], next run at: 2024-03-26 00:29:00 UTC)" executed successfully
notificationworker stdout | 2024-03-26 00:28:04,724 [63] [INFO] [apscheduler.executors.default] Running job "QueueWorker.poll_queue (trigger: interval[0:00:10], next run at: 2024-03-26 00:28:14 UTC)" (scheduled at 2024-03-26 00:28:04.724010+00:00)
notificationworker stdout | 2024-03-26 00:28:04,727 [63] [INFO] [apscheduler.executors.default] Job "QueueWorker.poll_queue (trigger: interval[0:00:10], next run at: 2024-03-26 00:28:14 UTC)" executed successfully
repositorygcworker stdout | 2024-03-26 00:28:11,768 [75] [INFO] [apscheduler.executors.default] Running job "QueueWorker.run_watchdog (trigger: interval[0:01:00], next run at: 2024-03-26 00:29:11 UTC)" (scheduled at 2024-03-26 00:28:11.767795+00:00)
repositorygcworker stdout | 2024-03-26 00:28:11,769 [75] [INFO] [apscheduler.executors.default] Job "QueueWorker.run_watchdog (trigger: interval[0:01:00], next run at: 2024-03-26 00:29:11 UTC)" executed successfully
gcworker stdout | 2024-03-26 00:28:12,861 [53] [INFO] [apscheduler.executors.default] Running job "GarbageCollectionWorker._garbage_collection_repos (trigger: interval[0:00:30], next run at: 2024-03-26 00:28:42 UTC)" (scheduled at 2024-03-26 00:28:12.860612+00:00)
gcworker stdout | 2024-03-26 00:28:12,868 [53] [INFO] [apscheduler.executors.default] Job "GarbageCollectionWorker._garbage_collection_repos (trigger: interval[0:00:30], next run at: 2024-03-26 00:28:42 UTC)" executed successfully
notificationworker stdout | 2024-03-26 00:28:14,724 [63] [INFO] [apscheduler.executors.default] Running job "QueueWorker.poll_queue (trigger: interval[0:00:10], next run at: 2024-03-26 00:28:24 UTC)" (scheduled at 2024-03-26 00:28:14.724010+00:00)
notificationworker stdout | 2024-03-26 00:28:14,731 [63] [INFO] [apscheduler.executors.default] Job "QueueWorker.poll_queue (trigger: interval[0:00:10], next run at: 2024-03-26 00:28:24 UTC)" executed successfully
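
One caveat about reading the log tail this way: every line above comes from a background worker (exportactionlogsworker, notificationworker, repositorygcworker, gcworker), so it shows the job scheduler is alive but says nothing about whatever answers on port 8443. Below is a minimal Python sketch for tallying which programs are logging. It assumes the "<program> stdout | ..." prefix seen above. The web-facing program names (nginx, gunicorn-web, gunicorn-registry) are my assumption about how the Quay container labels them and may need adjusting.

```python
import re
from collections import Counter

# Matches the prefix seen in the output above, e.g.
# "gcworker stdout | 2024-03-26 00:28:12,861 [53] [INFO] ..."
LINE_RE = re.compile(r"^(?P<program>\S+) (?P<stream>stdout|stderr) \| ")


def tally_programs(log_text: str) -> Counter:
    """Count log lines per emitting program."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            counts[match.group("program")] += 1
    return counts


def web_tier_silent(counts, web_programs=("nginx", "gunicorn-web", "gunicorn-registry")):
    """Return True if none of the (assumed) web-facing programs logged anything."""
    return not any(counts.get(name, 0) for name in web_programs)
```

Feeding it a longer capture of the quay-app logs and checking web_tier_silent(tally_programs(text)) would show whether only the workers are still talking.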

Nothing looks strange in the quay-app container details.

[root@bastion ~]# podman inspect quay-app
[
     {
          "Id": "41c21e84bb3e90a2ae46b480d9ca00e1a924a27e2c20157f09d21d29c9b4a389",
          "Created": "2024-03-25T07:50:17.451450987-04:00",
          "Path": "dumb-init",
          "Args": [
               "--",
               "/quay-registry/quay-entrypoint.sh",
               "registry"
          ],
          "State": {
               "OciVersion": "1.1.0-rc.3",
               "Status": "running",
               "Running": true,
               "Paused": false,
               "Restarting": false,
               "OOMKilled": false,
               "Dead": false,
               "Pid": 7577,
               "ConmonPid": 7575,
               "ExitCode": 0,
               "Error": "",
               "StartedAt": "2024-03-25T07:50:17.61683645-04:00",
               "FinishedAt": "0001-01-01T00:00:00Z",
               "Health": {
                    "Status": "",
                    "FailingStreak": 0,
                    "Log": null
               },
               "CgroupPath": "/machine.slice/machine-libpod_pod_5e70ee01733b02f854d79d85dd78dc5c8ecdb2c50de7472a314441897f9296dc.slice/libpod-41c21e84bb3e90a2ae46b480d9ca00e1a924a27e2c20157f09d21d29c9b4a389.scope",
               "CheckpointedAt": "0001-01-01T00:00:00Z",
               "RestoredAt": "0001-01-01T00:00:00Z"
          },
          "Image": "93b30dda302e3554fcfea484da1fc7b981dc4ac173b195def4ab79b86dfaf616",
          "ImageDigest": "sha256:19e0709632a860dc93e54e9d79b8da9b02334122775932eaefaccf4783524ef4",
          "ImageName": "registry.redhat.io/quay/quay-rhel8:v3.8.14",
          "Rootfs": "",
          "Pod": "5e70ee01733b02f854d79d85dd78dc5c8ecdb2c50de7472a314441897f9296dc",
          "ResolvConfPath": "/run/containers/storage/overlay-containers/db266da38b9c0ffd99a27f0873934a79cbf7776dd8996aa0e4b839f98f0b25ec/userdata/resolv.conf",
          "HostnamePath": "/run/containers/storage/overlay-containers/41c21e84bb3e90a2ae46b480d9ca00e1a924a27e2c20157f09d21d29c9b4a389/userdata/hostname",
          "HostsPath": "/run/containers/storage/overlay-containers/db266da38b9c0ffd99a27f0873934a79cbf7776dd8996aa0e4b839f98f0b25ec/userdata/hosts",
          "StaticDir": "/var/lib/containers/storage/overlay-containers/41c21e84bb3e90a2ae46b480d9ca00e1a924a27e2c20157f09d21d29c9b4a389/userdata",
          "OCIConfigPath": "/var/lib/containers/storage/overlay-containers/41c21e84bb3e90a2ae46b480d9ca00e1a924a27e2c20157f09d21d29c9b4a389/userdata/config.json",
          "OCIRuntime": "crun",
          "ConmonPidFile": "/run/quay-app.service-pid",
          "PidFile": "/run/containers/storage/overlay-containers/41c21e84bb3e90a2ae46b480d9ca00e1a924a27e2c20157f09d21d29c9b4a389/userdata/pidfile",
          "Name": "quay-app",
          "RestartCount": 0,
          "Driver": "overlay",
          "MountLabel": "system_u:object_r:container_file_t:s0:c273,c984",
          "ProcessLabel": "system_u:system_r:container_t:s0:c273,c984",
          "AppArmorProfile": "",
          "EffectiveCaps": null,
          "BoundingCaps": [
               "CAP_CHOWN",
               "CAP_DAC_OVERRIDE",
               "CAP_FOWNER",
               "CAP_FSETID",
               "CAP_KILL",
               "CAP_NET_BIND_SERVICE",
               "CAP_SETFCAP",
               "CAP_SETGID",
               "CAP_SETPCAP",
               "CAP_SETUID",
               "CAP_SYS_CHROOT"
          ],
          "ExecIDs": [],
          "GraphDriver": {
               "Name": "overlay",
               "Data": {
                    "LowerDir": "/var/lib/containers/storage/overlay/19dbf084110759a3d249cd4ec487e83f55eca64deafc5d51d04787a3716fadb8/diff",
                    "MergedDir": "/var/lib/containers/storage/overlay/fc1f2d2a88e454e8c41e3aa22e5d91e18001506f13821dd60eee47a918b1bc50/merged",
                    "UpperDir": "/var/lib/containers/storage/overlay/fc1f2d2a88e454e8c41e3aa22e5d91e18001506f13821dd60eee47a918b1bc50/diff",
                    "WorkDir": "/var/lib/containers/storage/overlay/fc1f2d2a88e454e8c41e3aa22e5d91e18001506f13821dd60eee47a918b1bc50/work"
               }
          },
          "Mounts": [
               {
                    "Type": "volume",
                    "Name": "f19507ef7f837c63cb92f116e042f12daa4c00a0c37c444cb1c7988687e66a0d",
                    "Source": "/var/lib/containers/storage/volumes/f19507ef7f837c63cb92f116e042f12daa4c00a0c37c444cb1c7988687e66a0d/_data",
                    "Destination": "/tmp",
                    "Driver": "local",
                    "Mode": "",
                    "Options": [
                         "nodev",
                         "exec",
                         "nosuid",
                         "rbind"
                    ],
                    "RW": true,
                    "Propagation": "rprivate"
               },
               {
                    "Type": "volume",
                    "Name": "63e0413f366aa2f74f9370d04014e48038006bb4cf1b2ff5435fc9cb724de3ce",
                    "Source": "/var/lib/containers/storage/volumes/63e0413f366aa2f74f9370d04014e48038006bb4cf1b2ff5435fc9cb724de3ce/_data",
                    "Destination": "/var/log",
                    "Driver": "local",
                    "Mode": "",
                    "Options": [
                         "nodev",
                         "exec",
                         "nosuid",
                         "rbind"
                    ],
                    "RW": true,
                    "Propagation": "rprivate"
               },
               {
                    "Type": "volume",
                    "Name": "097a7e8bf2e6d0a80a575d14bd6bdfa58d16919ff83a9b403d6dc06915ae20bc",
                    "Source": "/var/lib/containers/storage/volumes/097a7e8bf2e6d0a80a575d14bd6bdfa58d16919ff83a9b403d6dc06915ae20bc/_data",
                    "Destination": "/conf/stack",
                    "Driver": "local",
                    "Mode": "",
                    "Options": [
                         "nodev",
                         "exec",
                         "nosuid",
                         "rbind"
                    ],
                    "RW": true,
                    "Propagation": "rprivate"
               },
               {
                    "Type": "bind",
                    "Source": "/opt/quay/config/quay-config",
                    "Destination": "/quay-registry/conf/stack",
                    "Driver": "",
                    "Mode": "",
                    "Options": [
                         "rbind"
                    ],
                    "RW": true,
                    "Propagation": "rprivate"
               },
               {
                    "Type": "bind",
                    "Source": "/opt/quay/data",
                    "Destination": "/datastorage",
                    "Driver": "",
                    "Mode": "",
                    "Options": [
                         "rbind"
                    ],
                    "RW": true,
                    "Propagation": "rprivate"
               }
          ],
          "Dependencies": [
               "db266da38b9c0ffd99a27f0873934a79cbf7776dd8996aa0e4b839f98f0b25ec"
          ],
          "NetworkSettings": {
               "EndpointID": "",
               "Gateway": "10.88.0.1",
               "IPAddress": "10.88.0.2",
               "IPPrefixLen": 16,
               "IPv6Gateway": "",
               "GlobalIPv6Address": "",
               "GlobalIPv6PrefixLen": 0,
               "MacAddress": "a6:9c:af:e1:1b:a7",
               "Bridge": "",
               "SandboxID": "",
               "HairpinMode": false,
               "LinkLocalIPv6Address": "",
               "LinkLocalIPv6PrefixLen": 0,
               "Ports": {
                    "8443/tcp": [
                         {
                              "HostIp": "",
                              "HostPort": "8443"
                         }
                    ]
               },
               "SandboxKey": "/run/netns/netns-67bc251f-bac0-1817-c280-f49b54fda5bc",
               "Networks": {
                    "podman": {
                         "EndpointID": "",
                         "Gateway": "10.88.0.1",
                         "IPAddress": "10.88.0.2",
                         "IPPrefixLen": 16,
                         "IPv6Gateway": "",
                         "GlobalIPv6Address": "",
                         "GlobalIPv6PrefixLen": 0,
                         "MacAddress": "a6:9c:af:e1:1b:a7",
                         "NetworkID": "podman",
                         "DriverOpts": null,
                         "IPAMConfig": null,
                         "Links": null,
                         "Aliases": [
                              "db266da38b9c",
                              "quay-pod"
                         ]
                    }
               }
          },
          "Namespace": "",
          "IsInfra": false,
          "IsService": false,
          "KubeExitCodePropagation": "invalid",
          "lockNumber": 37,
          "Config": {
               "Hostname": "quay-pod",
               "Domainname": "",
               "User": "1001",
               "AttachStdin": false,
               "AttachStdout": false,
               "AttachStderr": false,
               "Tty": false,
               "OpenStdin": false,
               "StdinOnce": false,
               "Env": [
                    "LANG=C.UTF-8",
                    "QUAYDIR=/quay-registry",
                    "PYTHONUNBUFFERED=1",
                    "RED_HAT_QUAY=true",
                    "TERM=xterm",
                    "container=oci",
                    "PYTHONIOENCODING=UTF-8",
                    "LC_ALL=C.UTF-8",
                    "TZ=UTC",
                    "PYTHONUSERBASE=/app",
                    "QUAYPATH=/quay-registry",
                    "QUAYCONF=/quay-registry/conf",
                    "PATH=/app/bin/:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                    "QUAYRUN=/quay-registry/conf",
                    "PYTHONPATH=/quay-registry",
                    "HOME=/quay-registry",
                    "HOSTNAME=quay-pod"
               ],
               "Cmd": [
                    "registry"
               ],
               "Image": "registry.redhat.io/quay/quay-rhel8:v3.8.14",
               "Volumes": null,
               "WorkingDir": "/quay-registry",
               "Entrypoint": "dumb-init -- /quay-registry/quay-entrypoint.sh",
               "OnBuild": null,
               "Labels": null,
               "Annotations": {
                    "io.container.manager": "libpod",
                    "io.kubernetes.cri-o.SandboxID": "db266da38b9c0ffd99a27f0873934a79cbf7776dd8996aa0e4b839f98f0b25ec",
                    "io.podman.annotations.cid-file": "/run/quay-app.service-cid",
                    "org.opencontainers.image.stopSignal": "15"
               },
               "StopSignal": 15,
               "HealthcheckOnFailureAction": "none",
               "CreateCommand": [
                    "/usr/bin/podman",
                    "run",
                    "--name",
                    "quay-app",
                    "-v",
                    "/opt/quay/config/quay-config:/quay-registry/conf/stack:Z",
                    "-v",
                    "/opt/quay/data:/datastorage:Z",
                    "--pod=quay-pod",
                    "--conmon-pidfile",
                    "/run/quay-app.service-pid",
                    "--cidfile",
                    "/run/quay-app.service-cid",
                    "--cgroups=no-conmon",
                    "--replace",
                    "registry.redhat.io/quay/quay-rhel8:v3.8.14"
               ],
               "Umask": "0022",
               "Timeout": 0,
               "StopTimeout": 10,
               "Passwd": true,
               "sdNotifyMode": "container"
          },
          "HostConfig": {
               "Binds": [
                    "f19507ef7f837c63cb92f116e042f12daa4c00a0c37c444cb1c7988687e66a0d:/tmp:rprivate,rw,nodev,exec,nosuid,rbind",
                    "63e0413f366aa2f74f9370d04014e48038006bb4cf1b2ff5435fc9cb724de3ce:/var/log:rprivate,rw,nodev,exec,nosuid,rbind",
                    "097a7e8bf2e6d0a80a575d14bd6bdfa58d16919ff83a9b403d6dc06915ae20bc:/conf/stack:rprivate,rw,nodev,exec,nosuid,rbind",
                    "/opt/quay/config/quay-config:/quay-registry/conf/stack:rw,rprivate,rbind",
                    "/opt/quay/data:/datastorage:rw,rprivate,rbind"
               ],
               "CgroupManager": "systemd",
               "CgroupMode": "private",
               "ContainerIDFile": "/run/quay-app.service-cid",
               "LogConfig": {
                    "Type": "journald",
                    "Config": null,
                    "Path": "",
                    "Tag": "",
                    "Size": "0B"
               },
               "NetworkMode": "container:db266da38b9c0ffd99a27f0873934a79cbf7776dd8996aa0e4b839f98f0b25ec",
               "PortBindings": {},
               "RestartPolicy": {
                    "Name": "",
                    "MaximumRetryCount": 0
               },
               "AutoRemove": false,
               "VolumeDriver": "",
               "VolumesFrom": null,
               "CapAdd": [],
               "CapDrop": [],
               "Dns": [],
               "DnsOptions": [],
               "DnsSearch": [],
               "ExtraHosts": [],
               "GroupAdd": [],
               "IpcMode": "container:db266da38b9c0ffd99a27f0873934a79cbf7776dd8996aa0e4b839f98f0b25ec",
               "Cgroup": "",
               "Cgroups": "default",
               "Links": null,
               "OomScoreAdj": 0,
               "PidMode": "private",
               "Privileged": false,
               "PublishAllPorts": false,
               "ReadonlyRootfs": false,
               "SecurityOpt": [],
               "Tmpfs": {},
               "UTSMode": "container:db266da38b9c0ffd99a27f0873934a79cbf7776dd8996aa0e4b839f98f0b25ec",
               "UsernsMode": "",
               "ShmSize": 65536000,
               "Runtime": "oci",
               "ConsoleSize": [
                    0,
                    0
               ],
               "Isolation": "",
               "CpuShares": 0,
               "Memory": 0,
               "NanoCpus": 0,
               "CgroupParent": "machine.slice/machine-libpod_pod_5e70ee01733b02f854d79d85dd78dc5c8ecdb2c50de7472a314441897f9296dc.slice",
               "BlkioWeight": 0,
               "BlkioWeightDevice": null,
               "BlkioDeviceReadBps": null,
               "BlkioDeviceWriteBps": null,
               "BlkioDeviceReadIOps": null,
               "BlkioDeviceWriteIOps": null,
               "CpuPeriod": 0,
               "CpuQuota": 0,
               "CpuRealtimePeriod": 0,
               "CpuRealtimeRuntime": 0,
               "CpusetCpus": "",
               "CpusetMems": "",
               "Devices": [],
               "DiskQuota": 0,
               "KernelMemory": 0,
               "MemoryReservation": 0,
               "MemorySwap": 0,
               "MemorySwappiness": 0,
               "OomKillDisable": false,
               "PidsLimit": 2048,
               "Ulimits": [
                    {
                         "Name": "RLIMIT_NPROC",
                         "Soft": 4194304,
                         "Hard": 4194304
                    }
               ],
               "CpuCount": 0,
               "CpuPercent": 0,
               "IOMaximumIOps": 0,
               "IOMaximumBandwidth": 0,
               "CgroupConf": null
          }
     }
]
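
Worth noting from the output above: State.Status is "running", but Health.Status is empty, HealthcheckOnFailureAction is "none", and RestartPolicy.Name is empty. So "running" here only means the process is alive. As far as I can tell nothing is probing port 8443 (that reading of the empty Health.Status is my assumption). A small Python sketch that pulls just those fields out of the podman inspect JSON:

```python
import json


def summarize_inspect(inspect_json: str) -> dict:
    """Extract the fields relevant to hang detection from `podman inspect` output."""
    container = json.loads(inspect_json)[0]  # podman inspect prints a JSON list
    state = container.get("State", {})
    health = state.get("Health") or {}
    restart = container.get("HostConfig", {}).get("RestartPolicy") or {}
    return {
        "status": state.get("Status", ""),
        "health_status": health.get("Status", ""),
        # Assumption: an empty Health.Status means no health result is being reported.
        "health_reported": bool(health.get("Status")),
        "restart_policy": restart.get("Name", ""),
    }
```

For the container above this would give status "running" with health_reported False and an empty restart_policy.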
BadgerOps commented 5 months ago

Hey team, we just ran into this exact same issue, with the same symptoms. I thought perhaps we just had a one-off problem, but then I noticed this issue, so I thought I'd add a comment. I'll get some troubleshooting logs posted here. I can connect via netcat to port 8443 and have ruled out SELinux, fapolicyd, etc. as potential contributors.

It just.... stops responding to HTTP traffic.

BadgerOps commented 5 months ago

I should have captured the output but didn't. I did notice that curl produces something similar to the following:

 curl -vvv https://<quay-server>:8443 | head
* Rebuilt URL to: https://<quay-server>:8443/

* TCP_NODELAY set
* Connected to <quay-server> port 8443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
<hangs right here, where we should get a Server Hello>

We never get the Server Hello back, nor anything beyond it. As noted above, the port is open and responds to nc, and the logs keep rolling by in journalctl -fu quay-app.service and podman logs -f <pod_id>.
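
To make that distinction repeatable (TCP connect succeeds, TLS handshake never completes), here is a minimal Python sketch using only the standard library. The function name probe and its return labels are mine, not anything from Quay or mirror-registry. Certificate verification is switched off because the check is only about liveness.

```python
import socket
import ssl


def probe(host: str, port: int = 8443, timeout: float = 5.0) -> str:
    """Report which stage of an HTTPS connection fails first.

    'tcp_failed'  : connect() was refused or timed out
    'tls_timeout' : TCP connected, but the handshake got no answer in time
    'tls_error'   : the handshake was answered but failed for another reason
    'ok'          : the TLS handshake completed
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "tcp_failed"

    # Liveness check only: skip certificate and hostname verification.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    try:
        with sock:
            # wrap_socket performs the handshake using the socket's timeout.
            with context.wrap_socket(sock, server_hostname=host):
                return "ok"
    except (socket.timeout, TimeoutError):
        return "tls_timeout"
    except (ssl.SSLError, OSError):
        return "tls_error"
```

In the state described above I would expect probe("<quay-server>") to come back as tls_timeout rather than tcp_failed, which matches what nc and curl show separately.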