hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

Nomad stops streaming Docker stderr/stdout logs to their respective files #7619

Open dpn opened 4 years ago

dpn commented 4 years ago

Nomad version

Found on the 0.9.x series of releases. It's currently unknown whether this reproduces on 0.10.x; we're attempting to determine whether we see it in our 0.10.x test cluster.

/ # nomad version
Nomad v0.9.7 (0e0eb07c53f99f54bcdb2e69aa8a9690a0597e7a)

Operating system and Environment details

Issue

What we're seeing is that after a Docker task has been running for a while (hours to days), Nomad stops streaming Docker stdout/stderr to their respective log files. This appears to be a Nomad issue: running docker logs against the affected containers streams logs properly, so the log collection seems to be failing somehow. This doesn't appear to be related to log rotation, as we've seen it happen while the $task.std{out,err}.0 log file is still under the size limit for rotation.

This was originally reported by one of our teams a few months back. At the time it seemed to affect only one of their jobs, and since we had an alternative method of shipping their logs we didn't dig too hard into it. We've now become aware of other jobs being affected, so we're starting to take a closer look.

We're still investigating, but we wanted to at least get the issue submitted; we'll provide more information as we learn more.

Reproduction steps

  1. Submit a Nomad job that writes to stdout/stderr
  2. Let it run for a bit (order of magnitude is hours to days)
  3. Eventually Nomad will stop writing Docker stdout/stderr to $task.std{out,err}.X log files

Job file (if appropriate)

{
    "Job": {
        "Datacenters": [
            "a-dc"
        ],
        "ID": "a-nomad-job",
        "Migrate": null,
        "Name": "job-name",
        "Namespace": "default",
        "ParameterizedJob": null,
        "ParentID": "",
        "Payload": null,
        "Periodic": null,
        "Priority": 50,
        "Region": "a-region",
        "Reschedule": null,
        "Spreads": [
            {
                "Attribute": "${node.unique.id}",
                "SpreadTarget": null,
                "Weight": 50
            }
        ],
        "Stable": true,
        "Status": "running",
        "StatusDescription": "",
        "Stop": false,
        "TaskGroups": [
            {
                "Affinities": null,
                "Constraints": [
                    {
                        "LTarget": "${attr.vault.version}",
                        "Operand": "version",
                        "RTarget": ">= 0.6.1"
                    },
                    {
                        "LTarget": "${attr.os.signals}",
                        "Operand": "set_contains",
                        "RTarget": "SIGINT"
                    }
                ],
                "Count": 1,
                "EphemeralDisk": {
                    "Migrate": false,
                    "SizeMB": 300,
                    "Sticky": false
                },
                "Meta": null,
                "Migrate": {
                    "HealthCheck": "checks",
                    "HealthyDeadline": 300000000000,
                    "MaxParallel": 1,
                    "MinHealthyTime": 10000000000
                },
                "Name": "job-name",
                "ReschedulePolicy": {
                    "Attempts": 0,
                    "Delay": 30000000000,
                    "DelayFunction": "exponential",
                    "Interval": 0,
                    "MaxDelay": 3600000000000,
                    "Unlimited": true
                },
                "RestartPolicy": {
                    "Attempts": 2,
                    "Delay": 15000000000,
                    "Interval": 1800000000000,
                    "Mode": "fail"
                },
                "Spreads": null,
                "Tasks": [
                    {
                        "Affinities": null,
                        "Artifacts": null,
                        "Config": {
                            "volumes": [
                                "/var/lib/nomad/alloc/${NOMAD_ALLOC_ID}/${NOMAD_ALLOC_DIR}/logs:/opt/docker/logs"
                            ],
                            "extra_hosts": [
                                "telegraf:${attr.unique.network.ip-address}"
                            ],
                            "force_pull": true,
                            "image": "a-docker-image:latest",
                        },
                        "Constraints": null,
                        "DispatchPayload": null,
                        "Driver": "docker",
                        "Env": null,
                        "KillSignal": "",
                        "KillTimeout": 5000000000,
                        "Leader": false,
                        "LogConfig": {
                            "MaxFileSizeMB": 10,
                            "MaxFiles": 10
                        },
                        "Meta": null,
                        "Name": "a-nomad-job-main",
                        "Resources": {
                            "CPU": 240,
                            "Devices": null,
                            "DiskMB": 0,
                            "IOPS": 0,
                            "MemoryMB": 128,
                            "Networks": [
                                {
                                    "CIDR": "",
                                    "Device": "",
                                    "DynamicPorts": [
                                        {
                                            "Label": "http",
                                            "Value": 0
                                        }
                                    ],
                                    "IP": "",
                                    "MBits": 10,
                                    "ReservedPorts": null
                                }
                            ]
                        },
                        "Services": [
                            {
                                "AddressMode": "auto",
                                "CanaryTags": null,
                                "CheckRestart": null,
                                "Checks": null,
                                "Id": "",
                                "Name": "a-service",
                                "PortLabel": "http",
                                "Tags": null
                            }
                        ],
                        "ShutdownDelay": 0,
                        "Templates": [
                            {
                                "ChangeMode": "restart",
                                "ChangeSignal": "SIGHUP",
                                "DestPath": "/secrets/shared_secrets.env",
                                "EmbeddedTmpl": "REDACTED",
                                "Envvars": true,
                                "LeftDelim": "{{",
                                "Perms": "0600",
                                "RightDelim": "}}",
                                "SourcePath": "",
                                "Splay": 5000000000,
                                "VaultGrace": 15000000000
                            },
                            {
                                "ChangeMode": "restart",
                                "ChangeSignal": "SIGHUP",
                                "DestPath": "/local/enc_environment.env",
                                "EmbeddedTmpl": "REDACTED",
                                "Envvars": true,
                                "LeftDelim": "{{",
                                "Perms": "0644",
                                "RightDelim": "}}",
                                "SourcePath": "",
                                "Splay": 5000000000,
                                "VaultGrace": 15000000000
                            }
                        ],
                        "User": "",
                        "Vault": {
                            "ChangeMode": "restart",
                            "ChangeSignal": "SIGHUP",
                            "Env": false,
                            "Policies": [
                                "REDACTED",
                                "REDACTED",
                                "REDACTED"
                            ]
                        }
                    }
                ],
                "Update": {
                    "AutoPromote": false,
                    "AutoRevert": true,
                    "Canary": 0,
                    "HealthCheck": "checks",
                    "HealthyDeadline": 120000000000,
                    "MaxParallel": 1,
                    "MinHealthyTime": 30000000000,
                    "ProgressDeadline": 600000000000,
                    "Stagger": 30000000000
                }
            }
        ],
        "Type": "service",
        "Update": {
            "AutoPromote": false,
            "AutoRevert": false,
            "Canary": 0,
            "HealthCheck": "",
            "HealthyDeadline": 0,
            "MaxParallel": 1,
            "MinHealthyTime": 0,
            "ProgressDeadline": 0,
            "Stagger": 30000000000
        },
        "VaultToken": "",
        "Version": 29
    }
}

Nomad Client logs (if appropriate)

Will grab these when we see it reproduce again, or when I'm able to discover more instances currently hidden in the fleet.

tgross commented 4 years ago

Hi @dpn! When this happens, are both of the log shipping processes still running, or have they crashed? These will be nomad logmon and nomad docker_logger for each running job.
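
A quick way to enumerate those processes on a Linux client is something like this (sketch only; adjust the pattern to match your binary path):

# each running Docker task should contribute one logmon and one docker_logger
ps -eo pid,ppid,args | grep -E 'nomad (logmon|docker_logger)' | grep -v grep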

dpn commented 4 years ago

Thanks for the response @tgross. I'm trying to find the time to throw some scripts together to track down containers in this state, to see whether this reproduces on 0.10.x in our setup; once that's complete I can leverage those scripts to answer that question. Will follow up with what I find.
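
A rough sketch of the kind of script I have in mind (the paths and staleness threshold here are illustrative, not our actual tooling):

#!/usr/bin/env bash
# Flag allocation log files that have gone quiet for a suspiciously long time.
# Hits still need to be cross-checked against `docker logs` to rule out tasks
# that are simply idle.
ALLOC_ROOT=/var/lib/nomad/alloc
THRESHOLD=3600  # seconds since the last write

now=$(date +%s)
for logfile in "$ALLOC_ROOT"/*/alloc/logs/*.std{out,err}.[0-9]*; do
  [ -e "$logfile" ] || continue
  age=$(( now - $(stat -c %Y "$logfile") ))
  if [ "$age" -gt "$THRESHOLD" ]; then
    echo "possibly stalled: $logfile (no writes for ${age}s)"
  fi
done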

kneufeld commented 4 years ago

I seem to be seeing this behaviour as well. I'm not sure how to correlate the logmon and docker_logger processes to their respective containers, but I have a whole heap of them running. And this is happening on a freshly started container.

Docker v19.03.5, Nomad v0.11.2

ifokeev commented 4 years ago

Same problem here: Docker v19.03.12, Nomad v0.12.0

dansteen commented 3 years ago

Same problem here: Docker v20.10.2, Nomad v0.12.7

I find that it only seems to be an issue for jobs that send a whole lot of logs very rapidly. In my case there are 16 docker containers running, 17 instances of nomad logmon, and 15 instances of nomad docker_logger.

dansteen commented 3 years ago

As an additional note, this seems to have been resolved in Nomad v1.0.2. I upgraded and am no longer seeing this.

dansteen commented 3 years ago

Update to my previous comment: it seems that this issue is still present in 1.0.2, it's just a whole lot harder to trigger. I was, however, able to trigger it by running tests that generated many, many log lines. I do still see logs streaming from docker logs, so it definitely seems like a Nomad thing.
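
For reference, the kind of log volume I mean can be approximated with a task command along these lines (illustrative only):

# emit ~1 MB lines to stdout as fast as possible
sh -c 'line=$(head -c 1048576 /dev/zero | tr "\0" "x"); while true; do echo "$line"; done'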

dansteen commented 3 years ago

A few more pieces of information here:

  1. These are docker containers, and I'm still able to see logs via the docker logs command, so logs are definitely being produced, just not handled by Nomad.
  2. For the allocations in question, I am seeing 2 instances of logmon and 2 instances of docker_logger. Is that normal? One for stdout and one for stderr?
  3. I notice that there are 31 instances of logmon, 29 instances of docker_logger, and only 12 containers. So even if we need two of each per job, the numbers don't work out.
  4. Restarting the allocation restarts the docker_logger process, but does not restart the logmon process.
  5. Restarting the allocation does result in log shipping starting up again.

The last two points above suggest that the docker_logger process is the source of the issue.
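
For anyone else hitting this in the meantime, the restart that gets log shipping going again is just:

nomad alloc restart <alloc-id>           # restart all tasks in the allocation
nomad alloc restart <alloc-id> <task>    # or a single task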

tgross commented 3 years ago

Thanks for the added info @dansteen

For Docker tasks, there should be exactly one logmon and one docker_logger per task. For non-Docker tasks, just the logmon. So having extraneous processes is a pretty clear signal. One thing that might help narrow this down is determining whether the extra logmon/docker_logger processes belong to running allocations or to ones that have already stopped. (You can tell from their process tree.)
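
One way to make that determination on a Linux client (a sketch; it relies on the logger processes holding the allocation's .std{out,err}.fifo files open, so the alloc ID shows up in the path):

# print the log FIFOs each logger process holds open; the alloc ID embedded
# in the path can then be compared against `nomad alloc status`
for pid in $(pgrep -f 'nomad (logmon|docker_logger)'); do
  echo "== pid $pid =="
  ls -l "/proc/$pid/fd" 2>/dev/null | grep -o '[^ ]*\.fifo'
done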

Another thing that can help narrow the investigation: has the Nomad agent been restarted since the allocations with extra loggers were originally created?

PaGrom commented 3 years ago

Have the same issue: Docker v19.03.5, Nomad v1.0.2, OS Windows v10.0.17763 Build 17763

dansteen commented 3 years ago

Hi @tgross, thanks for the response. It looks like the extra processes are for allocations that have already stopped. The Nomad agent was not restarted prior to the extra loggers being created (though it has been since).

dansteen commented 3 years ago

Hi @tgross, I now have an application that stopped logging, and there are no extra logmon or docker_logger processes on the box, so it seems it's not related to that. Also, the Nomad client has not been restarted since the box was created, so it's not going to be related to that either.

dansteen commented 3 years ago

@tgross Sorry for the message spam. One bit of information that I think may be important here: by default, Docker (on Debian 10) is configured to allow only 2 MB log files. In our case, we have an application that can log 1 MB of data in a single line. When that happens, the Docker log file will be rotated every two messages. I can definitely see the Nomad logger not keeping up with an event like that. I've upped the log file size in Docker as a test and will report back the results.
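
(The effective per-container settings can be double-checked with something like:)

docker inspect --format '{{json .HostConfig.LogConfig}}' <container-id>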

Update: as a note, it turns out that setting the default options for Docker is not sufficient, since Nomad overrides those per https://www.nomadproject.io/docs/drivers/docker#logging. I needed to set the logging options in the job task as well.
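
In HCL form, the per-task override from the linked docs looks roughly like this (sizes illustrative):

task "a-nomad-job-main" {
  driver = "docker"

  config {
    image = "a-docker-image:latest"

    logging {
      type = "json-file"
      config {
        max-file = 3
        max-size = "10m"
      }
    }
  }
}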

Biztactix-Ryan commented 3 years ago

We are seeing this as well on Nomad version 1.1.5. We're also running Docker on Windows.

It works intermittently: some servers seem to show logs, others don't. I haven't been able to narrow it down to a specific server config or anything.

We have 5 servers, 4 of which support Docker, and 14 jobs running up to 4 allocations per job, so there are 43 allocations total; keeping track of what is and isn't working is a bit of a hassle.

In the meantime we've updated 80% of our apps to just log to an external service (Logtail), which is easy and reliable... but it would be nice to be able to look at the logs on the server that produced them.

Nomad Client Details

os.name                | Microsoft Windows Server 2019 Standard
os.signals             | SIGABRT,SIGBUS,SIGFPE,SIGHUP,SIGKILL,SIGQUIT,SIGSEGV,SIGALRM,SIGILL,SIGINT,SIGPIPE,SIGTERM,SIGTRAP
os.version             | 10.0.17763 Build 17763
cpu.arch               | amd64
cpu.numcores           | 12
cpu.frequency          | 2893
cpu.modelname          | Intel(R) Xeon(R) CPU E5-2667 0 @ 2.90GHz
cpu.totalcompute       | 34716
kernel.name            | windows
kernel.version         | 10.0.17763 Build 17763
driver.docker          |
driver.docker.os_type  | windows
driver.docker.version  | 20.10.7
driver.docker.runtimes |
nomad.version          | 1.1.5
nomad.revision         | 117a23d2cdf26a1837b21c84f30c8c0f3441e927

khaledabdelaziz commented 2 years ago

We're getting the same issue on Nomad 1.2.5. Logs randomly stop being written to the Nomad log files while they are still being produced by the Docker containers.
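
The mismatch is easy to confirm by tailing both sides for the same task (commands illustrative):

nomad alloc logs -f <alloc-id> <task>      # goes quiet when the bug hits
docker logs -f --tail 10 <container-id>    # keeps streaming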

Nomad Client Details:

ID              = xxxxxxxxxxxx
Name            = ip-12-34-56-78
Class           = linux
DC              = uat-eu-west-2
Drain           = false
Eligibility     = eligible
Status          = ready
CSI Controllers = <none>
CSI Drivers     = aws-ebs0
Uptime          = 722h50m30s

Drivers
Driver    Detected  Healthy  Message   Time
docker    true      true     Healthy   2022-05-09T09:14:54Z
java      false     false    <none>    2022-05-09T09:14:54Z
qemu      false     false    <none>    2022-05-09T09:14:54Z
raw_exec  false     false    disabled  2022-05-09T09:14:54Z

Node Events
Time                  Subsystem  Message          Details
2022-05-09T09:14:54Z  Cluster    Node registered  <none>

Allocated Resources
CPU             Memory         Disk
3700/24800 MHz  30 GiB/30 GiB  2.3 GiB/45 GiB

Allocation Resource Utilization
CPU            Memory
349/24800 MHz  5.7 GiB/30 GiB

Host Resource Utilization
CPU             Memory          Disk
1557/24800 MHz  7.3 GiB/30 GiB  (/dev/mapper/root_vg-root)

Allocations
ID                                    Eval ID                               Node ID                               Task Group                                    Version  Desired  Status    Created               Modified
...........

Attributes
consul.connect                           = false
consul.datacenter                        = uat-eu-west-2
consul.ft.namespaces                     = false
consul.grpc                              = -1
consul.revision                          = 37c7d06b
consul.server                            = false
consul.sku                               = oss
consul.version                           = 1.11.2
cpu.arch                                 = amd64
cpu.frequency                            = 3100
cpu.modelname                            = Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
cpu.numcores                             = 8
cpu.totalcompute                         = 24800
driver.docker                            = 1
driver.docker.bridge_ip                  = 172.17.0.1
driver.docker.os_type                    = linux
driver.docker.privileged.enabled         = true
driver.docker.runtimes                   = docker-runc,runc
driver.docker.version                    = 1.13.1
kernel.name                              = linux
kernel.version                           = 3.10.0-1160.59.1.el7.x86_64
memory.totalbytes                        = 32727478272
nomad.advertise.address                  = 12.34.56.78:4646
nomad.revision                           = 06d912a20ba029c7f4d6b371cd07594cba3ae3cd
nomad.version                            = 1.2.5
os.name                                  = redhat
os.signals                               = SIGUSR2,SIGKILL,SIGPIPE,SIGTERM,SIGURG,SIGIO,SIGQUIT,SIGBUS,SIGHUP,SIGINT,SIGTRAP,SIGTTIN,SIGWINCH,SIGABRT,SIGALRM,SIGXCPU,SIGCHLD,SIGFPE,SIGTSTP,SIGUSR1,SIGSEGV,SIGSYS,SIGTTOU,SIGCONT,SIGILL,SIGSTOP,SIGXFSZ,SIGIOT,SIGPROF
os.version                               = 7.9
platform.aws.ami-id                      = ami-0a8dbe56d340ad2d8
platform.aws.instance-type               = m5.2xlarge
platform.aws.placement.availability-zone = eu-west-2a
unique.cgroup.mountpoint                 = /sys/fs/cgroup/systemd
unique.consul.name                       = ip-12-34-56-78
unique.hostname                          = ip-12-34-56-78.domain.name
unique.network.ip-address                = 12.34.56.78
unique.platform.aws.hostname             = ip-12-34-56-78.domain.name
unique.platform.aws.instance-id          = i-00255359dab3c21b9
unique.platform.aws.local-hostname       = ip-12-34-56-78.domain.name
unique.platform.aws.local-ipv4           = 12.34.56.78
unique.platform.aws.mac                  = 06:fd:b8:ce:75:e0
unique.storage.bytesfree                 = 48213946368
unique.storage.bytestotal                = 53135540224
unique.storage.volume                    = /dev/mapper/root_vg-root
vault.accessible                         = true
vault.cluster_id                         = 96b36453-b950-2a86-3138-c2a66c408e30
vault.cluster_name                       = uat
vault.version                            = 1.6.3

Meta
connect.gateway_image     = envoyproxy/envoy:v${NOMAD_envoy_version}
connect.log_level         = info
connect.proxy_concurrency = 1
connect.sidecar_image     = envoyproxy/envoy:v${NOMAD_envoy_version}

flipyap commented 1 year ago

My team and I have been experiencing this issue at random, and for us it is production-impacting, because we utilize a homegrown logging sidecar that reads logs from the Nomad stdout/stderr files.

I've been slowly gathering information along the way, but any advice or pointers on where to go next would be appreciated.

First, a little architectural overview. I've done my best to convey a 1,000 ft view of how the specific Nomad pieces work together.

Two different, not-so-great attempts to visualize (diagrams not reproduced here): "Nomad and Docker Logging" and "Nomad and Docker Logging - Page 2"

Lower Level Details

When this phenomenon occurs, there are a few interesting things that I notice...

Futex Waits Timed Out

The first interesting tidbit of information I notice is that there are futex waits failing every 60 seconds. Tracking these futexes down by thread seems to point back to them being owned by the Go runtime's system monitor, "sysmon".

# strace -ttt -p 96113 -s655535 -f -y
strace: Process 96113 attached with 10 threads
[pid 96126] 1685460402.610901 futex(0xc000095548, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 96125] 1685460402.610963 futex(0xc00030e548, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 96124] 1685460402.610982 futex(0xc00030e148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 96123] 1685460402.611000 futex(0x48ebfa0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 96122] 1685460402.611018 futex(0x48ebe58, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 96121] 1685460402.611036 futex(0xc000100148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 96120] 1685460402.611053 futex(0xc000094d48, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 96119] 1685460402.611072 epoll_pwait(4<anon_inode:[eventpoll]>,  <unfinished ...>
[pid 96118] 1685460402.611880 restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
[pid 96113] 1685460402.611907 futex(0x48b3fa8, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 96119] 1685460402.611930 <... epoll_pwait resumed>[], 128, 0, NULL, 7085560) = 0
[pid 96119] 1685460402.612007 epoll_pwait(4<anon_inode:[eventpoll]>,  <unfinished ...>
[pid 96118] 1685460422.053096 <... restart_syscall resumed>) = -1 ETIMEDOUT (Connection timed out)
[pid 96118] 1685460422.054926 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
[pid 96118] 1685460422.065794 futex(0x48b78f8, FUTEX_WAIT_PRIVATE, 0, {tv_sec=60, tv_nsec=0}) = -1 ETIMEDOUT (Connection timed out)
[pid 96118] 1685460482.067844 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
[pid 96118] 1685460482.078684 futex(0x48b78f8, FUTEX_WAIT_PRIVATE, 0, {tv_sec=60, tv_nsec=0}) = -1 ETIMEDOUT (Connection timed out)
[pid 96118] 1685460542.080197 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
[pid 96118] 1685460542.090805 futex(0x48b78f8, FUTEX_WAIT_PRIVATE, 0, {tv_sec=60, tv_nsec=0}) = -1 ETIMEDOUT (Connection timed out)
[pid 96118] 1685460602.093027 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
[pid 96118] 1685460602.103931 futex(0x48b78f8, FUTEX_WAIT_PRIVATE, 0, {tv_sec=60, tv_nsec=0}

That thread looks like this:

(dlv) thread 96118
Switched from 96113 to 96118
(dlv) stack -full
0  0x0000000000471703 in runtime.futex
   at runtime/sys_linux_amd64.s:560

1  0x0000000000436f2f in runtime.futexsleep
   at runtime/os_linux.go:75
       addr = (unreadable could not find loclist entry at 0x11abea for address 0x436f2f)
       val = (unreadable could not find loclist entry at 0x11ac1d for address 0x436f2f)
       ns = (unreadable could not find loclist entry at 0x11ac50 for address 0x436f2f)
       ts = runtime.timespec {tv_sec: 60, tv_nsec: 0}

2  0x000000000040f0a5 in runtime.notetsleep_internal
   at runtime/lock_futex.go:202
       n = (*runtime.note)(0x48b78f8)
       ns = 60000000000
       ~r0 = (unreadable empty OP stack)
       gp = (*runtime.g)(0xc0000069c0)
       deadline = 7085819801755795

3  0x000000000040f1f4 in runtime.notetsleep
   at runtime/lock_futex.go:225
       n = (unreadable could not find loclist entry at 0xda2e4 for address 0x40f1f4)
       ns = (unreadable could not find loclist entry at 0xda317 for address 0x40f1f4)
       ~r0 = (unreadable empty OP stack)
       gp = (unreadable could not find loclist entry at 0xda34a for address 0x40f1f4)

4  0x00000000004487a9 in runtime.sysmon
   at runtime/proc.go:5181
       lasttrace = 0
       idle = 33654
       delay = 10000
       now = 7085759801753680
       ~R0 = (unreadable empty OP stack)
       syscallWake = false
       next = 7086038586817785
       sleep = (unreadable could not find loclist entry at 0x12d019 for address 0x4487a9)

5  0x00000000004400b3 in runtime.mstart1
   at runtime/proc.go:1425
       _g_ = (unreadable could not find loclist entry at 0x123997 for address 0x4400b3)
       fn = (unreadable could not find loclist entry at 0x1239ca for address 0x4400b3)

6  0x000000000043fff9 in runtime.mstart0
   at runtime/proc.go:1382
       _g_ = (*runtime.g)(0xc0000069c0)
       osStack = (unreadable could not find loclist entry at 0x123930 for address 0x43fff9)

7  0x000000000046d545 in runtime.mstart
   at runtime/asm_amd64.s:391

8  0x000000000221eaf2 in ???
   at ?:-1

So the futex waits above may be more of a symptom than a cause, but they're noteworthy as the first thing that will probably catch your eye when looking at the system calls.

Epoll Wait Seems Dead

open files

# lsof -p 96113
COMMAND   PID USER   FD      TYPE             DEVICE SIZE/OFF       NODE NAME
nomad   96113 root  cwd       DIR                8,1      224         64 /
nomad   96113 root  rtd       DIR                8,1      224         64 /
nomad   96113 root  txt       REG                8,1 94540152    8390294 /usr/local/bin/nomad
nomad   96113 root  mem       REG                8,1  2156592       6847 /usr/lib64/libc-2.17.so
nomad   96113 root  mem       REG                8,1   142144       6879 /usr/lib64/libpthread-2.17.so
nomad   96113 root  mem       REG                8,1   163312       6840 /usr/lib64/ld-2.17.so
nomad   96113 root    0r      CHR                1,3      0t0       1028 /dev/null
nomad   96113 root    1w     FIFO                0,9      0t0 1413111465 pipe
nomad   96113 root    2w     FIFO                0,9      0t0 1413111466 pipe
nomad   96113 root    3u     unix 0xffff89363c3df2c0      0t0 1413113722 /tmp/plugin1723546063
nomad   96113 root    4u  a_inode               0,10        0       7531 [eventpoll]       <-------------------
nomad   96113 root    5r     FIFO                0,9      0t0 1413113720 pipe
nomad   96113 root    6w     FIFO                0,9      0t0 1413113720 pipe
nomad   96113 root    7r     FIFO                0,9      0t0 1413113724 pipe
nomad   96113 root    8w     FIFO                0,9      0t0 1413113724 pipe
nomad   96113 root    9r     FIFO                0,9      0t0 1413113725 pipe
nomad   96113 root   10w     FIFO                0,9      0t0 1413113725 pipe
nomad   96113 root   11u     unix 0xffff893b93d461c0      0t0 1413115579 /tmp/plugin1723546063
nomad   96113 root   12w     FIFO               8,32      0t0   43574314 /var/local/nomad/alloc/6f6e6e1a-cec5-4fd9-6c30-357bd01ab471/alloc/logs/.kamino-deluxecheckreordersso-prd.stdout.fifo
nomad   96113 root   13w     FIFO               8,32      0t0   43574316 /var/local/nomad/alloc/6f6e6e1a-cec5-4fd9-6c30-357bd01ab471/alloc/logs/.kamino-deluxecheckreordersso-prd.stderr.fifo
nomad   96113 root   14u     unix 0xffff893a69823740      0t0 1413115581 socket        <-----------------

List of goroutines

(dlv) grs
  Goroutine 1 - User: github.com/hashicorp/go-plugin@v1.4.3/server.go:469 github.com/hashicorp/go-plugin.Serve (0x9721d7) [select]
  Goroutine 2 - User: runtime/proc.go:364 runtime.gopark (0x43d816) [force gc (idle)]
  Goroutine 3 - User: runtime/proc.go:364 runtime.gopark (0x43d816) [GC sweep wait]
  Goroutine 4 - User: runtime/proc.go:364 runtime.gopark (0x43d816) [GC scavenge wait]
  Goroutine 5 - User: runtime/proc.go:364 runtime.gopark (0x43d816) [finalizer wait]
  Goroutine 18 - User: github.com/hashicorp/go-plugin@v1.4.3/grpc_broker.go:121 github.com/hashicorp/go-plugin.(*gRPCBrokerServer).Recv (0x967ec7) [select]
  Goroutine 19 - User: os/file_posix.go:31 os.(*File).Read (0x4ee5fe) [IO wait]
  Goroutine 20 - User: os/file_posix.go:31 os.(*File).Read (0x4ee5fe) [IO wait]
  Goroutine 21 - User: runtime/proc.go:364 runtime.gopark (0x43d816) [select]
  Goroutine 23 - User: google.golang.org/grpc@v1.42.0/internal/transport/controlbuf.go:407 google.golang.org/grpc/internal/transport.(*controlBuffer).get (0x86d9f5) [select]
  Goroutine 24 - User: google.golang.org/grpc@v1.42.0/internal/transport/http2_server.go:1119 google.golang.org/grpc/internal/transport.(*http2Server).keepalive (0x88ba93) [select]
  Goroutine 25 - User: net/fd_posix.go:55 net.(*netFD).Read (0x632169) [IO wait]
  Goroutine 26 - User: github.com/hashicorp/go-plugin@v1.4.3/grpc_stdio.go:58 github.com/hashicorp/go-plugin.(*grpcStdioServer).StreamStdio (0x96c05d) [select]
  Goroutine 27 - User: google.golang.org/grpc@v1.42.0/internal/transport/transport.go:178 google.golang.org/grpc/internal/transport.(*recvBufferReader).read (0x890330) [select]
  Goroutine 29 - User: net/fd_posix.go:55 net.(*netFD).Read (0x632169) [IO wait]
  Goroutine 34 - User: runtime/sigqueue.go:152 os/signal.signal_recv (0x46bb0f) (thread 96123)
  Goroutine 35 - User: github.com/hashicorp/go-plugin@v1.4.3/server.go:434 github.com/hashicorp/go-plugin.Serve.func3 (0x9727f5) [chan receive]
  Goroutine 36 - User: net/fd_unix.go:172 net.(*netFD).accept (0x634275) [IO wait]
  Goroutine 37 - User: github.com/hashicorp/go-plugin@v1.4.3/grpc_broker.go:70 github.com/hashicorp/go-plugin.(*gRPCBrokerServer).StartStream.func1 (0x967b05) [select]
  Goroutine 50 - User: github.com/fsouza/go-dockerclient@v1.6.5/client.go:580 github.com/fsouza/go-dockerclient.(*Client).streamURL.func1 (0xb1c5bb) [chan receive]

The goroutine that's doing the work of getting stream data out of the socket... I think:

(dlv) stack -full
 0  0x000000000043d816 in runtime.gopark
    at runtime/proc.go:364
        unlockf = (unreadable could not find loclist entry at 0x12103b for address 0x43d816)
        lock = (unreadable could not find loclist entry at 0x12106e for address 0x43d816)
        reason = (unreadable could not find loclist entry at 0x1210a1 for address 0x43d816)
        traceEv = (unreadable could not find loclist entry at 0x1210d4 for address 0x43d816)
        traceskip = (unreadable could not find loclist entry at 0x121107 for address 0x43d816)
        mp = (unreadable could not find loclist entry at 0x12113a for address 0x43d816)
        gp = (unreadable could not find loclist entry at 0x12116d for address 0x43d816)
        status = (unreadable could not find loclist entry at 0x1211a0 for address 0x43d816)

 1  0x0000000000436297 in runtime.netpollblock
    at runtime/netpoll.go:526
        pd = (unreadable could not find loclist entry at 0x1197b3 for address 0x436297)
        mode = (unreadable could not find loclist entry at 0x1197e6 for address 0x436297)
        waitio = (unreadable could not find loclist entry at 0x119819 for address 0x436297)
        ~r0 = (unreadable empty OP stack)
        gpp = ("*runtime/internal/atomic.Uintptr")(0x7fcf1db09850)

 2  0x0000000000469689 in internal/poll.runtime_pollWait
    at runtime/netpoll.go:305
        pd = (*runtime.pollDesc)(0x7fcf1db09838)
        mode = 114
        ~r0 = (unreadable empty OP stack)
        errcode = (unreadable could not find loclist entry at 0x15906b for address 0x469689)

 3  0x00000000004e1a92 in internal/poll.(*pollDesc).wait
    at internal/poll/fd_poll_runtime.go:84
        pd = (unreadable could not find loclist entry at 0x26f604 for address 0x4e1a92)
        mode = (unreadable could not find loclist entry at 0x26f637 for address 0x4e1a92)
        isFile = false
        ~r0 = (unreadable empty OP stack)

 4  0x00000000004e2dfa in internal/poll.(*pollDesc).waitRead
    at internal/poll/fd_poll_runtime.go:89
        isFile = (unreadable could not find loclist entry at 0x270d6b for address 0x4e2dfa)

 5  0x00000000004e2dfa in internal/poll.(*FD).Read
    at internal/poll/fd_unix.go:167
        fd = ("*internal/poll.FD")(0xc0004b2000)
        p = []uint8 len: 4096, cap: 4096, [...]
        ~r0 = (unreadable empty OP stack)
        ~r0 = (unreadable empty OP stack)
        ~r1 = (unreadable empty OP stack)
        ~r1 = (unreadable empty OP stack)
        isFile = (unreadable could not find loclist entry at 0x270d6b for address 0x4e2dfa)

 6  0x0000000000632169 in net.(*netFD).Read
    at net/fd_posix.go:55
        fd = (*net.netFD)(0xc0004b2000)
        p = (unreadable could not find loclist entry at 0x4e8ba6 for address 0x632169)
        n = (unreadable empty OP stack)
        err = (unreadable empty OP stack)

 7  0x0000000000646d65 in net.(*conn).Read
    at net/net.go:183
        c = (*net.conn)(0xc00048c010)
        b = (unreadable could not find loclist entry at 0x50570e for address 0x646d65)
        ~r0 = (unreadable empty OP stack)
        ~r1 = (unreadable empty OP stack)

 8  0x0000000000594cff in bufio.(*Reader).fill
    at bufio/bufio.go:106
        b = (*bufio.Reader)(0xc0004881e0)
        i = 100

 9  0x00000000005958ef in bufio.(*Reader).ReadSlice
    at bufio/bufio.go:372
        b = (*bufio.Reader)(0xc0004881e0)
        delim = 10
        line = (unreadable empty OP stack)
        err = (unreadable empty OP stack)
        s = (unreadable could not find loclist entry at 0x35c21e for address 0x5958ef)
        ~R0 = (unreadable empty OP stack)

10  0x00000000007b45a5 in net/http/internal.readChunkLine
    at net/http/internal/chunked.go:129
        b = (unreadable could not find loclist entry at 0x710c2f for address 0x7b45a5)
        ~r0 = (unreadable empty OP stack)
        ~r1 = (unreadable empty OP stack)

11  0x00000000007b4008 in net/http/internal.(*chunkedReader).beginChunk
    at net/http/internal/chunked.go:48
        cr = ("*net/http/internal.chunkedReader")(0xc0004508a0)
        line = (unreadable could not read 8 bytes from register 0 (size: 0))

12  0x00000000007b42d6 in net/http/internal.(*chunkedReader).Read
    at net/http/internal/chunked.go:98
        cr = ("*net/http/internal.chunkedReader")(0xc0004508a0)
        b = []uint8 len: 32777, cap: 32777, [...]
        n = (unreadable empty OP stack)
        err = (unreadable empty OP stack)

13  0x000000000081195c in net/http.(*body).readLocked
    at net/http/transfer.go:846
        b = ("*net/http.body")(0xc00011ed40)
        p = (unreadable could not find loclist entry at 0x779259 for address 0x81195c)
        n = (unreadable empty OP stack)
        err = (unreadable empty OP stack)

14  0x0000000000811825 in net/http.(*body).Read
    at net/http/transfer.go:838
        b = (unreadable could not find loclist entry at 0x77907b for address 0x811825)
        p = (unreadable could not find loclist entry at 0x7790ae for address 0x811825)
        n = 0
        err = error nil

15  0x0000000000a9a484 in github.com/docker/docker/pkg/stdcopy.StdCopy
    at github.com/docker/docker@v17.12.0-ce-rc1.0.20200330121334-7f8b4b621b5d+incompatible/pkg/stdcopy/stdcopy.go:108
        dstout = io.Writer(*os.File) 0xbeef000000000208
        dsterr = io.Writer(*os.File) 0xbeef000000000308
        src = io.Reader(*net/http.body) 0xbeef000000000408
        written = (unreadable empty OP stack)
        err = (unreadable empty OP stack)
        buf = []uint8 len: 32777, cap: 32777, [...]
        nr = 0
        nw = (unreadable could not find loclist entry at 0x12d351a for address 0xa9a484)
        er = (unreadable could not read 8 bytes from register 3 (size: 0))
        ew = (unreadable could not find loclist entry at 0x12d3844 for address 0xa9a484)
        out = (unreadable could not find loclist entry at 0x12d38c1 for address 0xa9a484)
        frameSize = (unreadable empty OP stack)
        nr2 = (unreadable read out of bounds)

16  0x0000000000b1c74f in github.com/fsouza/go-dockerclient.handleStreamResponse
    at github.com/fsouza/go-dockerclient@v1.6.5/client.go:646
        resp = ("*net/http.Response")(0xc00049e1b0)
        streamOptions = ("*github.com/fsouza/go-dockerclient.streamOptions")(0xc0004afc08)
        ~r0 = (unreadable empty OP stack)
        err = (unreadable could not read 8 bytes from register 3 (size: 0))

17  0x0000000000b1c265 in github.com/fsouza/go-dockerclient.(*Client).streamURL
    at github.com/fsouza/go-dockerclient@v1.6.5/client.go:630
        c = ("*github.com/fsouza/go-dockerclient.Client")(0xc000408bd0)
        method = "GET"
        url = "/containers/efcf31e8a935763808de3f0447749041883806b14f200f273565...+53 more"
        streamOptions = github.com/fsouza/go-dockerclient.streamOptions {setRawTerminal: false, rawJSONStream: false, useJSONDecoder: false,...+8 more}
        ~r0 = (unreadable empty OP stack)
        ~r0 = (unreadable empty OP stack)
        ~R0 = (unreadable empty OP stack)
        req = ("*net/http.Request")(0xc0004a2000)
        err = (unreadable could not read 8 bytes from register 0 (size: 0))
        resp = ("*net/http.Response")(0xc00049e1b0)
        protocol = "unix"
        address = "/var/run/docker.sock"
        ctx = (unreadable could not find loclist entry at 0x13b68a4 for address 0xb1c265)
        cancelRequest = context.WithCancel.func1
        subCtx = context.Context(*context.cancelCtx) 0xbeef000000000a08
        canceled = 0

18  0x0000000000b1b3d6 in github.com/fsouza/go-dockerclient.(*Client).stream
    at github.com/fsouza/go-dockerclient@v1.6.5/client.go:531
        c = ("*github.com/fsouza/go-dockerclient.Client")(0xc000408bd0)
        method = "GET"
        path = "/containers/efcf31e8a935763808de3f0447749041883806b14f200f273565...+53 more"
        streamOptions = github.com/fsouza/go-dockerclient.streamOptions {setRawTerminal: false, rawJSONStream: false, useJSONDecoder: false,...+8 more}
        ~r0 = (unreadable empty OP stack)

19  0x0000000000b26e32 in github.com/fsouza/go-dockerclient.(*Client).Logs
    at github.com/fsouza/go-dockerclient@v1.6.5/container.go:1548
        c = ("*github.com/fsouza/go-dockerclient.Client")(0xc000408bd0)
        opts = github.com/fsouza/go-dockerclient.LogsOptions {Context: context.Context(*context.cancelCtx) ..., Container: "efcf31e8a935763808de3f0447749041883806b14f200f2735651b29be9b8537", OutputStream: io.Writer(*o
s.File) ...,...+9 more}
        ~r0 = (unreadable empty OP stack)
        path = (unreadable could not find loclist entry at 0x13c207f for address 0xb26e32)

20  0x0000000000b46fc6 in github.com/hashicorp/nomad/drivers/docker/docklog.(*dockerLogger).Start.func1
    at github.com/hashicorp/nomad/drivers/docker/docklog/docker_logger.go:108
        d = ("*github.com/hashicorp/nomad/drivers/docker/docklog.dockerLogger")(0xc000126370)
        opts = ("*github.com/hashicorp/nomad/drivers/docker/docklog.StartOpts")(0xc00011cc00)
        ctx = context.Context(*context.cancelCtx) 0xbeef000000000d08
        stdout = io.WriteCloser(*os.File) 0xbeef000000000e08
        stderr = io.WriteCloser(*os.File) 0xbeef000000000f08
        err = (unreadable could not find loclist entry at 0x13de9b3 for address 0xb46fc6)
        sinceTime = (unreadable could not find loclist entry at 0x13de7d2 for address 0xb46fc6)
        backoff = 0
        logOpts = github.com/fsouza/go-dockerclient.LogsOptions {Context: context.Context(*context.cancelCtx) ..., Container: "efcf31e8a935763808de3f0447749041883806b14f200f2735651b29be9b8537", OutputStream: io.Writer
(*os.File) ...,...+9 more}

21  0x000000000046f801 in runtime.goexit
    at runtime/asm_amd64.s:1594

Tracing the system calls for the socket:

# time strace -e trace=read,write -e read=14 -e write=14 -p 96113 -f -y
strace: Process 96113 attached with 10 threads
^Cstrace: Process 96113 detached
strace: Process 96118 detached
strace: Process 96119 detached
strace: Process 96120 detached
strace: Process 96121 detached
strace: Process 96122 detached
strace: Process 96123 detached
strace: Process 96124 detached
strace: Process 96125 detached
strace: Process 96126 detached

real    3m28.705s
user    0m0.002s
sys 0m0.013s

In the above output we would expect to see reads from the socket for the incoming data on the local socket that is communicating with the Docker daemon. Something like this:

[pid 18506] read(14<socket:[157156019]>, "\n\1\0\0\0\0\0\6B{\"date_time\":\"2023-05-3"..., 4092) = 4092

Furthermore, the socket is established with no recv or send queues:

Note: the socket inode below is the same as in the lsof output provided earlier.

# ss -nae  | grep 1413115581
u_str  ESTAB      0      0      /var/run/docker.sock 1413121781            * 1413115581          <->
u_str  ESTAB      0      0         * 1413115581            * 1413121781          <->

Proof that logmon is running and working just fine:


echo "I love socket programming\!" > /var/local/nomad/alloc/6f6e6e1a-cec5-4fd9-6c30-357bd01ab471/alloc/logs/.<alloc1>.stdout.fifo

nomad alloc logs -task -n 1 -tail 6f6e6e1a-cec5-4fd9-6c30-357bd01ab471
I love socket programming!


I'm going to briefly explain how I perceive this to be working, based on my findings; just know that this could be ignorant or just blatantly wrong.

*Code Step-through*

docker_logger -> github.com/fsouza/go-dockerclient -> (CONNECT) go/src/net/dial.go | dial method -> (REQUEST) go/src/net/http.go | do method -> (PIPE PUSH) [go/src/io/io.go | Copy method] OR [github.com/docker@v23.0.3+incompatible/stdcopy/stdcopy.go | StdCopy method]

Somewhere in the above flow something goes awry, and some race condition can happen. I am still working through this issue and will hopefully be able to provide more to come, but my best guess right now is some unhandled problem in the dockerclient, or something much deeper in the stack than I'm currently comfortable giving input on. What I do know is that there are no 'read' syscalls coming to the socket, and to my knowledge, with the information in the traces, the socket should work like this:

`Epoll_pwait` is basically a waiter that listens to file descriptors and their I/O so it can wake up processes to get the data when it's available.

When data is available (an event), it should notify the process(es) and allow that data to be ingested. Either epoll_pwait is not working correctly to let processes know that data is coming through (less likely), or no data is coming through due to some unknown socket error (more likely).

The nomad client details

[dhabedank@a1v-nom-wdg-pl-0b032036f21efeea9]~% nomad node status -self -verbose
ID              = d98b628d-8e56-24c8-42da-1a85a3d88fad
Name            = a1v-nom-wdg-pl-0b032036f21efeea9.q2dc.local
Class           = dockerwedge-prod
DC              = austin-production
Drain           = false
Eligibility     = eligible
Status          = ready
CSI Controllers =
CSI Drivers     =
Uptime          = 1966h56m4s

Drivers
Driver    Detected  Healthy  Message   Time
docker    true      true     Healthy   2023-05-14T22:34:00-05:00
exec      true      true     Healthy   2023-03-09T12:24:05-06:00
java      false     false              2023-03-09T12:24:05-06:00
qemu      false     false              2023-03-09T12:24:05-06:00
raw_exec  false     false    disabled  2023-03-09T12:24:05-06:00

Node Events
Time                       Subsystem       Message                             Details
2023-05-14T22:34:00-05:00  Driver: docker  Healthy                             driver: docker
2023-05-14T22:33:30-05:00  Driver: docker  Failed to connect to docker daemon  driver: docker
2023-03-09T12:24:07-06:00  Cluster         Node re-registered
2023-03-09T12:20:58-06:00  Cluster         Node heartbeat missed
2023-03-09T12:18:20-06:00  Drain           Node drain complete
2023-03-09T12:18:16-06:00  Drain           Node drain strategy set
2023-03-09T09:15:06-06:00  Cluster         Node re-registered
2023-03-09T09:14:16-06:00  Cluster         Node heartbeat missed
2023-03-09T07:14:20-06:00  Cluster         Node re-registered
2023-03-09T05:11:46-06:00  Cluster         Node heartbeat missed

Allocated Resources
CPU              Memory         Disk
29324/47484 MHz  31 GiB/31 GiB  24 GiB/59 GiB

Allocation Resource Utilization
CPU             Memory
6164/47484 MHz  5.1 GiB/31 GiB

Host Resource Utilization
CPU              Memory         Disk
17432/47984 MHz  16 GiB/31 GiB  1.7 GiB/60 GiB

Allocations

Attributes
consul.connect                   = false
consul.datacenter                = aus
consul.ft.namespaces             = true
consul.grpc                      = -1
consul.revision                  = 5cfef2ff
consul.segment                   = wedge
consul.server                    = false
consul.sku                       = ent
consul.version                   = 1.10.10+ent
cpu.arch                         = amd64
cpu.frequency                    = 2999
cpu.modelname                    = Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz
cpu.numcores                     = 16
cpu.totalcompute                 = 47984
driver.docker                    = 1
driver.docker.bridge_ip          = 172.17.0.1
driver.docker.os_type            = linux
driver.docker.privileged.enabled = true
driver.docker.runtimes           = io.containerd.runc.v2,io.containerd.runtime.v1.linux,runc
driver.docker.version            = 20.10.17
driver.docker.volumes.enabled    = true
driver.exec                      = 1
kernel.name                      = linux
kernel.version                   = 3.10.0-1160.76.1.el7.x86_64
memory.totalbytes                = 33547882496
nomad.advertise.address          = 10.127.74.96:4646
nomad.revision                   = 168e70a5b75d21883edb8a89ca879cf6b7825787
nomad.version                    = 1.2.12+ent
os.name                          = centos
os.signals                       = SIGALRM,SIGBUS,SIGXFSZ,SIGPROF,SIGSEGV,SIGFPE,SIGINT,SIGTERM,SIGABRT,SIGHUP,SIGTSTP,SIGTTOU,SIGUSR1,SIGIO,SIGKILL,SIGTRAP,SIGIOT,SIGSTOP,SIGTTIN,SIGWINCH,SIGCHLD,SIGQUIT,SIGURG,SIGUSR2,SIGCONT,SIGILL,SIGPIPE,SIGSYS,SIGXCPU
os.version                       = 7.9.2009
unique.cgroup.mountpoint         = /sys/fs/cgroup/systemd
unique.consul.name               = a1v-nom-wdg-pl-0b032036f21efeea9.q2dc.local
unique.hostname                  = a1v-nom-wdg-pl-0b032036f21efeea9.q2dc.local
unique.network.ip-address        = 10.127.74.96
unique.storage.bytesfree         = 64313729024
unique.storage.bytestotal        = 64414023680
unique.storage.volume            = /dev/sdc
vault.accessible                 = true
vault.cluster_id                 = 7d8266df-31f5-54b7-b3a1-216ab25adb2d
vault.cluster_name               = vault-cluster-81044b06
vault.version                    = 1.11.5+ent

Meta
connect.gateway_image     = envoyproxy/envoy:v${NOMAD_envoy_version}
connect.log_level         = info
connect.proxy_concurrency = 1
connect.sidecar_image     = envoyproxy/envoy:v${NOMAD_envoy_version}

flipyap commented 1 year ago

Hey All,

It is me, back again with some more TED talk. Under further scrutiny, this is confirmed to be some race condition with Docker. I was able to prove this out by using curl in a screen session and watching it freeze and hang just as the Nomad logging did. Check out the test below:

Command

# curl --silent -XGET --unix-socket /run/docker.sock "http://localhost/v1.41/containers/<CONTAINER_ID>/logs?stdout=true&follow=true"

In Failure Loop

# strace -ttt -p 63675 -s655535 -f -y
strace: Process 63675 attached
1685943750.487366 restart_syscall(<... resuming interrupted read ...>) = 0
1685943750.533690 poll([{fd=3<socket:[2392047265]>, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
1685943750.535456 poll([{fd=3<socket:[2392047265]>, events=POLLIN}], 1, 1000) = 0 (Timeout)
1685943751.537527 poll([{fd=3<socket:[2392047265]>, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
1685943751.539618 poll([{fd=3<socket:[2392047265]>, events=POLLIN}], 1, 1000) = 0 (Timeout)
1685943752.541069 poll([{fd=3<socket:[2392047265]>, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
1685943752.541903 poll([{fd=3<socket:[2392047265]>, events=POLLIN}], 1, 1000) = 0 (Timeout)
1685943753.547484 poll([{fd=3<socket:[2392047265]>, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
1685943753.548418 poll([{fd=3<socket:[2392047265]>, events=POLLIN}], 1, 1000) = 0 (Timeout)
1685943754.556300 poll([{fd=3<socket:[2392047265]>, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
1685943754.557633 poll([{fd=3<socket:[2392047265]>, events=POLLIN}], 1, 1000^Cstrace: Process 63675 detached

Below is the relevant docker version information:

Client: Docker Engine - Community
 Version:           20.10.18
 API version:       1.41
 Go version:        go1.18.6
 Git commit:        b40c2f6
 Built:             Thu Sep  8 23:14:08 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.17
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.17.11
  Git commit:       a89b842
  Built:            Mon Jun  6 23:03:33 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.8
  GitCommit:        9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6
 runc:
  Version:          1.1.4
  GitCommit:        v1.1.4-0-g5fd4c4d
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

I've also confirmed that this issue is specifically correlated with Docker log rotation. Looking into the Docker GitHub history, it does look like Docker has struggled with this issue for some time, though nothing specific to Linux that I could see. I did notice that they recently moved away from using file watchers for log following, which may well have improved or corrected this issue, but I haven't battle-tested or reviewed the code changes enough to offer anything insightful at this point. If you're following this and looking for a solution, I'd suggest going to the latest Docker and testing if you can.

dpn commented 1 year ago

@flipyap Hey, first off, thanks for digging into this. It's been long enough that I haven't been keeping close tabs on this, but we seem to have had another reproduction of this reported by one of our service teams today. In this case we're running a newer version of Docker than in your latest post:

$ docker version
Client: Docker Engine - Community
 Version:           24.0.5
 API version:       1.43
 Go version:        go1.20.6
 Git commit:        ced0996
 Built:             Fri Jul 21 20:39:02 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.5
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.6
  Git commit:       a61e2b4
  Built:            Fri Jul 21 20:38:05 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.22
  GitCommit:        8165feabfdfe38c65b599c4993d227328c231fca
 runc:
  Version:          1.1.8
  GitCommit:        v1.1.8-0-g82f18fe
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

So it seems like the issue persists. Unfortunately, I don't have much more context around this at this point than I did when I originally filed this issue.