dpn opened this issue 4 years ago
Hi @dpn! When this happens are both the log shipping processes still running or have they crashed? This will be `nomad logmon` and `nomad docker_logger` for each running job.
Thanks for the response @tgross. I'm trying to find the time to throw some scripts together to track down containers in this state and see whether this reproduces on 0.10.x in our setup; once that's complete I can leverage those to answer that question. Will follow up with what I find.
I seem to be seeing this behaviour as well. I'm not sure how to correlate `logmon` and `docker_logger` to the respective containers, but I have a whole heap of them running. And this is happening on a freshly started container.
Docker 19.03.5, Nomad v0.11.2
Same problem here: Docker v19.03.12, Nomad v0.12.0
Same problem here: Docker v20.10.2, Nomad v0.12.7
I find that it only seems to be an issue for jobs that send a whole lot of logs very rapidly. In my case there are 16 Docker containers running, 17 instances of `nomad logmon`, and 15 instances of `nomad docker_logger`.
As an additional note, this seems to have been resolved in Nomad v1.0.2. I upgraded, and am no longer seeing this.
Update to my previous comment: it seems that this issue is still present in 1.0.2, it's just a whole lot harder to trigger. I was, however, able to trigger it by running tests that generated many, many log lines. I do still see logs streaming from `docker logs`, so it definitely seems like a Nomad thing.
A few more pieces of information here:

- Logs are still streaming from the `docker logs` command, so logs are definitely being produced, just not being handled by Nomad.
- For a single affected task I see `logmon` and 2 instances of `docker_logger`. Is that normal? One for `stdout` and one for `stderr`?
- Across the host there are extra instances of `logmon`, 29 instances of `docker_logger`, and only 12 containers. So even if we need two of each for each job, the numbers don't work out.
- Restarting the task restarts the `docker_logger` process, but does not restart the `logmon` process.

The last two points above point to the `docker_logger` process as the source of the issue.
Thanks for the added info @dansteen.

For Docker tasks, there should be exactly one `logmon` and one `docker_logger` per task. For non-Docker tasks, just the `logmon`. So having extraneous processes is a pretty clear signal. One of the things that might help narrow things down is if you can determine whether the extra `logmon`/`docker_logger` processes are for running allocations vs ones that have already stopped (from their process tree).
Another thing that can help narrow the investigation: has the Nomad agent been restarted since the allocations with extra loggers were originally created?
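One rough way to do that process-tree check on Linux is a small script like the sketch below. This is not an official tool, and it assumes the alloc-dir layout seen in the lsof output further down this thread (`/var/local/nomad/alloc/<alloc-id>/alloc/logs/*.fifo`): it scans `/proc` for `logmon`/`docker_logger` processes and reads their open file descriptors to see which allocation's log FIFOs each one is holding.

```go
// Best-effort mapping of Nomad log-shipper processes to allocation IDs.
// Assumes a Linux host, root privileges (to read other processes' fds), and
// that logmon/docker_logger keep the allocation's .std{out,err}.fifo open.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"regexp"
	"strings"
)

// Allocation IDs are UUIDs embedded in the fifo paths under the alloc dir.
var allocID = regexp.MustCompile(`/alloc/([0-9a-f-]{36})/`)

func main() {
	procs, _ := filepath.Glob("/proc/[0-9]*")
	for _, proc := range procs {
		cmdline, err := os.ReadFile(filepath.Join(proc, "cmdline"))
		if err != nil {
			continue // process exited or is unreadable
		}
		cmd := strings.ReplaceAll(string(cmdline), "\x00", " ")
		if !strings.Contains(cmd, "logmon") && !strings.Contains(cmd, "docker_logger") {
			continue
		}
		// Inspect the process's open fds for allocation log FIFOs.
		allocs := map[string]struct{}{}
		fds, _ := filepath.Glob(filepath.Join(proc, "fd", "*"))
		for _, fd := range fds {
			target, err := os.Readlink(fd)
			if err != nil {
				continue
			}
			if m := allocID.FindStringSubmatch(target); m != nil {
				allocs[m[1]] = struct{}{}
			}
		}
		ids := make([]string, 0, len(allocs))
		for id := range allocs {
			ids = append(ids, id)
		}
		fmt.Printf("pid=%s allocs=%v cmd=%q\n", filepath.Base(proc), ids, cmd)
	}
}
```

Cross-referencing the printed allocation IDs against `nomad alloc status` should show whether any extra loggers belong to allocations that are still running or to ones that have already stopped.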
Have the same issue: Docker v19.03.5, Nomad v1.0.2, OS Windows v10.0.17763 Build 17763
Hi @tgross, thanks for the response. It looks like the extra processes are for allocations that have already stopped. The Nomad agent was not restarted prior to the extra loggers being created (however, it has since been).
Hi @tgross, I now have an application that stopped logging, and there are no extra `logmon` or `docker_logger` processes on the box, so it seems like it's not related to that. Also, the Nomad client has not been restarted since the box was created, so it's also not going to be related to that.
@tgross Sorry for the message spam. One bit of information that I think may be important here: by default Docker (on Debian 10) is configured to only allow 2 MB log files. In our case, we have an application that can log 1 MB of data in a single line. When that happens, the Docker log file will be rotated every two messages. I can definitely see the Nomad logger not keeping up with an event like that. I've upped the log file size in Docker as a test, and will report back the results.
Update: as a note, it turns out that setting the default options for Docker is not sufficient, since Nomad overrides those (see https://www.nomadproject.io/docs/drivers/docker#logging). I needed to set the logging options in the job's task configuration as well.
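For anyone else chasing this, one way to confirm which log options actually took effect on a container (the daemon default vs. the per-task `logging` block) is to look at the container's `HostConfig.LogConfig`, either with `docker inspect <container-id>` or with a small check using `go-dockerclient` (the client library the docker_logger uses). This is just a sketch; the socket path is an assumption:

```go
// Print the effective Docker log driver and options for a container, to verify
// whether the job's per-task logging settings (e.g. max-size/max-file) applied.
package main

import (
	"fmt"
	"log"
	"os"

	docker "github.com/fsouza/go-dockerclient"
)

func main() {
	if len(os.Args) < 2 {
		log.Fatal("usage: logconfig <container-id>")
	}
	// Assumes the default local Docker socket.
	client, err := docker.NewClient("unix:///var/run/docker.sock")
	if err != nil {
		log.Fatal(err)
	}
	container, err := client.InspectContainerWithOptions(docker.InspectContainerOptions{ID: os.Args[1]})
	if err != nil {
		log.Fatal(err)
	}
	if container.HostConfig == nil {
		log.Fatal("no HostConfig returned")
	}
	lc := container.HostConfig.LogConfig
	fmt.Printf("log driver: %s\n", lc.Type)
	for k, v := range lc.Config {
		fmt.Printf("  %s = %s\n", k, v) // e.g. max-size, max-file
	}
}
```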
We are seeing this as well on Nomad version 1.1.5. We're also running Docker on Windows.
It works intermittently; some servers seem to show logs, others don't. I haven't been able to narrow it down to a specific server config or anything.
We have 5 servers, 4 of which support Docker, and 14 jobs running up to 4 allocations per job, so there are 43 allocations total. Keeping track of what is and isn't working is a bit of a hassle.
In the meantime we've updated 80% of our apps to just log to an external service, LogTail, which is easy and reliable... but it would be nice to be able to look at the logs on the server that produced them.
Nomad Client Details
os.name | Microsoft Windows Server 2019 Standard
os.signals | SIGABRT,SIGBUS,SIGFPE,SIGHUP,SIGKILL,SIGQUIT,SIGSEGV,SIGALRM,SIGILL,SIGINT,SIGPIPE,SIGTERM,SIGTRAP
os.version | 10.0.17763 Build 17763
cpu.arch | amd64
cpu.numcores | 12
cpu.frequency | 2893
cpu.modelname | Intel(R) Xeon(R) CPU E5-2667 0 @ 2.90GHz
cpu.totalcompute | 34716
kernel.name | windows
kernel.version | 10.0.17763 Build 17763
driver.docker
driver.docker.os_type | windows
driver.docker.version | 20.10.7
driver.docker.runtimes |
nomad.version | 1.1.5
nomad.revision | 117a23d2cdf26a1837b21c84f30c8c0f3441e927
We're getting the same issue on Nomad 1.2.5. Logs randomly stop being written to the Nomad log files while they are still being produced by the Docker containers.
Nomad Client Details:
ID = xxxxxxxxxxxx
Name = ip-12-34-56-78
Class = linux
DC = uat-eu-west-2
Drain = false
Eligibility = eligible
Status = ready
CSI Controllers = <none>
CSI Drivers = aws-ebs0
Uptime = 722h50m30s
Drivers
Driver Detected Healthy Message Time
docker true true Healthy 2022-05-09T09:14:54Z
java false false <none> 2022-05-09T09:14:54Z
qemu false false <none> 2022-05-09T09:14:54Z
raw_exec false false disabled 2022-05-09T09:14:54Z
Node Events
Time Subsystem Message Details
2022-05-09T09:14:54Z Cluster Node registered <none>
Allocated Resources
CPU Memory Disk
3700/24800 MHz 30 GiB/30 GiB 2.3 GiB/45 GiB
Allocation Resource Utilization
CPU Memory
349/24800 MHz 5.7 GiB/30 GiB
Host Resource Utilization
CPU Memory Disk
1557/24800 MHz 7.3 GiB/30 GiB (/dev/mapper/root_vg-root)
Allocations
ID Eval ID Node ID Task Group Version Desired Status Created Modified
...........
Attributes
consul.connect = false
consul.datacenter = uat-eu-west-2
consul.ft.namespaces = false
consul.grpc = -1
consul.revision = 37c7d06b
consul.server = false
consul.sku = oss
consul.version = 1.11.2
cpu.arch = amd64
cpu.frequency = 3100
cpu.modelname = Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
cpu.numcores = 8
cpu.totalcompute = 24800
driver.docker = 1
driver.docker.bridge_ip = 172.17.0.1
driver.docker.os_type = linux
driver.docker.privileged.enabled = true
driver.docker.runtimes = docker-runc,runc
driver.docker.version = 1.13.1
kernel.name = linux
kernel.version = 3.10.0-1160.59.1.el7.x86_64
memory.totalbytes = 32727478272
nomad.advertise.address = 12.34.56.78:4646
nomad.revision = 06d912a20ba029c7f4d6b371cd07594cba3ae3cd
nomad.version = 1.2.5
os.name = redhat
os.signals = SIGUSR2,SIGKILL,SIGPIPE,SIGTERM,SIGURG,SIGIO,SIGQUIT,SIGBUS,SIGHUP,SIGINT,SIGTRAP,SIGTTIN,SIGWINCH,SIGABRT,SIGALRM,SIGXCPU,SIGCHLD,SIGFPE,SIGTSTP,SIGUSR1,SIGSEGV,SIGSYS,SIGTTOU,SIGCONT,SIGILL,SIGSTOP,SIGXFSZ,SIGIOT,SIGPROF
os.version = 7.9
platform.aws.ami-id = ami-0a8dbe56d340ad2d8
platform.aws.instance-type = m5.2xlarge
platform.aws.placement.availability-zone = eu-west-2a
unique.cgroup.mountpoint = /sys/fs/cgroup/systemd
unique.consul.name = ip-12-34-56-78
unique.hostname = ip-12-34-56-78.domain.name
unique.network.ip-address = 12.34.56.78
unique.platform.aws.hostname = ip-12-34-56-78.domain.name
unique.platform.aws.instance-id = i-00255359dab3c21b9
unique.platform.aws.local-hostname = ip-12-34-56-78.domain.name
unique.platform.aws.local-ipv4 = 12.34.56.78
unique.platform.aws.mac = 06:fd:b8:ce:75:e0
unique.storage.bytesfree = 48213946368
unique.storage.bytestotal = 53135540224
unique.storage.volume = /dev/mapper/root_vg-root
vault.accessible = true
vault.cluster_id = 96b36453-b950-2a86-3138-c2a66c408e30
vault.cluster_name = uat
vault.version = 1.6.3
Meta
connect.gateway_image = envoyproxy/envoy:v${NOMAD_envoy_version}
connect.log_level = info
connect.proxy_concurrency = 1
connect.sidecar_image = envoyproxy/envoy:v${NOMAD_envoy_version}
My team and I have been experiencing this issue at random, and for us it is production-impacting because we utilize a homegrown logging sidecar that reads logs from the Nomad stdout/stderr files.
I've been slowly gathering information along the way, but any advice or pointers on where to go next would be appreciated.
First, a little architectural overview. I've done my best to convey a 1,000 ft view of how the specific Nomad pieces work together.
When this phenomenon occurs, there are a few interesting things that I notice...
The first interesting tidbit of information I notice is that there are futex waits that are failing every 60 seconds. Tracking these futexes down by thread seems to point back to them being owned by the Go runtime's system monitor, `sysmon`.
# strace -ttt -p 96113 -s655535 -f -y
strace: Process 96113 attached with 10 threads
[pid 96126] 1685460402.610901 futex(0xc000095548, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 96125] 1685460402.610963 futex(0xc00030e548, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 96124] 1685460402.610982 futex(0xc00030e148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 96123] 1685460402.611000 futex(0x48ebfa0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 96122] 1685460402.611018 futex(0x48ebe58, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 96121] 1685460402.611036 futex(0xc000100148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 96120] 1685460402.611053 futex(0xc000094d48, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 96119] 1685460402.611072 epoll_pwait(4<anon_inode:[eventpoll]>, <unfinished ...>
[pid 96118] 1685460402.611880 restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
[pid 96113] 1685460402.611907 futex(0x48b3fa8, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 96119] 1685460402.611930 <... epoll_pwait resumed>[], 128, 0, NULL, 7085560) = 0
[pid 96119] 1685460402.612007 epoll_pwait(4<anon_inode:[eventpoll]>, <unfinished ...>
[pid 96118] 1685460422.053096 <... restart_syscall resumed>) = -1 ETIMEDOUT (Connection timed out)
[pid 96118] 1685460422.054926 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
[pid 96118] 1685460422.065794 futex(0x48b78f8, FUTEX_WAIT_PRIVATE, 0, {tv_sec=60, tv_nsec=0}) = -1 ETIMEDOUT (Connection timed out)
[pid 96118] 1685460482.067844 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
[pid 96118] 1685460482.078684 futex(0x48b78f8, FUTEX_WAIT_PRIVATE, 0, {tv_sec=60, tv_nsec=0}) = -1 ETIMEDOUT (Connection timed out)
[pid 96118] 1685460542.080197 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
[pid 96118] 1685460542.090805 futex(0x48b78f8, FUTEX_WAIT_PRIVATE, 0, {tv_sec=60, tv_nsec=0}) = -1 ETIMEDOUT (Connection timed out)
[pid 96118] 1685460602.093027 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
[pid 96118] 1685460602.103931 futex(0x48b78f8, FUTEX_WAIT_PRIVATE, 0, {tv_sec=60, tv_nsec=0}
That thread looks like this:
(dlv) thread 96118
Switched from 96113 to 96118
(dlv) stack -full
0 0x0000000000471703 in runtime.futex
at runtime/sys_linux_amd64.s:560
1 0x0000000000436f2f in runtime.futexsleep
at runtime/os_linux.go:75
addr = (unreadable could not find loclist entry at 0x11abea for address 0x436f2f)
val = (unreadable could not find loclist entry at 0x11ac1d for address 0x436f2f)
ns = (unreadable could not find loclist entry at 0x11ac50 for address 0x436f2f)
ts = runtime.timespec {tv_sec: 60, tv_nsec: 0}
2 0x000000000040f0a5 in runtime.notetsleep_internal
at runtime/lock_futex.go:202
n = (*runtime.note)(0x48b78f8)
ns = 60000000000
~r0 = (unreadable empty OP stack)
gp = (*runtime.g)(0xc0000069c0)
deadline = 7085819801755795
3 0x000000000040f1f4 in runtime.notetsleep
at runtime/lock_futex.go:225
n = (unreadable could not find loclist entry at 0xda2e4 for address 0x40f1f4)
ns = (unreadable could not find loclist entry at 0xda317 for address 0x40f1f4)
~r0 = (unreadable empty OP stack)
gp = (unreadable could not find loclist entry at 0xda34a for address 0x40f1f4)
4 0x00000000004487a9 in runtime.sysmon
at runtime/proc.go:5181
lasttrace = 0
idle = 33654
delay = 10000
now = 7085759801753680
~R0 = (unreadable empty OP stack)
syscallWake = false
next = 7086038586817785
sleep = (unreadable could not find loclist entry at 0x12d019 for address 0x4487a9)
5 0x00000000004400b3 in runtime.mstart1
at runtime/proc.go:1425
_g_ = (unreadable could not find loclist entry at 0x123997 for address 0x4400b3)
fn = (unreadable could not find loclist entry at 0x1239ca for address 0x4400b3)
6 0x000000000043fff9 in runtime.mstart0
at runtime/proc.go:1382
_g_ = (*runtime.g)(0xc0000069c0)
osStack = (unreadable could not find loclist entry at 0x123930 for address 0x43fff9)
7 0x000000000046d545 in runtime.mstart
at runtime/asm_amd64.s:391
8 0x000000000221eaf2 in ???
at ?:-1
So these futex waits above may be more of a symptom than a cause, but they're noteworthy as the first thing that will probably catch your eye when looking at the system calls.
Open files:
# lsof -p 96113
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
nomad 96113 root cwd DIR 8,1 224 64 /
nomad 96113 root rtd DIR 8,1 224 64 /
nomad 96113 root txt REG 8,1 94540152 8390294 /usr/local/bin/nomad
nomad 96113 root mem REG 8,1 2156592 6847 /usr/lib64/libc-2.17.so
nomad 96113 root mem REG 8,1 142144 6879 /usr/lib64/libpthread-2.17.so
nomad 96113 root mem REG 8,1 163312 6840 /usr/lib64/ld-2.17.so
nomad 96113 root 0r CHR 1,3 0t0 1028 /dev/null
nomad 96113 root 1w FIFO 0,9 0t0 1413111465 pipe
nomad 96113 root 2w FIFO 0,9 0t0 1413111466 pipe
nomad 96113 root 3u unix 0xffff89363c3df2c0 0t0 1413113722 /tmp/plugin1723546063
nomad 96113 root 4u a_inode 0,10 0 7531 [eventpoll] <-------------------
nomad 96113 root 5r FIFO 0,9 0t0 1413113720 pipe
nomad 96113 root 6w FIFO 0,9 0t0 1413113720 pipe
nomad 96113 root 7r FIFO 0,9 0t0 1413113724 pipe
nomad 96113 root 8w FIFO 0,9 0t0 1413113724 pipe
nomad 96113 root 9r FIFO 0,9 0t0 1413113725 pipe
nomad 96113 root 10w FIFO 0,9 0t0 1413113725 pipe
nomad 96113 root 11u unix 0xffff893b93d461c0 0t0 1413115579 /tmp/plugin1723546063
nomad 96113 root 12w FIFO 8,32 0t0 43574314 /var/local/nomad/alloc/6f6e6e1a-cec5-4fd9-6c30-357bd01ab471/alloc/logs/.kamino-deluxecheckreordersso-prd.stdout.fifo
nomad 96113 root 13w FIFO 8,32 0t0 43574316 /var/local/nomad/alloc/6f6e6e1a-cec5-4fd9-6c30-357bd01ab471/alloc/logs/.kamino-deluxecheckreordersso-prd.stderr.fifo
nomad 96113 root 14u unix 0xffff893a69823740 0t0 1413115581 socket <-----------------
List of goroutines:
(dlv) grs
Goroutine 1 - User: github.com/hashicorp/go-plugin@v1.4.3/server.go:469 github.com/hashicorp/go-plugin.Serve (0x9721d7) [select]
Goroutine 2 - User: runtime/proc.go:364 runtime.gopark (0x43d816) [force gc (idle)]
Goroutine 3 - User: runtime/proc.go:364 runtime.gopark (0x43d816) [GC sweep wait]
Goroutine 4 - User: runtime/proc.go:364 runtime.gopark (0x43d816) [GC scavenge wait]
Goroutine 5 - User: runtime/proc.go:364 runtime.gopark (0x43d816) [finalizer wait]
Goroutine 18 - User: github.com/hashicorp/go-plugin@v1.4.3/grpc_broker.go:121 github.com/hashicorp/go-plugin.(*gRPCBrokerServer).Recv (0x967ec7) [select]
Goroutine 19 - User: os/file_posix.go:31 os.(*File).Read (0x4ee5fe) [IO wait]
Goroutine 20 - User: os/file_posix.go:31 os.(*File).Read (0x4ee5fe) [IO wait]
Goroutine 21 - User: runtime/proc.go:364 runtime.gopark (0x43d816) [select]
Goroutine 23 - User: google.golang.org/grpc@v1.42.0/internal/transport/controlbuf.go:407 google.golang.org/grpc/internal/transport.(*controlBuffer).get (0x86d9f5) [select]
Goroutine 24 - User: google.golang.org/grpc@v1.42.0/internal/transport/http2_server.go:1119 google.golang.org/grpc/internal/transport.(*http2Server).keepalive (0x88ba93) [select]
Goroutine 25 - User: net/fd_posix.go:55 net.(*netFD).Read (0x632169) [IO wait]
Goroutine 26 - User: github.com/hashicorp/go-plugin@v1.4.3/grpc_stdio.go:58 github.com/hashicorp/go-plugin.(*grpcStdioServer).StreamStdio (0x96c05d) [select]
Goroutine 27 - User: google.golang.org/grpc@v1.42.0/internal/transport/transport.go:178 google.golang.org/grpc/internal/transport.(*recvBufferReader).read (0x890330) [select]
Goroutine 29 - User: net/fd_posix.go:55 net.(*netFD).Read (0x632169) [IO wait]
Goroutine 34 - User: runtime/sigqueue.go:152 os/signal.signal_recv (0x46bb0f) (thread 96123)
Goroutine 35 - User: github.com/hashicorp/go-plugin@v1.4.3/server.go:434 github.com/hashicorp/go-plugin.Serve.func3 (0x9727f5) [chan receive]
Goroutine 36 - User: net/fd_unix.go:172 net.(*netFD).accept (0x634275) [IO wait]
Goroutine 37 - User: github.com/hashicorp/go-plugin@v1.4.3/grpc_broker.go:70 github.com/hashicorp/go-plugin.(*gRPCBrokerServer).StartStream.func1 (0x967b05) [select]
Goroutine 50 - User: github.com/fsouza/go-dockerclient@v1.6.5/client.go:580 github.com/fsouza/go-dockerclient.(*Client).streamURL.func1 (0xb1c5bb) [chan receive]
The goroutine that's doing the work of getting stream data out of the socket... I think:
(dlv) stack -full
0 0x000000000043d816 in runtime.gopark
at runtime/proc.go:364
unlockf = (unreadable could not find loclist entry at 0x12103b for address 0x43d816)
lock = (unreadable could not find loclist entry at 0x12106e for address 0x43d816)
reason = (unreadable could not find loclist entry at 0x1210a1 for address 0x43d816)
traceEv = (unreadable could not find loclist entry at 0x1210d4 for address 0x43d816)
traceskip = (unreadable could not find loclist entry at 0x121107 for address 0x43d816)
mp = (unreadable could not find loclist entry at 0x12113a for address 0x43d816)
gp = (unreadable could not find loclist entry at 0x12116d for address 0x43d816)
status = (unreadable could not find loclist entry at 0x1211a0 for address 0x43d816)
1 0x0000000000436297 in runtime.netpollblock
at runtime/netpoll.go:526
pd = (unreadable could not find loclist entry at 0x1197b3 for address 0x436297)
mode = (unreadable could not find loclist entry at 0x1197e6 for address 0x436297)
waitio = (unreadable could not find loclist entry at 0x119819 for address 0x436297)
~r0 = (unreadable empty OP stack)
gpp = ("*runtime/internal/atomic.Uintptr")(0x7fcf1db09850)
2 0x0000000000469689 in internal/poll.runtime_pollWait
at runtime/netpoll.go:305
pd = (*runtime.pollDesc)(0x7fcf1db09838)
mode = 114
~r0 = (unreadable empty OP stack)
errcode = (unreadable could not find loclist entry at 0x15906b for address 0x469689)
3 0x00000000004e1a92 in internal/poll.(*pollDesc).wait
at internal/poll/fd_poll_runtime.go:84
pd = (unreadable could not find loclist entry at 0x26f604 for address 0x4e1a92)
mode = (unreadable could not find loclist entry at 0x26f637 for address 0x4e1a92)
isFile = false
~r0 = (unreadable empty OP stack)
4 0x00000000004e2dfa in internal/poll.(*pollDesc).waitRead
at internal/poll/fd_poll_runtime.go:89
isFile = (unreadable could not find loclist entry at 0x270d6b for address 0x4e2dfa)
5 0x00000000004e2dfa in internal/poll.(*FD).Read
at internal/poll/fd_unix.go:167
fd = ("*internal/poll.FD")(0xc0004b2000)
p = []uint8 len: 4096, cap: 4096, [...]
~r0 = (unreadable empty OP stack)
~r0 = (unreadable empty OP stack)
~r1 = (unreadable empty OP stack)
~r1 = (unreadable empty OP stack)
isFile = (unreadable could not find loclist entry at 0x270d6b for address 0x4e2dfa)
6 0x0000000000632169 in net.(*netFD).Read
at net/fd_posix.go:55
fd = (*net.netFD)(0xc0004b2000)
p = (unreadable could not find loclist entry at 0x4e8ba6 for address 0x632169)
n = (unreadable empty OP stack)
err = (unreadable empty OP stack)
7 0x0000000000646d65 in net.(*conn).Read
at net/net.go:183
c = (*net.conn)(0xc00048c010)
b = (unreadable could not find loclist entry at 0x50570e for address 0x646d65)
~r0 = (unreadable empty OP stack)
~r1 = (unreadable empty OP stack)
8 0x0000000000594cff in bufio.(*Reader).fill
at bufio/bufio.go:106
b = (*bufio.Reader)(0xc0004881e0)
i = 100
9 0x00000000005958ef in bufio.(*Reader).ReadSlice
at bufio/bufio.go:372
b = (*bufio.Reader)(0xc0004881e0)
delim = 10
line = (unreadable empty OP stack)
err = (unreadable empty OP stack)
s = (unreadable could not find loclist entry at 0x35c21e for address 0x5958ef)
~R0 = (unreadable empty OP stack)
10 0x00000000007b45a5 in net/http/internal.readChunkLine
at net/http/internal/chunked.go:129
b = (unreadable could not find loclist entry at 0x710c2f for address 0x7b45a5)
~r0 = (unreadable empty OP stack)
~r1 = (unreadable empty OP stack)
11 0x00000000007b4008 in net/http/internal.(*chunkedReader).beginChunk
at net/http/internal/chunked.go:48
cr = ("*net/http/internal.chunkedReader")(0xc0004508a0)
line = (unreadable could not read 8 bytes from register 0 (size: 0))
12 0x00000000007b42d6 in net/http/internal.(*chunkedReader).Read
at net/http/internal/chunked.go:98
cr = ("*net/http/internal.chunkedReader")(0xc0004508a0)
b = []uint8 len: 32777, cap: 32777, [...]
n = (unreadable empty OP stack)
err = (unreadable empty OP stack)
13 0x000000000081195c in net/http.(*body).readLocked
at net/http/transfer.go:846
b = ("*net/http.body")(0xc00011ed40)
p = (unreadable could not find loclist entry at 0x779259 for address 0x81195c)
n = (unreadable empty OP stack)
err = (unreadable empty OP stack)
14 0x0000000000811825 in net/http.(*body).Read
at net/http/transfer.go:838
b = (unreadable could not find loclist entry at 0x77907b for address 0x811825)
p = (unreadable could not find loclist entry at 0x7790ae for address 0x811825)
n = 0
err = error nil
15 0x0000000000a9a484 in github.com/docker/docker/pkg/stdcopy.StdCopy
at github.com/docker/docker@v17.12.0-ce-rc1.0.20200330121334-7f8b4b621b5d+incompatible/pkg/stdcopy/stdcopy.go:108
dstout = io.Writer(*os.File) 0xbeef000000000208
dsterr = io.Writer(*os.File) 0xbeef000000000308
src = io.Reader(*net/http.body) 0xbeef000000000408
written = (unreadable empty OP stack)
err = (unreadable empty OP stack)
buf = []uint8 len: 32777, cap: 32777, [...]
nr = 0
nw = (unreadable could not find loclist entry at 0x12d351a for address 0xa9a484)
er = (unreadable could not read 8 bytes from register 3 (size: 0))
ew = (unreadable could not find loclist entry at 0x12d3844 for address 0xa9a484)
out = (unreadable could not find loclist entry at 0x12d38c1 for address 0xa9a484)
frameSize = (unreadable empty OP stack)
nr2 = (unreadable read out of bounds)
16 0x0000000000b1c74f in github.com/fsouza/go-dockerclient.handleStreamResponse
at github.com/fsouza/go-dockerclient@v1.6.5/client.go:646
resp = ("*net/http.Response")(0xc00049e1b0)
streamOptions = ("*github.com/fsouza/go-dockerclient.streamOptions")(0xc0004afc08)
~r0 = (unreadable empty OP stack)
err = (unreadable could not read 8 bytes from register 3 (size: 0))
17 0x0000000000b1c265 in github.com/fsouza/go-dockerclient.(*Client).streamURL
at github.com/fsouza/go-dockerclient@v1.6.5/client.go:630
c = ("*github.com/fsouza/go-dockerclient.Client")(0xc000408bd0)
method = "GET"
url = "/containers/efcf31e8a935763808de3f0447749041883806b14f200f273565...+53 more"
streamOptions = github.com/fsouza/go-dockerclient.streamOptions {setRawTerminal: false, rawJSONStream: false, useJSONDecoder: false,...+8 more}
~r0 = (unreadable empty OP stack)
~r0 = (unreadable empty OP stack)
~R0 = (unreadable empty OP stack)
req = ("*net/http.Request")(0xc0004a2000)
err = (unreadable could not read 8 bytes from register 0 (size: 0))
resp = ("*net/http.Response")(0xc00049e1b0)
protocol = "unix"
address = "/var/run/docker.sock"
ctx = (unreadable could not find loclist entry at 0x13b68a4 for address 0xb1c265)
cancelRequest = context.WithCancel.func1
subCtx = context.Context(*context.cancelCtx) 0xbeef000000000a08
canceled = 0
18 0x0000000000b1b3d6 in github.com/fsouza/go-dockerclient.(*Client).stream
at github.com/fsouza/go-dockerclient@v1.6.5/client.go:531
c = ("*github.com/fsouza/go-dockerclient.Client")(0xc000408bd0)
method = "GET"
path = "/containers/efcf31e8a935763808de3f0447749041883806b14f200f273565...+53 more"
streamOptions = github.com/fsouza/go-dockerclient.streamOptions {setRawTerminal: false, rawJSONStream: false, useJSONDecoder: false,...+8 more}
~r0 = (unreadable empty OP stack)
19 0x0000000000b26e32 in github.com/fsouza/go-dockerclient.(*Client).Logs
at github.com/fsouza/go-dockerclient@v1.6.5/container.go:1548
c = ("*github.com/fsouza/go-dockerclient.Client")(0xc000408bd0)
opts = github.com/fsouza/go-dockerclient.LogsOptions {Context: context.Context(*context.cancelCtx) ..., Container: "efcf31e8a935763808de3f0447749041883806b14f200f2735651b29be9b8537", OutputStream: io.Writer(*o
s.File) ...,...+9 more}
~r0 = (unreadable empty OP stack)
path = (unreadable could not find loclist entry at 0x13c207f for address 0xb26e32)
20 0x0000000000b46fc6 in github.com/hashicorp/nomad/drivers/docker/docklog.(*dockerLogger).Start.func1
at github.com/hashicorp/nomad/drivers/docker/docklog/docker_logger.go:108
d = ("*github.com/hashicorp/nomad/drivers/docker/docklog.dockerLogger")(0xc000126370)
opts = ("*github.com/hashicorp/nomad/drivers/docker/docklog.StartOpts")(0xc00011cc00)
ctx = context.Context(*context.cancelCtx) 0xbeef000000000d08
stdout = io.WriteCloser(*os.File) 0xbeef000000000e08
stderr = io.WriteCloser(*os.File) 0xbeef000000000f08
err = (unreadable could not find loclist entry at 0x13de9b3 for address 0xb46fc6)
sinceTime = (unreadable could not find loclist entry at 0x13de7d2 for address 0xb46fc6)
backoff = 0
logOpts = github.com/fsouza/go-dockerclient.LogsOptions {Context: context.Context(*context.cancelCtx) ..., Container: "efcf31e8a935763808de3f0447749041883806b14f200f2735651b29be9b8537", OutputStream: io.Writer
(*os.File) ...,...+9 more}
21 0x000000000046f801 in runtime.goexit
at runtime/asm_amd64.s:1594
Tracing the system calls for the socket:
# time strace -e trace=read,write -e read=14 -e write=14 -p 96113 -f -y
strace: Process 96113 attached with 10 threads
^Cstrace: Process 96113 detached
strace: Process 96118 detached
strace: Process 96119 detached
strace: Process 96120 detached
strace: Process 96121 detached
strace: Process 96122 detached
strace: Process 96123 detached
strace: Process 96124 detached
strace: Process 96125 detached
strace: Process 96126 detached
real 3m28.705s
user 0m0.002s
sys 0m0.013s
In the above output we would expect to see reads from the local socket that is communicating with the Docker socket as data comes in, something like the below:
[pid 18506] read(14<socket:[157156019]>, "\n\1\0\0\0\0\0\6B{\"date_time\":\"2023-05-3"..., 4092) = 4092
Furthermore, the socket is established with no recv or send queues.
Note: the socket inode is the same as in the lsof output provided above.
# ss -nae | grep 1413115581
u_str ESTAB 0 0 /var/run/docker.sock 1413121781 * 1413115581 <->
u_str ESTAB 0 0 * 1413115581 * 1413121781 <->
Proof that logmon is running and working just fine:
echo "I love socket programming\!" > /var/local/nomad/alloc/6f6e6e1a-cec5-4fd9-6c30-357bd01ab471/alloc/logs/.<alloc1>.stdout.fifo
nomad alloc logs -task
I'm going to briefly explain how I perceive this to be working, based on my findings, but just know that this could be ignorant or just blatantly wrong.
*Code step-through*
docker_logger -> github.com/fsouza/go-dockerclient -> (CONNECT) go/src/net/dial.go | dial method -> (REQUEST) go/src/net/http.go | do method -> (PIPE PUSH) [go/src/io/io.go | Copy method] OR [github.com/docker@v23.0.3+incompatible/stdcopy/stdcopy.go | StdCopy method]
Somewhere in the above flow, something goes awry and some race condition can happen. I am still working through this issue and will hopefully be able to provide more to come, but my best guess right now is some unhandled problem in the docker client, or something much deeper in the stack than I'm currently comfortable giving input on. What I do know is that there are no 'read' syscalls coming to the socket, and to my knowledge, with the information in the traces, the socket should work like this:
`epoll_pwait` is basically a waiter that listens to file descriptors and their I/O so it can wake up processes to consume the data when it's available.
When data is available (i.e. an event), it should notify the process(es) and allow that data to be ingested. Either `epoll_pwait` is not working correctly to let processes know that data is coming through (less likely), or no data is coming through due to some unknown socket error (more likely).
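For reference, the blocking call at the bottom of the stack trace above boils down to roughly the following pattern. This is a simplified sketch of what `docklog/docker_logger.go` sets up, not the actual Nomad source: `go-dockerclient`'s `Logs` call with `Follow: true` keeps the HTTP connection to `/containers/<id>/logs` open and parks inside `stdcopy.StdCopy`, demultiplexing the chunked stream into the stdout/stderr FIFO writers, and it only returns when the response body ends, errors, or the context is cancelled.

```go
// Simplified sketch of the docker_logger streaming path seen in the stack trace
// (not the actual Nomad source). client.Logs follows the container's log stream
// over the Docker API and demultiplexes it into the two writers; it blocks until
// the stream ends, the context is cancelled, or an error occurs.
package main

import (
	"context"
	"log"
	"os"

	docker "github.com/fsouza/go-dockerclient"
)

func streamLogs(ctx context.Context, client *docker.Client, containerID string, stdout, stderr *os.File) error {
	return client.Logs(docker.LogsOptions{
		Context:      ctx,
		Container:    containerID,
		OutputStream: stdout, // in Nomad this is the .stdout.fifo opened by logmon
		ErrorStream:  stderr, // and this is the .stderr.fifo
		Stdout:       true,
		Stderr:       true,
		Follow:       true,  // keeps the connection open; blocks in stdcopy.StdCopy
		Since:        0,     // Nomad tracks a sinceTime to resume after errors
		RawTerminal:  false, // multiplexed stream, hence StdCopy in the trace
	})
}

func main() {
	if len(os.Args) < 2 {
		log.Fatal("usage: streamlogs <container-id>")
	}
	client, err := docker.NewClient("unix:///var/run/docker.sock")
	if err != nil {
		log.Fatal(err)
	}
	// Plain files standing in for the log FIFOs, just for illustration.
	out, _ := os.Create("stdout.log")
	defer out.Close()
	errW, _ := os.Create("stderr.log")
	defer errW.Close()
	if err := streamLogs(context.Background(), client, os.Args[1], out, errW); err != nil {
		log.Printf("log stream ended: %v", err)
	}
}
```

Because `Logs` only returns when the response body ends or errors, a stall inside the daemon (for example around a json-file rotation, as the curl test below suggests) never surfaces as an error to this caller, which would match the strace output showing no further reads on the socket.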
The Nomad client details:
[dhabedank@a1v-nom-wdg-pl-0b032036f21efeea9]~% nomad node status -self -verbose
ID = d98b628d-8e56-24c8-42da-1a85a3d88fad
Name = a1v-nom-wdg-pl-0b032036f21efeea9.q2dc.local
Class = dockerwedge-prod
DC = austin-production
Drain = false
Eligibility = eligible
Status = ready
CSI Controllers =
Drivers
Driver Detected Healthy Message Time
docker true true Healthy 2023-05-14T22:34:00-05:00
exec true true Healthy 2023-03-09T12:24:05-06:00
java false false
Node Events
Time Subsystem Message Details
2023-05-14T22:34:00-05:00 Driver: docker Healthy driver: docker
2023-05-14T22:33:30-05:00 Driver: docker Failed to connect to docker daemon driver: docker
2023-03-09T12:24:07-06:00 Cluster Node re-registered
Allocated Resources
CPU Memory Disk
29324/47484 MHz 31 GiB/31 GiB 24 GiB/59 GiB
Allocation Resource Utilization
CPU Memory
6164/47484 MHz 5.1 GiB/31 GiB
Host Resource Utilization
CPU Memory Disk
17432/47984 MHz 16 GiB/31 GiB 1.7 GiB/60 GiB
Allocations
Attributes
consul.connect = false
consul.datacenter = aus
consul.ft.namespaces = true
consul.grpc = -1
consul.revision = 5cfef2ff
consul.segment = wedge
consul.server = false
consul.sku = ent
consul.version = 1.10.10+ent
cpu.arch = amd64
cpu.frequency = 2999
cpu.modelname = Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz
cpu.numcores = 16
cpu.totalcompute = 47984
driver.docker = 1
driver.docker.bridge_ip = 172.17.0.1
driver.docker.os_type = linux
driver.docker.privileged.enabled = true
driver.docker.runtimes = io.containerd.runc.v2,io.containerd.runtime.v1.linux,runc
driver.docker.version = 20.10.17
driver.docker.volumes.enabled = true
driver.exec = 1
kernel.name = linux
kernel.version = 3.10.0-1160.76.1.el7.x86_64
memory.totalbytes = 33547882496
nomad.advertise.address = 10.127.74.96:4646
nomad.revision = 168e70a5b75d21883edb8a89ca879cf6b7825787
nomad.version = 1.2.12+ent
os.name = centos
os.signals = SIGALRM,SIGBUS,SIGXFSZ,SIGPROF,SIGSEGV,SIGFPE,SIGINT,SIGTERM,SIGABRT,SIGHUP,SIGTSTP,SIGTTOU,SIGUSR1,SIGIO,SIGKILL,SIGTRAP,SIGIOT,SIGSTOP,SIGTTIN,SIGWINCH,SIGCHLD,SIGQUIT,SIGURG,SIGUSR2,SIGCONT,SIGILL,SIGPIPE,SIGSYS,SIGXCPU
os.version = 7.9.2009
unique.cgroup.mountpoint = /sys/fs/cgroup/systemd
unique.consul.name = a1v-nom-wdg-pl-0b032036f21efeea9.q2dc.local
unique.hostname = a1v-nom-wdg-pl-0b032036f21efeea9.q2dc.local
unique.network.ip-address = 10.127.74.96
unique.storage.bytesfree = 64313729024
unique.storage.bytestotal = 64414023680
unique.storage.volume = /dev/sdc
vault.accessible = true
vault.cluster_id = 7d8266df-31f5-54b7-b3a1-216ab25adb2d
vault.cluster_name = vault-cluster-81044b06
vault.version = 1.11.5+ent
Meta
connect.gateway_image = envoyproxy/envoy:v${NOMAD_envoy_version}
connect.log_level = info
connect.proxy_concurrency = 1
connect.sidecar_image = envoyproxy/envoy:v${NOMAD_envoy_version}
Hey all,
It is me back again with some more TED talk. Under further scrutiny, this is confirmed to be some race condition with Docker. I was able to prove this out by using curl in a screen session and watching it freeze and hang just as the Nomad logging did. Check out the test below:
# curl --silent -XGET --unix-socket /run/docker.sock "http://localhost/v1.41/containers/<CONTAINER_ID>/logs?stdout=true&follow=true"
# strace -ttt -p 63675 -s655535 -f -y
strace: Process 63675 attached
1685943750.487366 restart_syscall(<... resuming interrupted read ...>) = 0
1685943750.533690 poll([{fd=3<socket:[2392047265]>, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
1685943750.535456 poll([{fd=3<socket:[2392047265]>, events=POLLIN}], 1, 1000) = 0 (Timeout)
1685943751.537527 poll([{fd=3<socket:[2392047265]>, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
1685943751.539618 poll([{fd=3<socket:[2392047265]>, events=POLLIN}], 1, 1000) = 0 (Timeout)
1685943752.541069 poll([{fd=3<socket:[2392047265]>, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
1685943752.541903 poll([{fd=3<socket:[2392047265]>, events=POLLIN}], 1, 1000) = 0 (Timeout)
1685943753.547484 poll([{fd=3<socket:[2392047265]>, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
1685943753.548418 poll([{fd=3<socket:[2392047265]>, events=POLLIN}], 1, 1000) = 0 (Timeout)
1685943754.556300 poll([{fd=3<socket:[2392047265]>, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
1685943754.557633 poll([{fd=3<socket:[2392047265]>, events=POLLIN}], 1, 1000^Cstrace: Process 63675 detached
Below is the relevant docker version information:
Client: Docker Engine - Community
Version: 20.10.18
API version: 1.41
Go version: go1.18.6
Git commit: b40c2f6
Built: Thu Sep 8 23:14:08 2022
OS/Arch: linux/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.17
API version: 1.41 (minimum version 1.12)
Go version: go1.17.11
Git commit: a89b842
Built: Mon Jun 6 23:03:33 2022
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.8
GitCommit: 9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6
runc:
Version: 1.1.4
GitCommit: v1.1.4-0-g5fd4c4d
docker-init:
Version: 0.19.0
GitCommit: de40ad0
I've also confirmed that this issue is specifically correlated with Docker log rotation. Looking into the Docker GitHub history, it does look like Docker has struggled with this issue for some time, but nothing specific to Linux that I could see. I did notice that they recently moved away from using file watchers for log following, which may well have improved or corrected this issue, but I haven't battle-tested or reviewed the code changes enough to give anything insightful at this point. If you're following this and looking for a solution, I'd suggest going to the latest Docker and testing if you can.
@flipyap Hey, first off, thanks for digging into this. It's been long enough that I haven't been keeping close tabs on this, but we seem to have had another reproduction of this reported by one of our service teams today. In this case we're running a newer version of Docker than in your latest post:
$ docker version
Client: Docker Engine - Community
Version: 24.0.5
API version: 1.43
Go version: go1.20.6
Git commit: ced0996
Built: Fri Jul 21 20:39:02 2023
OS/Arch: linux/amd64
Context: default
Server: Docker Engine - Community
Engine:
Version: 24.0.5
API version: 1.43 (minimum version 1.12)
Go version: go1.20.6
Git commit: a61e2b4
Built: Fri Jul 21 20:38:05 2023
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.22
GitCommit: 8165feabfdfe38c65b599c4993d227328c231fca
runc:
Version: 1.1.8
GitCommit: v1.1.8-0-g82f18fe
docker-init:
Version: 0.19.0
GitCommit: de40ad0
So it seems like the issue persists. Unfortunately I don't have much more context around this at this point than I did when I originally filed this issue.
Nomad version
Found on the 0.9.x series of releases. Currently unknown whether this reproduces on 0.10.x releases; we're currently attempting to determine if we see a reproduction in our 0.10.x test cluster.
Operating system and Environment details
CentOS Linux release 7.7.1908 (Core)
3.10.0-1062.18.1.el7.x86_64
Issue
What we're seeing is that after a Docker task has been running for a while (hours to days), Nomad stops streaming Docker stdout/stderr to their respective log files. This appears to be a Nomad issue, as running `docker logs` against the affected containers properly streams logs, so it seems that the log collection is failing somehow. This doesn't appear to be related to log rotation, as this has been seen where the `$task.std{out,err}.0` log file is still under the size limit for rotation.

This was originally reported by one of our teams a few months back. It seemed at the time to only affect one of their jobs, and we had an alternative method of shipping their logs, so we didn't dig too hard on it. We've now become aware of other jobs being affected by this, so we're starting to take a closer look.
We're still in the process of doing investigation but wanted to at least get the issue submitted and we'll provide more information as we learn more.
Reproduction steps
- Run a Docker task that writes to `stdout`/`stderr`.
- After some time, Nomad stops streaming `stdout`/`stderr` to the `$task.std{out,err}.X` log files.

Job file (if appropriate)
Nomad Client logs (if appropriate)
Will grab these when we see it reproduce again or I'm able to discover more instances currently hidden in the fleet.