hashicorp / nomad-autoscaler

Nomad Autoscaler brings autoscaling to your Nomad workloads.
Mozilla Public License 2.0

Autoscaler doesn't reload templates after a signal from nomad. #644

Closed: jorgemarey closed this 8 months ago

jorgemarey commented 1 year ago

Version: v0.3.7 (90ad44d)

We changed a template in Nomad. Nomad reported that it signaled the autoscaler, but I don't see the autoscaler reloading its configuration in the logs.

I sent a SIGABRT to dump the goroutines:

SIGABRT: abort
PC=0x46d261 m=0 sigcode=0

goroutine 34 [syscall, 206 minutes]:
runtime.notetsleepg(0xffffffffffffffff, 0xc00005a728)
    runtime/lock_futex.go:236 +0x34 fp=0xc00005a7a0 sp=0xc00005a768 pc=0x40bcb4
os/signal.signal_recv()
    runtime/sigqueue.go:169 +0x98 fp=0xc00005a7c0 sp=0xc00005a7a0 pc=0x4679d8
os/signal.loop()
    os/signal/signal_unix.go:24 +0x19 fp=0xc00005a7e0 sp=0xc00005a7c0 pc=0x5486b9
runtime.goexit()
    runtime/asm_amd64.s:1581 +0x1 fp=0xc00005a7e8 sp=0xc00005a7e0 pc=0x46b441
created by os/signal.Notify.func1.1
    os/signal/signal.go:151 +0x2c

goroutine 1 [select, 15504 minutes]:
google.golang.org/grpc/internal/transport.(*Stream).waitOnHeader(0xc000471b00)
    google.golang.org/grpc@v1.46.0/internal/transport/transport.go:324 +0x85
google.golang.org/grpc/internal/transport.(*Stream).RecvCompress(...)
    google.golang.org/grpc@v1.46.0/internal/transport/transport.go:339
google.golang.org/grpc.(*csAttempt).recvMsg(0xc00061fe40, {0x1973da0, 0xc0004f8c30}, 0x0)
    google.golang.org/grpc@v1.46.0/stream.go:969 +0xbb
google.golang.org/grpc.(*clientStream).RecvMsg.func1(0x16f)
    google.golang.org/grpc@v1.46.0/stream.go:834 +0x25
google.golang.org/grpc.(*clientStream).withRetry(0xc0004717a0, 0xc0005a5460, 0xc0005a5430)
    google.golang.org/grpc@v1.46.0/stream.go:692 +0xd3
google.golang.org/grpc.(*clientStream).RecvMsg(0xc0004717a0, {0x1973da0, 0xc0004f8c30})
    google.golang.org/grpc@v1.46.0/stream.go:833 +0x11f
google.golang.org/grpc.invoke({0x1e4cfb0, 0xc0004b4880}, {0x1c36c73, 0x28}, {0x1994b60, 0xc0004f8c00}, {0x1973da0, 0xc0004f8c30}, 0x0, {0xc00007a100, ...})
    google.golang.org/grpc@v1.46.0/call.go:73 +0xd7
google.golang.org/grpc.(*ClientConn).Invoke(0x7f6ff2b71108, {0x1e4cfb0, 0xc0004b4880}, {0x1c36c73, 0x0}, {0x1994b60, 0xc0004f8c00}, {0x1973da0, 0xc0004f8c30}, {0x0, ...})
    google.golang.org/grpc@v1.46.0/call.go:37 +0x265
github.com/hashicorp/nomad-autoscaler/plugins/base/proto/v1.(*basePluginServiceClient).SetConfig(0xc00004c8f0, {0x1e4cfb0, 0xc0004b4880}, 0xc0005a7670, {0x0, 0x0, 0x0})
    github.com/hashicorp/nomad-autoscaler/plugins/base/proto/v1/base.pb.go:465 +0xce
github.com/hashicorp/nomad-autoscaler/plugins/base.(*PluginClient).SetConfig(0xc00007a3c0, 0xc000271e90)
    github.com/hashicorp/nomad-autoscaler/plugins/base/client.go:48 +0x75
github.com/hashicorp/nomad-autoscaler/plugins/manager.(*PluginManager).Reload(0xc000468f50, 0xc0004f8a80)
    github.com/hashicorp/nomad-autoscaler/plugins/manager/manager.go:139 +0x3f9
github.com/hashicorp/nomad-autoscaler/agent.(*Agent).reload(0xc000468e00)
    github.com/hashicorp/nomad-autoscaler/agent/agent.go:257 +0x3a5
github.com/hashicorp/nomad-autoscaler/agent.(*Agent).handleSignals(0xc000468e00)
    github.com/hashicorp/nomad-autoscaler/agent/agent.go:279 +0xcf
github.com/hashicorp/nomad-autoscaler/agent.(*Agent).Run(0xc000468e00)
    github.com/hashicorp/nomad-autoscaler/agent/agent.go:100 +0x446
github.com/hashicorp/nomad-autoscaler/command.(*AgentCommand).Run(0xc00007cb10, {0xc0000b4020, 0x203000, 0x0})
    github.com/hashicorp/nomad-autoscaler/command/agent.go:333 +0x8e5
github.com/mitchellh/cli.(*CLI).Run(0xc00041d680)
    github.com/mitchellh/cli@v1.1.2/cli.go:262 +0x5f8
main.main()
    github.com/hashicorp/nomad-autoscaler/main.go:26 +0x24d

goroutine 21 [select]:
go.opencensus.io/stats/view.(*worker).start(0xc000118780)
    go.opencensus.io@v0.23.0/stats/view/worker.go:276 +0xb9
created by go.opencensus.io/stats/view.init.0
    go.opencensus.io@v0.23.0/stats/view/worker.go:34 +0x92

goroutine 3 [IO wait]:
internal/poll.runtime_pollWait(0x7f6fcbd583d8, 0x72)
    runtime/netpoll.go:303 +0x85
internal/poll.(*pollDesc).wait(0xc00042c980, 0x4172e6, 0x0)
    internal/poll/fd_poll_runtime.go:84 +0x32
internal/poll.(*pollDesc).waitRead(...)
    internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc00042c980)
    internal/poll/fd_unix.go:402 +0x22c
net.(*netFD).accept(0xc00042c980)
    net/fd_unix.go:173 +0x35
net.(*TCPListener).accept(0xc00000cbe8)
    net/tcpsock_posix.go:140 +0x28
net.(*TCPListener).Accept(0xc00000cbe8)
    net/tcpsock.go:262 +0x3d
net/http.(*Server).Serve(0xc0004a8b60, {0x1e39d40, 0xc00000cbe8})
    net/http/server.go:3002 +0x394
github.com/hashicorp/nomad-autoscaler/agent/http.(*Server).Start(0xc000414f00)
    github.com/hashicorp/nomad-autoscaler/agent/http/server.go:124 +0xd2
created by github.com/hashicorp/nomad-autoscaler/command.(*AgentCommand).Run
    github.com/hashicorp/nomad-autoscaler/command/agent.go:330 +0x896

goroutine 4 [IO wait, 16368 minutes]:
internal/poll.runtime_pollWait(0x7f6fcbd58108, 0x72)
    runtime/netpoll.go:303 +0x85
internal/poll.(*pollDesc).wait(0xc0004ad6e0, 0xc0004da000, 0x1)
    internal/poll/fd_poll_runtime.go:84 +0x32
internal/poll.(*pollDesc).waitRead(...)
    internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc0004ad6e0, {0xc0004da000, 0x10000, 0x10000})
    internal/poll/fd_unix.go:167 +0x25a
os.(*File).read(...)
    os/file_posix.go:32
os.(*File).Read(0xc00000e400, {0xc0004da000, 0x10001c0005ccde8, 0x7f6fcbd404d0})
    os/file.go:119 +0x5e
bufio.(*Reader).fill(0xc0005ccf40)
    bufio/bufio.go:101 +0x103
bufio.(*Reader).ReadSlice(0xc0005ccf40, 0x0)
    bufio/bufio.go:360 +0x2f
bufio.(*Reader).ReadLine(0xc0005ccf40)
    bufio/bufio.go:389 +0x27
github.com/hashicorp/go-plugin.(*Client).logStderr(0xc0004af900, {0x1e17260, 0xc00000e400})
    github.com/hashicorp/go-plugin@v1.0.1/client.go:956 +0x2aa
created by github.com/hashicorp/go-plugin.(*Client).Start
    github.com/hashicorp/go-plugin@v1.0.1/client.go:601 +0x14af

goroutine 5 [semacquire, 17589 minutes]:
sync.runtime_Semacquire(0x0)
    runtime/sema.go:56 +0x25
sync.(*WaitGroup).Wait(0x0)
    sync/waitgroup.go:130 +0x71
github.com/hashicorp/go-plugin.(*Client).Start.func2()
    github.com/hashicorp/go-plugin@v1.0.1/client.go:617 +0xc8
created by github.com/hashicorp/go-plugin.(*Client).Start
    github.com/hashicorp/go-plugin@v1.0.1/client.go:604 +0x152f

goroutine 6 [IO wait, 17589 minutes]:
internal/poll.runtime_pollWait(0x7f6fcbd582e8, 0x72)
    runtime/netpoll.go:303 +0x85
internal/poll.(*pollDesc).wait(0xc0004ad620, 0xc0004d8025, 0x1)
    internal/poll/fd_poll_runtime.go:84 +0x32
internal/poll.(*pollDesc).waitRead(...)
    internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc0004ad620, {0xc0004d8025, 0xfdb, 0xfdb})
    internal/poll/fd_unix.go:167 +0x25a
os.(*File).read(...)
    os/file_posix.go:32
os.(*File).Read(0xc00000e3f0, {0xc0004d8025, 0xc000095ea8, 0x40592d})
    os/file.go:119 +0x5e
bufio.(*Scanner).Scan(0xc000095f40)
    bufio/scan.go:215 +0x865
github.com/hashicorp/go-plugin.(*Client).Start.func3()
    github.com/hashicorp/go-plugin@v1.0.1/client.go:650 +0x12f
created by github.com/hashicorp/go-plugin.(*Client).Start
    github.com/hashicorp/go-plugin@v1.0.1/client.go:645 +0x15f4

goroutine 25 [chan receive, 17589 minutes]:
github.com/hashicorp/go-plugin.(*Client).Start.func4.1()
    github.com/hashicorp/go-plugin@v1.0.1/client.go:663 +0x7f
created by github.com/hashicorp/go-plugin.(*Client).Start.func4
    github.com/hashicorp/go-plugin@v1.0.1/client.go:661 +0x85

goroutine 26 [select, 17589 minutes]:
google.golang.org/grpc.(*ccBalancerWrapper).watcher(0xc0004b4040)
    google.golang.org/grpc@v1.46.0/balancer_conn_wrappers.go:112 +0x79
created by google.golang.org/grpc.newCCBalancerWrapper
    google.golang.org/grpc@v1.46.0/balancer_conn_wrappers.go:73 +0x22f

goroutine 8 [select, 17589 minutes]:
google.golang.org/grpc.newClientStreamWithParams.func4()
    google.golang.org/grpc@v1.46.0/stream.go:341 +0x98
created by google.golang.org/grpc.newClientStreamWithParams
    google.golang.org/grpc@v1.46.0/stream.go:340 +0xb65

goroutine 28 [select, 17589 minutes]:
github.com/hashicorp/go-plugin.(*gRPCBrokerClientImpl).Recv(0x0)
    github.com/hashicorp/go-plugin@v1.0.1/grpc_broker.go:231 +0x6d
github.com/hashicorp/go-plugin.(*GRPCBroker).Run(0xc0004140f0)
    github.com/hashicorp/go-plugin@v1.0.1/grpc_broker.go:411 +0x48
created by github.com/hashicorp/go-plugin.newGRPCClient
    github.com/hashicorp/go-plugin@v1.0.1/grpc_client.go:62 +0x287

goroutine 29 [select, 17589 minutes]:
google.golang.org/grpc/internal/transport.(*Stream).waitOnHeader(0xc000210360)
    google.golang.org/grpc@v1.46.0/internal/transport/transport.go:324 +0x85
google.golang.org/grpc/internal/transport.(*Stream).RecvCompress(...)
    google.golang.org/grpc@v1.46.0/internal/transport/transport.go:339
google.golang.org/grpc.(*csAttempt).recvMsg(0xc00047a0b0, {0x1aab320, 0xc000348190}, 0x8)
    google.golang.org/grpc@v1.46.0/stream.go:969 +0xbb
google.golang.org/grpc.(*clientStream).RecvMsg.func1(0x203000)
    google.golang.org/grpc@v1.46.0/stream.go:834 +0x25
google.golang.org/grpc.(*clientStream).withRetry(0xc000470480, 0xc0000cbe90, 0xc0000cbe60)
    google.golang.org/grpc@v1.46.0/stream.go:688 +0x2f6
google.golang.org/grpc.(*clientStream).RecvMsg(0xc000470480, {0x1aab320, 0xc000348190})
    google.golang.org/grpc@v1.46.0/stream.go:833 +0x11f
github.com/hashicorp/go-plugin/internal/plugin.(*gRPCBrokerStartStreamClient).Recv(0xc000338550)
    github.com/hashicorp/go-plugin@v1.0.1/internal/plugin/grpc_broker.pb.go:149 +0x4c
github.com/hashicorp/go-plugin.(*gRPCBrokerClientImpl).StartStream(0xc0004b41c0)
    github.com/hashicorp/go-plugin@v1.0.1/grpc_broker.go:194 +0x1dd
created by github.com/hashicorp/go-plugin.newGRPCClient
    github.com/hashicorp/go-plugin@v1.0.1/grpc_client.go:63 +0x2c9

goroutine 30 [IO wait, 25 minutes]:
internal/poll.runtime_pollWait(0x7f6fcbd58018, 0x72)
    runtime/netpoll.go:303 +0x85
internal/poll.(*pollDesc).wait(0xc00042c100, 0xc0003f6000, 0x0)
    internal/poll/fd_poll_runtime.go:84 +0x32
internal/poll.(*pollDesc).waitRead(...)
    internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc00042c100, {0xc0003f6000, 0x8000, 0x8000})
    internal/poll/fd_unix.go:167 +0x25a
net.(*netFD).Read(0xc00042c100, {0xc0003f6000, 0x60100000000, 0x8})
    net/fd_posix.go:56 +0x29
net.(*conn).Read(0xc00000e160, {0xc0003f6000, 0x4693ae, 0x439c47})
    net/net.go:183 +0x45
bufio.(*Reader).Read(0xc0004ac4e0, {0xc0004a8040, 0x9, 0xc000192d18})
    bufio/bufio.go:227 +0x1b4
io.ReadAtLeast({0x1e13d00, 0xc0004ac4e0}, {0xc0004a8040, 0x9, 0x9}, 0x9)
    io/io.go:328 +0x9a
io.ReadFull(...)
    io/io.go:347
golang.org/x/net/http2.readFrameHeader({0xc0004a8040, 0x9, 0xc000192d6f}, {0x1e13d00, 0xc0004ac4e0})
    golang.org/x/net@v0.0.0-20220425223048-2871e0cb64e4/http2/frame.go:237 +0x6e
golang.org/x/net/http2.(*Framer).ReadFrame(0xc0004a8000)
    golang.org/x/net@v0.0.0-20220425223048-2871e0cb64e4/http2/frame.go:498 +0x95
google.golang.org/grpc/internal/transport.(*http2Client).reader(0xc00000a1e0)
    google.golang.org/grpc@v1.46.0/internal/transport/http2_client.go:1498 +0x41f
created by google.golang.org/grpc/internal/transport.newHTTP2Client
    google.golang.org/grpc@v1.46.0/internal/transport/http2_client.go:365 +0x194f

goroutine 31 [select, 25 minutes]:
google.golang.org/grpc/internal/transport.(*controlBuffer).get(0xc0004143c0, 0x1)
    google.golang.org/grpc@v1.46.0/internal/transport/controlbuf.go:407 +0x11b
google.golang.org/grpc/internal/transport.(*loopyWriter).run(0xc0004ac660)
    google.golang.org/grpc@v1.46.0/internal/transport/controlbuf.go:534 +0x85
google.golang.org/grpc/internal/transport.newHTTP2Client.func3()
    google.golang.org/grpc@v1.46.0/internal/transport/http2_client.go:415 +0x65
created by google.golang.org/grpc/internal/transport.newHTTP2Client
    google.golang.org/grpc@v1.46.0/internal/transport/http2_client.go:413 +0x1fa5

goroutine 9 [select, 17589 minutes]:
github.com/hashicorp/go-plugin.(*gRPCBrokerClientImpl).StartStream.func1()
    github.com/hashicorp/go-plugin@v1.0.1/grpc_broker.go:181 +0x10e
created by github.com/hashicorp/go-plugin.(*gRPCBrokerClientImpl).StartStream
    github.com/hashicorp/go-plugin@v1.0.1/grpc_broker.go:179 +0x17a

goroutine 32 [chan receive]:
github.com/hashicorp/nomad-autoscaler/plugins/builtin/target/nomad/plugin.(*TargetPlugin).garbageCollectionLoop(0xc0004acba0)
    github.com/hashicorp/nomad-autoscaler/plugins/builtin/target/nomad/plugin/plugin.go:213 +0xc9
created by github.com/hashicorp/nomad-autoscaler/plugins/builtin/target/nomad/plugin.(*TargetPlugin).SetConfig
    github.com/hashicorp/nomad-autoscaler/plugins/builtin/target/nomad/plugin/plugin.go:94 +0xef

goroutine 35 [select, 17589 minutes]:
github.com/armon/go-metrics.(*InmemSignal).run(0xc0004b4a80)
    github.com/armon/go-metrics@v0.3.11/inmem_signal.go:64 +0x6c
created by github.com/armon/go-metrics.NewInmemSignal
    github.com/armon/go-metrics@v0.3.11/inmem_signal.go:38 +0x174

goroutine 37 [sleep]:
time.Sleep(0x3b9aca00)
    runtime/time.go:193 +0x12e
github.com/armon/go-metrics.(*Metrics).collectStats(0xc000138000)
    github.com/armon/go-metrics@v0.3.11/metrics.go:234 +0x25
created by github.com/armon/go-metrics.New
    github.com/armon/go-metrics@v0.3.11/start.go:84 +0x17b

goroutine 38 [select, 15504 minutes]:
github.com/hashicorp/nomad-autoscaler/policy.(*Manager).Run(0xc000414b90, {0x1e4cfb0, 0xc0004b47c0}, 0xc0004acd20)
    github.com/hashicorp/nomad-autoscaler/policy/manager.go:72 +0x475
created by github.com/hashicorp/nomad-autoscaler/agent.(*Agent).Run
    github.com/hashicorp/nomad-autoscaler/agent/agent.go:85 +0x337

goroutine 39 [select, 17589 minutes]:
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).waitForWork(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be7579, 0xa})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:246 +0x1e5
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).Dequeue(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be7579, 0xa})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:176 +0x19a
github.com/hashicorp/nomad-autoscaler/policyeval.(*BaseWorker).Run(0xc000414c30, {0x1e4cfb0, 0xc0004b47c0})
    github.com/hashicorp/nomad-autoscaler/policyeval/base_worker.go:60 +0xb6
created by github.com/hashicorp/nomad-autoscaler/agent.(*Agent).initWorkers
    github.com/hashicorp/nomad-autoscaler/agent/agent.go:128 +0x25f

goroutine 40 [select, 17589 minutes]:
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).waitForWork(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be7579, 0xa})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:246 +0x1e5
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).Dequeue(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be7579, 0xa})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:176 +0x19a
github.com/hashicorp/nomad-autoscaler/policyeval.(*BaseWorker).Run(0xc000414c80, {0x1e4cfb0, 0xc0004b47c0})
    github.com/hashicorp/nomad-autoscaler/policyeval/base_worker.go:60 +0xb6
created by github.com/hashicorp/nomad-autoscaler/agent.(*Agent).initWorkers
    github.com/hashicorp/nomad-autoscaler/agent/agent.go:128 +0x25f

goroutine 41 [select, 17589 minutes]:
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).waitForWork(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be7579, 0xa})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:246 +0x1e5
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).Dequeue(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be7579, 0xa})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:176 +0x19a
github.com/hashicorp/nomad-autoscaler/policyeval.(*BaseWorker).Run(0xc000414cd0, {0x1e4cfb0, 0xc0004b47c0})
    github.com/hashicorp/nomad-autoscaler/policyeval/base_worker.go:60 +0xb6
created by github.com/hashicorp/nomad-autoscaler/agent.(*Agent).initWorkers
    github.com/hashicorp/nomad-autoscaler/agent/agent.go:128 +0x25f

goroutine 42 [select, 17589 minutes]:
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).waitForWork(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be7579, 0xa})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:246 +0x1e5
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).Dequeue(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be7579, 0xa})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:176 +0x19a
github.com/hashicorp/nomad-autoscaler/policyeval.(*BaseWorker).Run(0xc000414d20, {0x1e4cfb0, 0xc0004b47c0})
    github.com/hashicorp/nomad-autoscaler/policyeval/base_worker.go:60 +0xb6
created by github.com/hashicorp/nomad-autoscaler/agent.(*Agent).initWorkers
    github.com/hashicorp/nomad-autoscaler/agent/agent.go:128 +0x25f

goroutine 43 [select, 17589 minutes]:
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).waitForWork(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be7579, 0xa})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:246 +0x1e5
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).Dequeue(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be7579, 0xa})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:176 +0x19a
github.com/hashicorp/nomad-autoscaler/policyeval.(*BaseWorker).Run(0xc000414d70, {0x1e4cfb0, 0xc0004b47c0})
    github.com/hashicorp/nomad-autoscaler/policyeval/base_worker.go:60 +0xb6
created by github.com/hashicorp/nomad-autoscaler/agent.(*Agent).initWorkers
    github.com/hashicorp/nomad-autoscaler/agent/agent.go:128 +0x25f

goroutine 44 [select, 17589 minutes]:
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).waitForWork(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be7579, 0xa})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:246 +0x1e5
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).Dequeue(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be7579, 0xa})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:176 +0x19a
github.com/hashicorp/nomad-autoscaler/policyeval.(*BaseWorker).Run(0xc000414dc0, {0x1e4cfb0, 0xc0004b47c0})
    github.com/hashicorp/nomad-autoscaler/policyeval/base_worker.go:60 +0xb6
created by github.com/hashicorp/nomad-autoscaler/agent.(*Agent).initWorkers
    github.com/hashicorp/nomad-autoscaler/agent/agent.go:128 +0x25f

goroutine 45 [select, 17589 minutes]:
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).waitForWork(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be7579, 0xa})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:246 +0x1e5
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).Dequeue(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be7579, 0xa})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:176 +0x19a
github.com/hashicorp/nomad-autoscaler/policyeval.(*BaseWorker).Run(0xc000414e10, {0x1e4cfb0, 0xc0004b47c0})
    github.com/hashicorp/nomad-autoscaler/policyeval/base_worker.go:60 +0xb6
created by github.com/hashicorp/nomad-autoscaler/agent.(*Agent).initWorkers
    github.com/hashicorp/nomad-autoscaler/agent/agent.go:128 +0x25f

goroutine 46 [select, 17589 minutes]:
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).waitForWork(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be7579, 0xa})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:246 +0x1e5
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).Dequeue(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be7579, 0xa})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:176 +0x19a
github.com/hashicorp/nomad-autoscaler/policyeval.(*BaseWorker).Run(0xc000414eb0, {0x1e4cfb0, 0xc0004b47c0})
    github.com/hashicorp/nomad-autoscaler/policyeval/base_worker.go:60 +0xb6
created by github.com/hashicorp/nomad-autoscaler/agent.(*Agent).initWorkers
    github.com/hashicorp/nomad-autoscaler/agent/agent.go:128 +0x25f

goroutine 47 [select, 17589 minutes]:
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).waitForWork(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be7579, 0xa})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:246 +0x1e5
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).Dequeue(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be7579, 0xa})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:176 +0x19a
github.com/hashicorp/nomad-autoscaler/policyeval.(*BaseWorker).Run(0xc000415220, {0x1e4cfb0, 0xc0004b47c0})
    github.com/hashicorp/nomad-autoscaler/policyeval/base_worker.go:60 +0xb6
created by github.com/hashicorp/nomad-autoscaler/agent.(*Agent).initWorkers
    github.com/hashicorp/nomad-autoscaler/agent/agent.go:128 +0x25f

goroutine 48 [select, 17589 minutes]:
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).waitForWork(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be7579, 0xa})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:246 +0x1e5
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).Dequeue(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be7579, 0xa})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:176 +0x19a
github.com/hashicorp/nomad-autoscaler/policyeval.(*BaseWorker).Run(0xc000415270, {0x1e4cfb0, 0xc0004b47c0})
    github.com/hashicorp/nomad-autoscaler/policyeval/base_worker.go:60 +0xb6
created by github.com/hashicorp/nomad-autoscaler/agent.(*Agent).initWorkers
    github.com/hashicorp/nomad-autoscaler/agent/agent.go:128 +0x25f

goroutine 49 [select, 16309 minutes]:
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).waitForWork(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be2f4a, 0x7})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:246 +0x1e5
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).Dequeue(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be2f4a, 0x7})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:176 +0x19a
github.com/hashicorp/nomad-autoscaler/policyeval.(*BaseWorker).Run(0xc0004152c0, {0x1e4cfb0, 0xc0004b47c0})
    github.com/hashicorp/nomad-autoscaler/policyeval/base_worker.go:60 +0xb6
created by github.com/hashicorp/nomad-autoscaler/agent.(*Agent).initWorkers
    github.com/hashicorp/nomad-autoscaler/agent/agent.go:134 +0x35b

goroutine 50 [select, 16304 minutes]:
google.golang.org/grpc/internal/transport.(*Stream).waitOnHeader(0xc000470d80)
    google.golang.org/grpc@v1.46.0/internal/transport/transport.go:324 +0x85
google.golang.org/grpc/internal/transport.(*Stream).RecvCompress(...)
    google.golang.org/grpc@v1.46.0/internal/transport/transport.go:339
google.golang.org/grpc.(*csAttempt).recvMsg(0xc00047aa50, {0x1a36180, 0xc000356b00}, 0x0)
    google.golang.org/grpc@v1.46.0/stream.go:969 +0xbb
google.golang.org/grpc.(*clientStream).RecvMsg.func1(0x27e)
    google.golang.org/grpc@v1.46.0/stream.go:834 +0x25
google.golang.org/grpc.(*clientStream).withRetry(0xc000470b40, 0xc0007076e8, 0xc0007076b8)
    google.golang.org/grpc@v1.46.0/stream.go:692 +0xd3
google.golang.org/grpc.(*clientStream).RecvMsg(0xc000470b40, {0x1a36180, 0xc000356b00})
    google.golang.org/grpc@v1.46.0/stream.go:833 +0x11f
google.golang.org/grpc.invoke({0x1e4cfb0, 0xc0004b4880}, {0x1c373f9, 0x40}, {0x1994da0, 0xc000596750}, {0x1a36180, 0xc000356b00}, 0x0, {0xc00007a100, ...})
    google.golang.org/grpc@v1.46.0/call.go:73 +0xd7
google.golang.org/grpc.(*ClientConn).Invoke(0x7f6ff2b71108, {0x1e4cfb0, 0xc0004b4880}, {0x1c373f9, 0x0}, {0x1994da0, 0xc000596750}, {0x1a36180, 0xc000356b00}, {0x0, ...})
    google.golang.org/grpc@v1.46.0/call.go:37 +0x265
github.com/hashicorp/nomad-autoscaler/plugins/target/proto/v1.(*targetPluginServiceClient).Status(0xc00004c900, {0x1e4cfb0, 0xc0004b4880}, 0xc0007078e0, {0x0, 0x0, 0x0})
    github.com/hashicorp/nomad-autoscaler/plugins/target/proto/v1/target.pb.go:457 +0xce
github.com/hashicorp/nomad-autoscaler/plugins/target.(*pluginClient).Status(0xc0004b6570, 0xc000475650)
    github.com/hashicorp/nomad-autoscaler/plugins/target/client.go:38 +0x7a
github.com/hashicorp/nomad-autoscaler/policyeval.(*BaseWorker).runTargetStatus(0xc000468f50, {0x7f6fcbbc7518, 0xc0004b6570}, 0xc000118480)
    github.com/hashicorp/nomad-autoscaler/policyeval/base_worker.go:347 +0x229
github.com/hashicorp/nomad-autoscaler/policyeval.(*BaseWorker).handlePolicy(0xc000415310, {0x1e4cfb0, 0xc0004b47c0}, 0xc000090f00)
    github.com/hashicorp/nomad-autoscaler/policyeval/base_worker.go:120 +0x3e5
github.com/hashicorp/nomad-autoscaler/policyeval.(*BaseWorker).Run(0xc000415310, {0x1e4cfb0, 0xc0004b47c0})
    github.com/hashicorp/nomad-autoscaler/policyeval/base_worker.go:76 +0x265
created by github.com/hashicorp/nomad-autoscaler/agent.(*Agent).initWorkers
    github.com/hashicorp/nomad-autoscaler/agent/agent.go:134 +0x35b

goroutine 51 [select, 16349 minutes]:
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).waitForWork(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be2f4a, 0x7})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:246 +0x1e5
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).Dequeue(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be2f4a, 0x7})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:176 +0x19a
github.com/hashicorp/nomad-autoscaler/policyeval.(*BaseWorker).Run(0xc000415360, {0x1e4cfb0, 0xc0004b47c0})
    github.com/hashicorp/nomad-autoscaler/policyeval/base_worker.go:60 +0xb6
created by github.com/hashicorp/nomad-autoscaler/agent.(*Agent).initWorkers
    github.com/hashicorp/nomad-autoscaler/agent/agent.go:134 +0x35b

goroutine 52 [select, 16344 minutes]:
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).waitForWork(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be2f4a, 0x7})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:246 +0x1e5
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).Dequeue(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be2f4a, 0x7})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:176 +0x19a
github.com/hashicorp/nomad-autoscaler/policyeval.(*BaseWorker).Run(0xc0004153b0, {0x1e4cfb0, 0xc0004b47c0})
    github.com/hashicorp/nomad-autoscaler/policyeval/base_worker.go:60 +0xb6
created by github.com/hashicorp/nomad-autoscaler/agent.(*Agent).initWorkers
    github.com/hashicorp/nomad-autoscaler/agent/agent.go:134 +0x35b

goroutine 53 [select, 16339 minutes]:
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).waitForWork(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be2f4a, 0x7})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:246 +0x1e5
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).Dequeue(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be2f4a, 0x7})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:176 +0x19a
github.com/hashicorp/nomad-autoscaler/policyeval.(*BaseWorker).Run(0xc000415400, {0x1e4cfb0, 0xc0004b47c0})
    github.com/hashicorp/nomad-autoscaler/policyeval/base_worker.go:60 +0xb6
created by github.com/hashicorp/nomad-autoscaler/agent.(*Agent).initWorkers
    github.com/hashicorp/nomad-autoscaler/agent/agent.go:134 +0x35b

goroutine 54 [select, 16334 minutes]:
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).waitForWork(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be2f4a, 0x7})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:246 +0x1e5
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).Dequeue(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be2f4a, 0x7})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:176 +0x19a
github.com/hashicorp/nomad-autoscaler/policyeval.(*BaseWorker).Run(0xc000415450, {0x1e4cfb0, 0xc0004b47c0})
    github.com/hashicorp/nomad-autoscaler/policyeval/base_worker.go:60 +0xb6
created by github.com/hashicorp/nomad-autoscaler/agent.(*Agent).initWorkers
    github.com/hashicorp/nomad-autoscaler/agent/agent.go:134 +0x35b

goroutine 55 [select, 16329 minutes]:
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).waitForWork(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be2f4a, 0x7})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:246 +0x1e5
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).Dequeue(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be2f4a, 0x7})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:176 +0x19a
github.com/hashicorp/nomad-autoscaler/policyeval.(*BaseWorker).Run(0xc0004154a0, {0x1e4cfb0, 0xc0004b47c0})
    github.com/hashicorp/nomad-autoscaler/policyeval/base_worker.go:60 +0xb6
created by github.com/hashicorp/nomad-autoscaler/agent.(*Agent).initWorkers
    github.com/hashicorp/nomad-autoscaler/agent/agent.go:134 +0x35b

goroutine 56 [select, 16324 minutes]:
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).waitForWork(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be2f4a, 0x7})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:246 +0x1e5
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).Dequeue(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be2f4a, 0x7})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:176 +0x19a
github.com/hashicorp/nomad-autoscaler/policyeval.(*BaseWorker).Run(0xc0004154f0, {0x1e4cfb0, 0xc0004b47c0})
    github.com/hashicorp/nomad-autoscaler/policyeval/base_worker.go:60 +0xb6
created by github.com/hashicorp/nomad-autoscaler/agent.(*Agent).initWorkers
    github.com/hashicorp/nomad-autoscaler/agent/agent.go:134 +0x35b

goroutine 57 [select, 16319 minutes]:
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).waitForWork(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be2f4a, 0x7})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:246 +0x1e5
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).Dequeue(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be2f4a, 0x7})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:176 +0x19a
github.com/hashicorp/nomad-autoscaler/policyeval.(*BaseWorker).Run(0xc000415540, {0x1e4cfb0, 0xc0004b47c0})
    github.com/hashicorp/nomad-autoscaler/policyeval/base_worker.go:60 +0xb6
created by github.com/hashicorp/nomad-autoscaler/agent.(*Agent).initWorkers
    github.com/hashicorp/nomad-autoscaler/agent/agent.go:134 +0x35b

goroutine 58 [select, 16314 minutes]:
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).waitForWork(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be2f4a, 0x7})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:246 +0x1e5
github.com/hashicorp/nomad-autoscaler/policyeval.(*Broker).Dequeue(0xc0004acd80, {0x1e4cfb0, 0xc0004b47c0}, {0x1be2f4a, 0x7})
    github.com/hashicorp/nomad-autoscaler/policyeval/broker.go:176 +0x19a
github.com/hashicorp/nomad-autoscaler/policyeval.(*BaseWorker).Run(0xc000415590, {0x1e4cfb0, 0xc0004b47c0})
    github.com/hashicorp/nomad-autoscaler/policyeval/base_worker.go:60 +0xb6
created by github.com/hashicorp/nomad-autoscaler/agent.(*Agent).initWorkers
    github.com/hashicorp/nomad-autoscaler/agent/agent.go:134 +0x35b

goroutine 59 [select, 16304 minutes]:
github.com/hashicorp/nomad-autoscaler/agent.(*Agent).runEvalHandler(0xc000468e00, {0x1e4cfb0, 0xc0004b47c0}, 0xc0004acd20)
    github.com/hashicorp/nomad-autoscaler/agent/agent.go:106 +0xb2
created by github.com/hashicorp/nomad-autoscaler/agent.(*Agent).Run
    github.com/hashicorp/nomad-autoscaler/agent/agent.go:97 +0x439

goroutine 60 [select]:
github.com/hashicorp/nomad-autoscaler/policy.(*Manager).periodicMetricsReporter(0xc000414b90, {0x1e4cfb0, 0xc0004b5ac0}, 0xc0004acd20)
    github.com/hashicorp/nomad-autoscaler/policy/manager.go:224 +0xdf
created by github.com/hashicorp/nomad-autoscaler/policy.(*Manager).Run
    github.com/hashicorp/nomad-autoscaler/policy/manager.go:61 +0x196

goroutine 61 [select, 15504 minutes]:
github.com/hashicorp/nomad-autoscaler/policy/file.(*Source).MonitorIDs(0xc00042cd80, {0x1e4cfb0, 0xc0004b5ac0}, {0xc0004adce0, 0xc0004adc80})
    github.com/hashicorp/nomad-autoscaler/policy/file/source.go:84 +0x131
created by github.com/hashicorp/nomad-autoscaler/policy.(*Manager).Run
    github.com/hashicorp/nomad-autoscaler/policy/manager.go:67 +0x1f3

goroutine 62 [select, 16299 minutes]:
google.golang.org/grpc/internal/transport.(*Stream).waitOnHeader(0xc000478c60)
    google.golang.org/grpc@v1.46.0/internal/transport/transport.go:324 +0x85
google.golang.org/grpc/internal/transport.(*Stream).RecvCompress(...)
    google.golang.org/grpc@v1.46.0/internal/transport/transport.go:339
google.golang.org/grpc.(*csAttempt).recvMsg(0xc00061e420, {0x1a36180, 0xc0003573c0}, 0x0)
    google.golang.org/grpc@v1.46.0/stream.go:969 +0xbb
google.golang.org/grpc.(*clientStream).RecvMsg.func1(0x27e)
    google.golang.org/grpc@v1.46.0/stream.go:834 +0x25
google.golang.org/grpc.(*clientStream).withRetry(0xc000478240, 0xc0007059d0, 0xc0007059a0)
    google.golang.org/grpc@v1.46.0/stream.go:692 +0xd3
google.golang.org/grpc.(*clientStream).RecvMsg(0xc000478240, {0x1a36180, 0xc0003573c0})
    google.golang.org/grpc@v1.46.0/stream.go:833 +0x11f
google.golang.org/grpc.invoke({0x1e4cfb0, 0xc0004b4880}, {0x1c373f9, 0x40}, {0x1994da0, 0xc0004f5140}, {0x1a36180, 0xc0003573c0}, 0x0, {0xc00007a100, ...})
    google.golang.org/grpc@v1.46.0/call.go:73 +0xd7
google.golang.org/grpc.(*ClientConn).Invoke(0x7f6ff2b715b8, {0x1e4cfb0, 0xc0004b4880}, {0x1c373f9, 0x0}, {0x1994da0, 0xc0004f5140}, {0x1a36180, 0xc0003573c0}, {0x0, ...})
    google.golang.org/grpc@v1.46.0/call.go:37 +0x265
github.com/hashicorp/nomad-autoscaler/plugins/target/proto/v1.(*targetPluginServiceClient).Status(0xc00004c900, {0x1e4cfb0, 0xc0004b4880}, 0x1bf6ce9, {0x0, 0x0, 0x0})
    github.com/hashicorp/nomad-autoscaler/plugins/target/proto/v1/target.pb.go:457 +0xce
github.com/hashicorp/nomad-autoscaler/plugins/target.(*pluginClient).Status(0xc0004b6570, 0xc000475650)
    github.com/hashicorp/nomad-autoscaler/plugins/target/client.go:38 +0x7a
github.com/hashicorp/nomad-autoscaler/policy.(*Handler).generateEvaluation(0xc0000c2f00, 0xc000118480)
    github.com/hashicorp/nomad-autoscaler/policy/handler.go:288 +0xf4
github.com/hashicorp/nomad-autoscaler/policy.(*Handler).handleTick(0xc0000c2f00, {0x1e4cfb0, 0xc0004b47c0}, 0xc000118480)
    github.com/hashicorp/nomad-autoscaler/policy/handler.go:215 +0x173
github.com/hashicorp/nomad-autoscaler/policy.(*Handler).Run(0xc0000c2f00, {0x1e4cfb0, 0xc0004b47c0}, 0xc0001907d0)
    github.com/hashicorp/nomad-autoscaler/policy/handler.go:156 +0x645
github.com/hashicorp/nomad-autoscaler/policy.(*Manager).Run.func1({0xc000040ba0, 0x24})
    github.com/hashicorp/nomad-autoscaler/policy/manager.go:116 +0x46
created by github.com/hashicorp/nomad-autoscaler/policy.(*Manager).Run
    github.com/hashicorp/nomad-autoscaler/policy/manager.go:115 +0xed3

goroutine 63 [chan send, 15504 minutes]:
github.com/hashicorp/nomad-autoscaler/policy/file.(*Source).MonitorPolicy(0xc00042cd80, {0x1e4cfb0, 0xc0001fa9c0}, {{0xc000040ba0, 0x24}, 0xc0004294a0, 0xc0004295c0, 0xc000429440})
    github.com/hashicorp/nomad-autoscaler/policy/file/source.go:174 +0x8c5
created by github.com/hashicorp/nomad-autoscaler/policy.(*Handler).Run
    github.com/hashicorp/nomad-autoscaler/policy/handler.go:115 +0x2e5

goroutine 1117455 [IO wait]:
internal/poll.runtime_pollWait(0x7f6fcbd57e38, 0x72)
    runtime/netpoll.go:303 +0x85
internal/poll.(*pollDesc).wait(0xc000118b80, 0xc00053c000, 0x0)
    internal/poll/fd_poll_runtime.go:84 +0x32
internal/poll.(*pollDesc).waitRead(...)
    internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000118b80, {0xc00053c000, 0x1000, 0x1000})
    internal/poll/fd_unix.go:167 +0x25a
net.(*netFD).Read(0xc000118b80, {0xc00053c000, 0x4d1a26, 0x7f6fcbd57f20})
    net/fd_posix.go:56 +0x29
net.(*conn).Read(0xc0000b1278, {0xc00053c000, 0x0, 0xc000475958})
    net/net.go:183 +0x45
net/http.(*connReader).Read(0xc000475950, {0xc00053c000, 0x1000, 0x1000})
    net/http/server.go:780 +0x16d
bufio.(*Reader).fill(0xc0004c2000)
    bufio/bufio.go:101 +0x103
bufio.(*Reader).Peek(0xc0004c2000, 0x4)
    bufio/bufio.go:139 +0x5d
net/http.(*conn).serve(0xc0006c2000, {0x1e4d058, 0xc000271860})
    net/http/server.go:1955 +0xc36
created by net/http.(*Server).Serve
    net/http/server.go:3034 +0x4e8

I think the problem is that it's blocked here:

goroutine 63 [chan send, 15504 minutes]:
github.com/hashicorp/nomad-autoscaler/policy/file.(*Source).MonitorPolicy(0xc00042cd80, {0x1e4cfb0, 0xc0001fa9c0}, {{0xc000040ba0, 0x24}, 0xc0004294a0, 0xc0004295c0, 0xc000429440})
    github.com/hashicorp/nomad-autoscaler/policy/file/source.go:174 +0x8c5
created by github.com/hashicorp/nomad-autoscaler/policy.(*Handler).Run
    github.com/hashicorp/nomad-autoscaler/policy/handler.go:115 +0x2e5

Due to (I think):

goroutine 62 [select, 16299 minutes]:
google.golang.org/grpc/internal/transport.(*Stream).waitOnHeader(0xc000478c60)
    google.golang.org/grpc@v1.46.0/internal/transport/transport.go:324 +0x85
google.golang.org/grpc/internal/transport.(*Stream).RecvCompress(...)
    google.golang.org/grpc@v1.46.0/internal/transport/transport.go:339
google.golang.org/grpc.(*csAttempt).recvMsg(0xc00061e420, {0x1a36180, 0xc0003573c0}, 0x0)
    google.golang.org/grpc@v1.46.0/stream.go:969 +0xbb
google.golang.org/grpc.(*clientStream).RecvMsg.func1(0x27e)
    google.golang.org/grpc@v1.46.0/stream.go:834 +0x25
google.golang.org/grpc.(*clientStream).withRetry(0xc000478240, 0xc0007059d0, 0xc0007059a0)
    google.golang.org/grpc@v1.46.0/stream.go:692 +0xd3
google.golang.org/grpc.(*clientStream).RecvMsg(0xc000478240, {0x1a36180, 0xc0003573c0})
    google.golang.org/grpc@v1.46.0/stream.go:833 +0x11f
google.golang.org/grpc.invoke({0x1e4cfb0, 0xc0004b4880}, {0x1c373f9, 0x40}, {0x1994da0, 0xc0004f5140}, {0x1a36180, 0xc0003573c0}, 0x0, {0xc00007a100, ...})
    google.golang.org/grpc@v1.46.0/call.go:73 +0xd7
google.golang.org/grpc.(*ClientConn).Invoke(0x7f6ff2b715b8, {0x1e4cfb0, 0xc0004b4880}, {0x1c373f9, 0x0}, {0x1994da0, 0xc0004f5140}, {0x1a36180, 0xc0003573c0}, {0x0, ...})
    google.golang.org/grpc@v1.46.0/call.go:37 +0x265
github.com/hashicorp/nomad-autoscaler/plugins/target/proto/v1.(*targetPluginServiceClient).Status(0xc00004c900, {0x1e4cfb0, 0xc0004b4880}, 0x1bf6ce9, {0x0, 0x0, 0x0})
    github.com/hashicorp/nomad-autoscaler/plugins/target/proto/v1/target.pb.go:457 +0xce
github.com/hashicorp/nomad-autoscaler/plugins/target.(*pluginClient).Status(0xc0004b6570, 0xc000475650)
    github.com/hashicorp/nomad-autoscaler/plugins/target/client.go:38 +0x7a
github.com/hashicorp/nomad-autoscaler/policy.(*Handler).generateEvaluation(0xc0000c2f00, 0xc000118480)
    github.com/hashicorp/nomad-autoscaler/policy/handler.go:288 +0xf4
github.com/hashicorp/nomad-autoscaler/policy.(*Handler).handleTick(0xc0000c2f00, {0x1e4cfb0, 0xc0004b47c0}, 0xc000118480)
    github.com/hashicorp/nomad-autoscaler/policy/handler.go:215 +0x173
github.com/hashicorp/nomad-autoscaler/policy.(*Handler).Run(0xc0000c2f00, {0x1e4cfb0, 0xc0004b47c0}, 0xc0001907d0)
    github.com/hashicorp/nomad-autoscaler/policy/handler.go:156 +0x645
github.com/hashicorp/nomad-autoscaler/policy.(*Manager).Run.func1({0xc000040ba0, 0x24})
    github.com/hashicorp/nomad-autoscaler/policy/manager.go:116 +0x46
created by github.com/hashicorp/nomad-autoscaler/policy.(*Manager).Run
    github.com/hashicorp/nomad-autoscaler/policy/manager.go:115 +0xed3

I don't know why the gRPC transport is blocked, but maybe we could add a separate per-request context to the gRPC calls to avoid this issue?

Thanks.

lgfa29 commented 1 year ago

Hi @jorgemarey 👋

Thanks for the detailed report. It does seem like something went wrong with your target plugin, as the request for status has been hanging for 16299 minutes. Just out of curiosity, which target plugin is this policy using?

Looking at the code again it may be better to handle ticks in a goroutine to allow the policy reload to be handled concurrently. Though in this case it may not solve the problem if the plugin status fails again.
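A rough sketch of what handling ticks in a goroutine could look like. This is not the actual `Handler.Run` code: the `run` function, its channels, and the callbacks are hypothetical and only illustrate how a blocked tick no longer prevents a policy reload from being processed.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// run is a hypothetical, simplified handler loop. Ticks are handled in their
// own goroutine so that reload messages can still be received while a tick
// is stuck (for example on a hung plugin call).
func run(ticks <-chan struct{}, reloads <-chan string, done <-chan struct{},
	handleTick func(), onReload func(policy string)) {

	var inFlight int32 // 1 while a tick is being handled

	for {
		select {
		case <-ticks:
			// Skip this tick if the previous one has not finished yet.
			if !atomic.CompareAndSwapInt32(&inFlight, 0, 1) {
				continue
			}
			go func() {
				defer atomic.StoreInt32(&inFlight, 0)
				handleTick()
			}()
		case p := <-reloads:
			onReload(p)
		case <-done:
			return
		}
	}
}

func main() {
	ticks := make(chan struct{})
	reloads := make(chan string)
	done := make(chan struct{})
	release := make(chan struct{})
	reloaded := make(chan string, 1)

	go run(ticks, reloads, done,
		func() { <-release }, // simulates a hung status call
		func(p string) { reloaded <- p })

	ticks <- struct{}{}
	reloads <- "policy-v2"
	fmt.Println("reloaded while tick is blocked:", <-reloaded)

	close(release)
	close(done)
}
```

The in-flight guard means ticks are dropped while one is stuck, so a hung plugin would still stop evaluations for that policy; only the reload path is unblocked.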

jorgemarey commented 1 year ago

Hi @lgfa29 ,

The target that I'm using is a plugin that I developed https://github.com/jorgemarey/nomad-nova-autoscaler

Would it make sense to wrap the timeout used here with another one just for the request, to avoid problems in the plugin?

I should add a timeout here to avoid this, but I think the autoscaler should be the one that terminates the connection if it takes too long to avoid being blocked.

lgfa29 commented 1 year ago

Ahh I see. A timeout would be helpful, though the question of what "too long" means is never easy to answer 😅

For target status that's probably easier to define, so maybe it could be a plugin config?

Another thing that I would be curious about is if cancelling that doneCTX actually ends the request or if plugins still have to handle it somehow. If they're not checking for a done context it may not have much effect.

jorgemarey commented 1 year ago

Another thing that I would be curious about is if cancelling that doneCTX actually ends the request or if plugins still have to handle it somehow. If they're not checking for a done context it may not have much effect.

That method ends up making the gRPC call to the plugin, so I guess the gRPC connection is what will terminate if the context is canceled (but I'm not sure). The target plugin doesn't receive any context.

Maybe we need to change the methods here to pass that context to the plugin implementation, although that would break compatibility. Maybe we could add another interface (TargetWithContext) that defines the same methods with the addition of the context and check whether or not the plugin is implementing it.

For target status that's probably easier to define, so maybe it could be a plugin config?

We could do that and use the config that the method receives to enforce the timeout, but since the timeout is enforced in the nomad-autoscaler code, it should be a common configuration, as the dry-run config is.

I could try to make a PR for this. What are your thoughts on the best approach?

lgfa29 commented 1 year ago

Maybe we could add another interface (TargetWithContext) that defines the same methods with the addition of the context and check whether or not the plugin is implementing it.

I have a vague impression that we may have tried this at some point but the context was not being properly serialized over the wire, so external plugins didn't actually receive the context 🤔

The possible added complication is that not all API SDKs support cancellable requests (including Nomad 😬), so cancelling the context may not actually stop anything and you end up with multiple scaling actions in parallel.

But anything other than scale seems fine to cancel if it takes too long. We could even use the evaluation_interval as the timeout, or maybe a fraction of it to make sure the next evaluation is triggered right away (say 75% of evaluation_interval), to avoid yet-another-config.

Could you try testing if cancelling the gRPC context unblocks the Autoscaler?

jorgemarey commented 1 year ago

Hi @lgfa29. I made the following change to the nomad-autoscaler to test the behaviour (wrapping the context with a hardcoded 10-second timeout):

// Status is the gRPC client implementation of the Target.Status interface
// function.
func (p *pluginClient) Status(config map[string]string) (*sdk.TargetStatus, error) {
    fmt.Println("BEGIN", time.Now())
    defer func() {
        fmt.Println("END", time.Now())
    }()

    ctx, cancel := context.WithTimeout(p.doneCTX, 10*time.Second)
    defer cancel()

    statusResp, err := p.client.Status(ctx, &proto.StatusRequest{Config: config})
    if err != nil {
        return nil, err
    }

    return &sdk.TargetStatus{
        Ready: statusResp.Ready,
        Count: statusResp.Count,
        Meta:  statusResp.Meta,
    }, nil
}

And a plugin with this behavior (just wait longer than the timeout):

// Status satisfies the Status function on the target.Target interface.
func (t *TargetPlugin) Status(config map[string]string) (*sdk.TargetStatus, error) {
    time.Sleep(120 * time.Second)

    resp := &sdk.TargetStatus{
        Ready: true,
        Count: 1,
        Meta:  make(map[string]string),
    }
    return resp, nil
}

The autoscaler shows this:

BEGIN 2023-07-30 00:58:33.842233311 +0200 CEST m=+60.222911053
END 2023-07-30 00:58:43.84329303 +0200 CEST m=+70.223970767
2023-07-30T00:58:43.843+0200 [WARN]  policy_manager.policy_handler: failed to get target status: policy_id=8cad1831-72d8-c574-4bd7-8124c1cc037b error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"

So it seems that canceling the context finishes the RPC call and unblocks the autoscaler.

I like the idea of providing a fraction of the evaluation_interval as timeout for the Status. The thing is that I don't know how to reach that value in the Status call...

lgfa29 commented 1 year ago

The thing is that I don't know how to reach that value in the Status call...

Oh that's a good point 😅

The way the plugin interface was designed it attempts to decouple it from the policy itself to avoid plugins implicitly relying on arbitrary values from the policy (like we're trying to do here 😬), which could make documentation and debugging a nightmare.

Passing a context to specific operations, as you first suggested, would probably be the best approach, but that would be a bigger effort to make sure APIs are backwards compatible. I think we would need to define a new interface and do some type assertion whenever we call a plugin method.

For now, a simpler option may be to inject the timeout into the config map and then check for it in the gRPC request. This could be done during policy parsing, perhaps here would be the best place? https://github.com/hashicorp/nomad-autoscaler/blob/37636ebb0428a63cd8490507c46cde24d904d652/policy/policy.go#L32-L49

One possible challenge would be that we don't restrict the key pattern (we really should though...) so we need to pick something that is unlikely to clash with actual config values, maybe something like _nomad_autoscaler_grpc_timeout?

We could then start rejecting config keys that start with _ as invalid (hopefully nobody is currently declaring any config key like this 😅)