Open cruscio opened 4 years ago
:+1: Thanks for the report @cruscio! I'll take a look at this and see if I can reproduce. Looks like your setup instructions are straightforward and I shouldn't have any issues there.
Yep, I see exactly this same behavior. Thanks for such a detailed report!
Is there an update on this issue? We ran into the same problem, but for `publish.runner`. It also looks like the timeout settings provided to the `publish.*` modules are not propagated to the underlying `transport.tcp` client. As a workaround, we modified the `publish.*` modules to call `ret = channel.send(load, timeout=timeout)` instead of `ret = channel.send(load)`, with `load` being a dict containing the key `tmo` with the timeout value.
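The change described above can be sketched with a stand-in channel object (`FakeChannel` and this `publish` helper are illustrative stand-ins, not Salt's actual classes):

```python
# Illustrative sketch of the workaround: the publish-side code builds a
# `load` dict carrying a `tmo` key, but unless the timeout is also passed
# to the transport's send(), the client falls back to its own default.

class FakeChannel:
    """Hypothetical stand-in for the transport.tcp client."""
    def send(self, load, timeout=60):
        # A real client would apply `timeout` to the blocking TCP request.
        return {"load": load, "effective_timeout": timeout}

def publish(channel, tgt, fun, timeout=5):
    load = {"cmd": "pub", "tgt": tgt, "fun": fun, "tmo": timeout}
    # Workaround: forward the caller's timeout instead of relying on the
    # transport default, mirroring `channel.send(load, timeout=timeout)`.
    return channel.send(load, timeout=timeout)

result = publish(FakeChannel(), "minion1", "test.ping", timeout=10)
print(result["effective_timeout"])  # → 10
```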
Starlink kills TCP connections after 20 seconds without activity.
Description
Using the TCP transport, the tcp_keepalive* settings do not keep the publish connection alive when there is a network device with client/server timeouts between the minion and master.
Setup
The code below sets up a master and minion, with HAProxy between them, in Docker containers
Salt Master
Start an Ubuntu:18.04 container. Install, configure and start a salt-master
HAProxy (simulating a network device with a TCP client timeout)
Write a configuration file to proxy salt-master TCP streams. Start an HAProxy container with that config.
Salt Minion
Start an Ubuntu:18.04 container. Install, configure and start a salt-minion pointed at the HAProxy container as its master
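For the middle piece, an haproxy.cfg along these lines would impose the client/server idle timeout (port 4505 is Salt's publish port; the 20s value and the frontend/backend names are illustrative):

```
defaults
    mode tcp
    timeout connect 5s
    timeout client  20s
    timeout server  20s

frontend salt_publish
    bind *:4505
    default_backend salt_master

backend salt_master
    server master salt-master:4505
```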
Steps to Reproduce the behavior
HAProxy logs either a cD or an sD termination flag when closing the connection; basically, no data was sent for the duration of the timeout.
Expected behavior
The documentation ( master / minion ) indicates those settings are for issues in 'messy network environments with misbehaving firewalls', but they don't appear to keep the connection alive in this scenario.
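For reference, TCP keepalive settings like these ultimately map to standard socket options; a minimal Python sketch (Linux option names; the values are illustrative, not Salt's defaults):

```python
import socket

# Standard TCP keepalive socket options (Linux names; values illustrative).
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)        # enable keepalive
if hasattr(socket, "TCP_KEEPIDLE"):                            # Linux-only constants
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 10)  # idle secs before first probe
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 5)  # secs between probes
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)    # failed probes before close
enabled = s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
s.close()
print(enabled)
```

Note that kernel keepalive probes carry no payload, so an application-level idle timer such as HAProxy's `timeout client` never observes any data and can still expire, which would be consistent with the behavior reported here.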
One workaround would be to run `test.ping` at a higher frequency than the network timeout, but this has a not-insignificant CPU cost on Windows minions, where a separate minion process needs to be spun up for every job (it's particularly costly in messy Windows antivirus environments).
Expected: There should be a keepalive setting that sends some data across the TCP transport's publish channel in a way that doesn't require the minion to fork/spawn a worker.
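The `test.ping` workaround can be expressed as a minion-side schedule entry; a sketch (the job name and the 15s interval are illustrative, chosen to fire more often than a 20s network timeout):

```yaml
schedule:
  keepalive_ping:
    function: test.ping
    seconds: 15
```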
Versions Report
Salt Master
Salt Minion
Additional context
Salt Minion Log
HAProxy Log