pyinfra-dev / pyinfra

pyinfra turns Python code into shell commands and runs them on your servers. Execute ad-hoc commands and write declarative operations. Target SSH servers, local machine and Docker containers. Fast and scales from one server to thousands.
https://pyinfra.com
MIT License

Unable to connect via SSH through two proxy jumps #1223

Open adrysn opened 1 month ago

adrysn commented 1 month ago

Describe the bug

Connecting via SSH through two or more proxy jumps fails with a connection timeout error.

To Reproduce

Let's say we have the following SSH configuration file:

Host jumper1
  Hostname 10.10.10.1
  User devops

Host jumper2
  Hostname 10.20.10.1
  Port 30028
  User devops
  StrictHostKeyChecking no
  UserKnownHostsFile /dev/null
  ProxyJump jumper1

Host test
  Hostname 10.20.10.2
  User devops
  ProxyJump jumper2

With ssh_config_file set in inventory.py, connecting to the test server (which belongs to test_group) with the following command times out:

$ pyinfra inventory.py --limit test_group exec -- hostname
--> Loading config...
--> Loading inventory...
--> Connecting to hosts...
    [test] Could not connect ([Errno 60] Operation timed out)

--> Disconnecting from hosts...
--> pyinfra error: No hosts remaining!
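The report does not include inventory.py itself. A minimal sketch of what it might look like, assuming pyinfra's group-as-tuple inventory format and the ssh_config_file connector data key; the config path is a hypothetical placeholder, not taken from the report:

```python
# inventory.py (illustrative sketch, not the reporter's actual file)

# Hypothetical location of the SSH config file shown above.
SSH_CONFIG_PATH = "./ssh_config"

# A group is a (hosts, data) tuple; the data applies to every host in the group.
test_group = (
    ["test"],  # matches the "Host test" entry in the SSH config
    {"ssh_config_file": SSH_CONFIG_PATH},
)
```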

I think the problem is on the following line, where config and kwargs are merged: the sock=None in kwargs overwrites the sock already set in config. https://github.com/pyinfra-dev/pyinfra/blob/9ce7ac44ef6e9d5d8d8a926836e8d294cccbff29/pyinfra/connectors/sshuserclient/client.py#L150

Printing config before and after that line shows that the sock entry in config is changed to None on the second hop.

$ pyinfra inventory.py --limit test_group exec -- hostname
--> Loading config...
--> Loading inventory...
--> Connecting to hosts...
--- debug: New hop
Initial config={'port': 22, 'sock': None, 'username': 'devops'}
Updated config={'port': 22, 'sock': None, 'username': 'devops'}
--- debug: New hop
Initial config={'port': 30028, 'sock': <paramiko.Channel 0 (open) window=2097152 -> <paramiko.Transport at 0x33b6fc0 (cipher ***, *** bits) (active; 1 open channel(s))>>, 'username': 'devops'}
Updated config={'port': '30028', 'sock': None, 'username': 'devops'}
    [test] Could not connect ([Errno 60] Operation timed out)

--> Disconnecting from hosts...
--> pyinfra error: No hosts remaining!
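The overwrite seen on the second hop can be reproduced without pyinfra or paramiko. This is a minimal sketch that assumes the merge behaves like a plain config.update(kwargs); FakeChannel and merge_overwriting are hypothetical names used only for illustration:

```python
class FakeChannel:
    """Hypothetical stand-in for the paramiko channel opened through the previous hop."""

    def __repr__(self):
        return "<FakeChannel (open)>"


def merge_overwriting(config, kwargs):
    """Mimic a plain dict.update merge: every key in kwargs wins."""
    merged = dict(config)
    merged.update(kwargs)
    return merged


channel = FakeChannel()

# Second hop: config already carries the channel tunnelled through jumper1 ...
config = {"port": 30028, "sock": channel, "username": "devops"}
# ... while the caller's kwargs still hold the default sock=None (assumed).
kwargs = {"sock": None}

merged = merge_overwriting(config, kwargs)
print(merged["sock"])  # None: the tunnel is lost, so a direct connection is attempted
```

With sock set to None, the client would try to reach 10.20.10.1 directly rather than through jumper1, which is consistent with the timeout in the log.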

When I changed the line as follows:

config.update({k: v for k, v in kwargs.items() if k not in config})

I confirmed that the connection is established successfully, as shown in the following output. However, there seems to be a secondary issue: an EOFError is raised at the end, while disconnecting.

$ pyinfra inventory.py --limit test_group exec -- hostname
--> Loading config...
--> Loading inventory...
--> Connecting to hosts...
--- debug: New hop
Initial config={'port': 22, 'sock': None, 'username': 'devops'}
Updated config={'port': 22, 'sock': None, 'username': 'devops'}
--- debug: New hop
Initial config={'port': 30028, 'sock': <paramiko.Channel 0 (open) window=2097152 -> <paramiko.Transport at 0x6886d80 (cipher ****, **** bits) (active; 1 open channel(s))>>, 'username': 'devops'}
Updated config={'port': 30028, 'sock': <paramiko.Channel 0 (open) window=2097152 -> <paramiko.Transport at 0x6886d80 (cipher ****, **** bits) (active; 1 open channel(s))>>, 'username': 'devops'}
    No host key for [10.20.10.1]:30028 found in known_hosts
--- debug: New hop
Initial config={'port': 22, 'allow_agent': False, 'look_for_keys': False, 'username': 'devops', 'timeout': 10, 'pkey': PKey(alg=****, bits=****, fp=****:****), 'sock': <paramiko.Channel 0 (open) window=2097152 in-buffer=32 -> <paramiko.Transport at 0x70b2810 (cipher ****, **** bits) (active; 1 open channel(s))>>}
Updated config={'port': 22, 'allow_agent': False, 'look_for_keys': False, 'username': 'devops', 'timeout': 10, 'pkey': PKey(alg=****, bits=****, fp=****:****), 'sock': <paramiko.Channel 0 (open) window=2097152 in-buffer=32 -> <paramiko.Transport at 0x70b2810 (cipher ****, **** bits) (active; 1 open channel(s))>>}
    [test] Connected

--> Preparing operations...
    [test] Ready: shell

--> Beginning operation run...
--> Starting operation: server.shell (hostname)
[test] harbor1
    [test] Success

--> Results:
    Operation                 Hosts   Success   Error   No Change
    server.shell (hostname)   1       1         -       -

--> Disconnecting from hosts...
Exception ignored in atexit callback: <function _join_lingering_threads at 0x105eb3920>
Traceback (most recent call last):
  File "****/paramiko/transport.py", line 149, in _join_lingering_threads
    thr.stop_thread()
  File "****/paramiko/transport.py", line 1920, in stop_thread
    self.packetizer.close()
  File "****/paramiko/packet.py", line 228, in close
    self.__socket.close()
  File "****/paramiko/channel.py", line 669, in close
    self.transport._send_user_message(m)
  File "****/paramiko/transport.py", line 1988, in _send_user_message
    self._send_message(data)
  File "****/paramiko/transport.py", line 1964, in _send_message
    self.packetizer.send_message(data)
  File "****/paramiko/packet.py", line 468, in send_message
    self.write_all(out)
  File "****/paramiko/packet.py", line 382, in write_all
    raise EOFError()
EOFError:
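The modified merge line from the report can also be checked in isolation. The sketch below applies the same dict comprehension to illustrative values; merge_keeping_existing is a hypothetical helper name, and the kwargs contents are assumptions rather than what pyinfra actually passes:

```python
# Sentinel standing in for the paramiko channel from the previous hop (hypothetical).
channel = object()


def merge_keeping_existing(config, kwargs):
    """Copy config, then add only the kwargs keys that config does not already have."""
    merged = dict(config)
    merged.update({k: v for k, v in kwargs.items() if k not in merged})
    return merged


config = {"port": 30028, "sock": channel, "username": "devops"}
kwargs = {"sock": None, "timeout": 10}  # illustrative values

merged = merge_keeping_existing(config, kwargs)
print(merged["sock"] is channel)  # True: the existing channel survives the merge
print(merged["timeout"])          # 10: keys missing from config are still added
```

One trade-off of this approach is that a caller can no longer override any key already present in config, not just sock. Whether that is acceptable for the other connection options is something the maintainers would need to confirm.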

Expected behavior

pyinfra should be able to connect through two or more jump proxies.

Meta