roscisz / TensorHive

Tool for managing exclusive GPU access for distributed machine learning workloads
Apache License 2.0

AuthenticationException on test #387

Open saswat0 opened 4 months ago

saswat0 commented 4 months ago

Subject of the issue

I'm trying to set up TensorHive on a system and get the following error when running tensorhive test (I've replaced the real hostname with a dummy name):

CRITICAL | 2024-05-03 23:53:08 | MainThread                     | MSG: [✘] hostname.univ.edu FAILED (exit code: None, exception: AuthenticationException) | FROM: tensorhive.core.managers.SSHConnectionManager

Your environment

List relevant info:

Steps to reproduce

I have created a user named tensorhive on the node. I can SSH into it from my laptop and from the node itself without a password (after copying the SSH public key). The node is reachable as user@hostname.univ.edu. However, when I run tensorhive test, I get this error.

My hosts_config.ini looks like this

[hostname.univ.edu]
user = user
port = 22

Commands run:

tensorhive init
tensorhive test
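
Before digging into key algorithms, it can help to confirm which account and port the config resolves to, since the text above mentions a user named tensorhive while the config says user = user (possibly just part of the masking). The following is a minimal sketch that reads the same INI layout with Python's standard configparser and prints the equivalent ssh command to try by hand. The function name ssh_targets and the SAMPLE string are made up for illustration, and this is not necessarily how TensorHive itself parses the file.

```python
import configparser

# Sample content mirroring the hosts_config.ini shown in this issue.
SAMPLE = """
[hostname.univ.edu]
user = user
port = 22
"""


def ssh_targets(ini_text):
    """Return one (user, host, port) tuple per section of the INI text."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return [
        # Each section name is treated as the hostname; port falls back to 22.
        (parser[host]["user"], host, parser[host].getint("port", fallback=22))
        for host in parser.sections()
    ]


for user, host, port in ssh_targets(SAMPLE):
    # Print the manual command that should match what the config describes.
    print(f"ssh -p {port} {user}@{host}")
```

If the printed command logs in as a different account than the one whose authorized_keys holds the TensorHive public key, that mismatch alone could produce an AuthenticationException.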

Expected behaviour

App should launch without errors

Actual behaviour

INFO     | 2024-05-04 00:11:26 | MainThread                     | MSG: [⚙] Testing SSH connections using key: ~/.config/TensorHive/ssh_key             | FROM: tensorhive.core.managers.SSHConnectionManager
CRITICAL | 2024-05-04 00:11:26 | MainThread                     | MSG: [✘] hostname.univ.edu FAILED (exit code: None, exception: AuthenticationException) | FROM: tensorhive.core.managers.SSHConnectionManager
INFO     | 2024-05-04 00:11:26 | MainThread                     | MSG: Summary: 1/1 failed to connect.                                                 | FROM: tensorhive.core.managers.SSHConnectionManager
INFO     | 2024-05-04 00:11:26 | MainThread                     | MSG: [⚙] Testing SSH connections using default system keys                           | FROM: tensorhive.core.managers.SSHConnectionManager
CRITICAL | 2024-05-04 00:11:26 | MainThread                     | MSG: [✘] hostname.univ.edu FAILED (exit code: None, exception: AuthenticationException) | FROM: tensorhive.core.managers.SSHConnectionManager
INFO     | 2024-05-04 00:11:26 | MainThread                     | MSG: Summary: 1/1 failed to connect.                                                 | FROM: tensorhive.core.managers.SSHConnectionManager

SparklingAperioso commented 2 months ago

Hi, I had the same problem. It seems that TensorHive uses Paramiko 2.7.2 (https://github.com/paramiko/paramiko) to generate an RSA key. The key itself is fine, but that Paramiko version authenticates with the ssh-rsa signature algorithm (RSA with SHA-1), which OpenSSH has disabled by default since version 8.8.
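
Whether this explanation applies depends on the server running OpenSSH 8.8 or newer. As a rough sketch, the check below parses a version string such as the one printed by ssh -V or sent in the server's SSH banner. The function name and the example strings are illustrative and not taken from this thread.

```python
import re


def rsa_sha1_disabled_by_default(version_banner):
    """Return True for OpenSSH >= 8.8, False for older, None if unparseable."""
    match = re.search(r"OpenSSH_(\d+)\.(\d+)", version_banner)
    if match is None:
        return None
    major, minor = int(match.group(1)), int(match.group(2))
    # OpenSSH 8.8 is the release that turned off ssh-rsa (SHA-1) by default.
    return (major, minor) >= (8, 8)


# Illustrative version strings, not output captured from the affected node.
for banner in ("OpenSSH_8.9p1 Ubuntu-3ubuntu0.6", "SSH-2.0-OpenSSH_7.4"):
    print(banner, "->", rsa_sha1_disabled_by_default(banner))
```

A True result only means the default policy rejects ssh-rsa. The server administrator may have changed PubkeyAcceptedAlgorithms either way.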

To get TensorHive to connect to my GPU server over SSH, I had to add this option to my sshd_config: PubkeyAcceptedAlgorithms = +ssh-rsa.

Actually, because I wanted to accept SHA-1 signatures from TensorHive only, I added this at the end of my sshd_config:


# Allow SHA1 from TensorHive
Match Address @IP_TensorHive_Server/32
     PubkeyAcceptedAlgorithms = +ssh-rsa
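
After reloading sshd, one way to confirm that the Match block took effect is to dump the effective configuration with sshd -T (as root, with a -C connection spec giving the TensorHive server's address so the Match rule is evaluated) and look at the pubkeyacceptedalgorithms line. The sketch below only parses such a dump. The function name and the sample text are hypothetical, and the exact algorithm list will differ between systems.

```python
def accepts_ssh_rsa(sshd_t_output):
    """Return True/False if ssh-rsa is in the accepted list, None if not found."""
    for line in sshd_t_output.splitlines():
        # sshd -T prints one "keyword value" pair per line, keywords lowercased.
        keyword, _, value = line.strip().partition(" ")
        if keyword.lower() == "pubkeyacceptedalgorithms":
            return "ssh-rsa" in value.strip().split(",")
    return None


# Hypothetical excerpt of an sshd -T dump, shortened for illustration.
SAMPLE_DUMP = """\
port 22
pubkeyacceptedalgorithms ssh-ed25519,rsa-sha2-512,rsa-sha2-256,ssh-rsa
"""

print(accepts_ssh_rsa(SAMPLE_DUMP))
```

If the function returns False for the TensorHive address, the Match block is probably not being applied, for example because the address in the rule does not match the one the connection actually comes from.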