Closed rdabane closed 5 months ago
Hi there, I'm a bit confused by your output, because it looks like you need a password for that ssh key, but then it looks like it connects anyway? Are you using a .ssh/config file?
Also, as an aside, you likely need to wrap the end of your script in an if __name__ == "__main__" block so that the code doesn't run when it's imported on the cluster, like so:
if __name__ == '__main__':
    num_cpus()
    num_cpus_cluster = rh.function(name="num_cpus_cluster", fn=num_cpus).to(system=cluster, reqs=["./"])
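For anyone unsure what the guard buys you, here is a minimal standalone sketch (no runhouse involved). It writes a tiny module to a temp directory and imports it under a name other than __main__, which is roughly what happens when your script gets imported on the cluster. The module source, the calls list, and import_from_source are made up for illustration.

```python
import importlib.util
import pathlib
import tempfile

# Hypothetical stand-in for the user's script; `calls` records whether num_cpus ran.
MODULE_SOURCE = '''
calls = []

def num_cpus():
    calls.append("ran")

if __name__ == "__main__":
    num_cpus()
'''


def import_from_source(source: str, name: str = "guard_demo"):
    """Write source to a temp file and import it under a non-__main__ name."""
    with tempfile.TemporaryDirectory() as tmp:
        path = pathlib.Path(tmp) / f"{name}.py"
        path.write_text(source)
        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs the module body with __name__ == name
        return module


module = import_from_source(MODULE_SOURCE)
print(module.calls)  # [] because the guarded call is skipped on import
```

Without the guard, the call at the bottom of the module would run during the import as well.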
Hi, thanks for the quick reply. I've modified the script by wrapping the end in an if __name__ == '__main__' block.
Yes, it looks like it asks for a password, but it connects anyway. Then it seems to open a tunnel, after which it tries to bring up the HTTP server, which fails and errors out.
Q. Does the runhouse package need to be installed on the remote system?
I don't know how to debug this. Is it possible to reduce the issue to a smaller SSH tunnel command?
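One way to narrow this down without involving the tunnel at all is to check whether anything is listening on the server port. Below is a rough sketch using only the standard library. The function name port_accepts_connections is made up for illustration, and 50052 is simply the port that shows up in the logs in this thread.

```python
import socket


def port_accepts_connections(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

Running port_accepts_connections("127.0.0.1", 50052) on the remote machine itself should tell you whether the HTTP server came up. If it returns False there, a "Connection refused" through the tunnel is expected, and the thing to debug is the server start command rather than SSH.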
Hi @rdabane, I faced a similar error while setting up runhouse with my local GPU cluster.
In my case it was because python was not a recognized command on the cluster, so python -m runhouse.servers.http.http_server did not execute successfully.
It worked for me after running the following command:
sudo apt install python-is-python3
Though it is not intuitive from my error logs:
(ankit)➜ ankit python temp.py
INFO | 2023-08-06 22:51:42,948 | Connected (version 2.0, client OpenSSH_8.2p1)
INFO | 2023-08-06 22:51:44,075 | Authentication (publickey) successful!
INFO | 2023-08-06 22:51:44,075 | Running command on antbit-ray-cluster: ray start --head
INFO | 2023-08-06 22:51:47,044 | Connected (version 2.0, client OpenSSH_8.2p1)
INFO | 2023-08-06 22:51:48,171 | Authentication (publickey) successful!
INFO | 2023-08-06 22:51:48,171 | Running command on antbit-ray-cluster: pip freeze
INFO | 2023-08-06 22:51:50,833 | Connected (version 2.0, client OpenSSH_8.2p1)
INFO | 2023-08-06 22:51:51,857 | Authentication (publickey) successful!
INFO | 2023-08-06 22:51:51,858 | Running command on antbit-ray-cluster: pip install ray==2.4.0
INFO | 2023-08-06 22:51:56,466 | Connected (version 2.0, client OpenSSH_8.2p1)
INFO | 2023-08-06 22:51:57,489 | Authentication (publickey) successful!
INFO | 2023-08-06 22:51:57,490 | Running command on antbit-ray-cluster: ray start --head
INFO | 2023-08-06 22:52:00,219 | Connected (version 2.0, client OpenSSH_8.2p1)
INFO | 2023-08-06 22:52:01,483 | Authentication (publickey) successful!
2023-08-06 22:52:01,483| ERROR | Problem setting SSH Forwarder up: Couldn't open tunnel :50052 <> 127.0.0.1:50052 might be in use or destination not reachable
ERROR | 2023-08-06 22:52:01,483 | Problem setting SSH Forwarder up: Couldn't open tunnel :50052 <> 127.0.0.1:50052 might be in use or destination not reachable
INFO | 2023-08-06 22:52:02,098 | Connected (version 2.0, client OpenSSH_8.2p1)
INFO | 2023-08-06 22:52:03,227 | Authentication (publickey) successful!
INFO | 2023-08-06 22:52:03,228 | Checking server antbit-ray-cluster
2023-08-06 22:52:03,889| ERROR | Secsh channel 0 open FAILED: Connection refused: Connect failed
ERROR | 2023-08-06 22:52:03,889 | Secsh channel 0 open FAILED: Connection refused: Connect failed
2023-08-06 22:52:03,893| ERROR | Could not establish connection from local ('127.0.0.1', 50053) to remote ('127.0.0.1', 50052) side of the tunnel: open new channel ssh error: ChannelException(2, 'Connect failed')
ERROR | 2023-08-06 22:52:03,893 | Could not establish connection from local ('127.0.0.1', 50053) to remote ('127.0.0.1', 50052) side of the tunnel: open new channel ssh error: ChannelException(2, 'Connect failed')
INFO | 2023-08-06 22:52:03,899 | Server antbit-ray-cluster is up, but the HTTP server may not be up.
INFO | 2023-08-06 22:52:03,899 | Restarting HTTP server on antbit-ray-cluster.
INFO | 2023-08-06 22:52:04,228 | Connected (version 2.0, client OpenSSH_8.2p1)
INFO | 2023-08-06 22:52:05,477 | Authentication (publickey) successful!
INFO | 2023-08-06 22:52:05,477 | Running command on antbit-ray-cluster: pip install runhouse==0.0.10
INFO | 2023-08-06 22:52:09,238 | Connected (version 2.0, client OpenSSH_8.2p1)
INFO | 2023-08-06 22:52:10,393 | Authentication (publickey) successful!
INFO | 2023-08-06 22:52:10,393 | Running command on antbit-ray-cluster: pkill -f "python -m runhouse.servers.http.http_server"
INFO | 2023-08-06 22:52:11,417 | Running command on antbit-ray-cluster: screen -dm bash -c "python -m runhouse.servers.http.http_server |& tee -a '~/.rh/cluster_server_antbit-ray-cluster.log' 2>&1"
INFO | 2023-08-06 22:52:17,241 | Checking server antbit-ray-cluster again [1/5].
2023-08-06 22:52:17,426| ERROR | Secsh channel 1 open FAILED: Connection refused: Connect failed
ERROR | 2023-08-06 22:52:17,426 | Secsh channel 1 open FAILED: Connection refused: Connect failed
2023-08-06 22:52:17,429| ERROR | Could not establish connection from local ('127.0.0.1', 50053) to remote ('127.0.0.1', 50052) side of the tunnel: open new channel ssh error: ChannelException(2, 'Connect failed')
ERROR | 2023-08-06 22:52:17,429 | Could not establish connection from local ('127.0.0.1', 50053) to remote ('127.0.0.1', 50052) side of the tunnel: open new channel ssh error: ChannelException(2, 'Connect failed')
INFO | 2023-08-06 22:52:22,437 | Checking server antbit-ray-cluster again [2/5].
2023-08-06 22:52:22,665| ERROR | Secsh channel 2 open FAILED: Connection refused: Connect failed
ERROR | 2023-08-06 22:52:22,665 | Secsh channel 2 open FAILED: Connection refused: Connect failed
2023-08-06 22:52:22,669| ERROR | Could not establish connection from local ('127.0.0.1', 50053) to remote ('127.0.0.1', 50052) side of the tunnel: open new channel ssh error: ChannelException(2, 'Connect failed')
ERROR | 2023-08-06 22:52:22,669 | Could not establish connection from local ('127.0.0.1', 50053) to remote ('127.0.0.1', 50052) side of the tunnel: open new channel ssh error: ChannelException(2, 'Connect failed')
INFO | 2023-08-06 22:52:27,679 | Checking server antbit-ray-cluster again [3/5].
2023-08-06 22:52:27,903| ERROR | Secsh channel 3 open FAILED: Connection refused: Connect failed
ERROR | 2023-08-06 22:52:27,903 | Secsh channel 3 open FAILED: Connection refused: Connect failed
2023-08-06 22:52:27,906| ERROR | Could not establish connection from local ('127.0.0.1', 50053) to remote ('127.0.0.1', 50052) side of the tunnel: open new channel ssh error: ChannelException(2, 'Connect failed')
ERROR | 2023-08-06 22:52:27,906 | Could not establish connection from local ('127.0.0.1', 50053) to remote ('127.0.0.1', 50052) side of the tunnel: open new channel ssh error: ChannelException(2, 'Connect failed')
INFO | 2023-08-06 22:52:32,915 | Checking server antbit-ray-cluster again [4/5].
2023-08-06 22:52:33,080| ERROR | Secsh channel 4 open FAILED: Connection refused: Connect failed
ERROR | 2023-08-06 22:52:33,080 | Secsh channel 4 open FAILED: Connection refused: Connect failed
2023-08-06 22:52:33,083| ERROR | Could not establish connection from local ('127.0.0.1', 50053) to remote ('127.0.0.1', 50052) side of the tunnel: open new channel ssh error: ChannelException(2, 'Connect failed')
ERROR | 2023-08-06 22:52:33,083 | Could not establish connection from local ('127.0.0.1', 50053) to remote ('127.0.0.1', 50052) side of the tunnel: open new channel ssh error: ChannelException(2, 'Connect failed')
INFO | 2023-08-06 22:52:38,093 | Checking server antbit-ray-cluster again [5/5].
2023-08-06 22:52:38,404| ERROR | Secsh channel 5 open FAILED: Connection refused: Connect failed
ERROR | 2023-08-06 22:52:38,404 | Secsh channel 5 open FAILED: Connection refused: Connect failed
2023-08-06 22:52:38,408| ERROR | Could not establish connection from local ('127.0.0.1', 50053) to remote ('127.0.0.1', 50052) side of the tunnel: open new channel ssh error: ChannelException(2, 'Connect failed')
ERROR | 2023-08-06 22:52:38,408 | Could not establish connection from local ('127.0.0.1', 50053) to remote ('127.0.0.1', 50052) side of the tunnel: open new channel ssh error: ChannelException(2, 'Connect failed')
(antbit)➜ antbit python temp.py
INFO | 2023-08-06 22:54:51,060 | Connected (version 2.0, client OpenSSH_8.2p1)
INFO | 2023-08-06 22:54:52,033 | Authentication (publickey) successful!
INFO | 2023-08-06 22:54:52,033 | Running command on antbit-ray-cluster: ray start --head
INFO | 2023-08-06 22:54:55,258 | Connected (version 2.0, client OpenSSH_8.2p1)
INFO | 2023-08-06 22:54:56,282 | Authentication (publickey) successful!
INFO | 2023-08-06 22:54:56,283 | Running command on antbit-ray-cluster: pip freeze
INFO | 2023-08-06 22:54:59,932 | Connected (version 2.0, client OpenSSH_8.2p1)
INFO | 2023-08-06 22:55:00,993 | Authentication (publickey) successful!
INFO | 2023-08-06 22:55:00,993 | Running command on antbit-ray-cluster: pip install ray==2.4.0
INFO | 2023-08-06 22:55:04,474 | Connected (version 2.0, client OpenSSH_8.2p1)
INFO | 2023-08-06 22:55:05,601 | Authentication (publickey) successful!
INFO | 2023-08-06 22:55:05,602 | Running command on antbit-ray-cluster: ray start --head
INFO | 2023-08-06 22:55:08,638 | Connected (version 2.0, client OpenSSH_8.2p1)
INFO | 2023-08-06 22:55:09,803 | Authentication (publickey) successful!
2023-08-06 22:55:09,803| ERROR | Problem setting SSH Forwarder up: Couldn't open tunnel :50052 <> 127.0.0.1:50052 might be in use or destination not reachable
ERROR | 2023-08-06 22:55:09,803 | Problem setting SSH Forwarder up: Couldn't open tunnel :50052 <> 127.0.0.1:50052 might be in use or destination not reachable
INFO | 2023-08-06 22:55:10,517 | Connected (version 2.0, client OpenSSH_8.2p1)
INFO | 2023-08-06 22:55:11,633 | Authentication (publickey) successful!
INFO | 2023-08-06 22:55:11,634 | Checking server antbit-ray-cluster
2023-08-06 22:55:12,462| ERROR | Secsh channel 0 open FAILED: Connection refused: Connect failed
ERROR | 2023-08-06 22:55:12,462 | Secsh channel 0 open FAILED: Connection refused: Connect failed
2023-08-06 22:55:12,466| ERROR | Could not establish connection from local ('127.0.0.1', 50053) to remote ('127.0.0.1', 50052) side of the tunnel: open new channel ssh error: ChannelException(2, 'Connect failed')
ERROR | 2023-08-06 22:55:12,466 | Could not establish connection from local ('127.0.0.1', 50053) to remote ('127.0.0.1', 50052) side of the tunnel: open new channel ssh error: ChannelException(2, 'Connect failed')
INFO | 2023-08-06 22:55:12,472 | Server antbit-ray-cluster is up, but the HTTP server may not be up.
INFO | 2023-08-06 22:55:12,472 | Restarting HTTP server on antbit-ray-cluster.
INFO | 2023-08-06 22:55:12,909 | Connected (version 2.0, client OpenSSH_8.2p1)
INFO | 2023-08-06 22:55:13,998 | Authentication (publickey) successful!
INFO | 2023-08-06 22:55:13,999 | Running command on antbit-ray-cluster: pip install runhouse==0.0.10
INFO | 2023-08-06 22:55:17,734 | Connected (version 2.0, client OpenSSH_8.2p1)
INFO | 2023-08-06 22:55:20,347 | Authentication (publickey) successful!
INFO | 2023-08-06 22:55:20,347 | Running command on antbit-ray-cluster: pkill -f "python -m runhouse.servers.http.http_server"
INFO | 2023-08-06 22:55:21,474 | Running command on antbit-ray-cluster: screen -dm bash -c "python -m runhouse.servers.http.http_server |& tee -a '~/.rh/cluster_server_antbit-ray-cluster.log' 2>&1"
INFO | 2023-08-06 22:55:27,196 | Checking server antbit-ray-cluster again [1/5].
2023-08-06 22:55:27,372| ERROR | Secsh channel 1 open FAILED: Connection refused: Connect failed
ERROR | 2023-08-06 22:55:27,372 | Secsh channel 1 open FAILED: Connection refused: Connect failed
2023-08-06 22:55:27,376| ERROR | Could not establish connection from local ('127.0.0.1', 50053) to remote ('127.0.0.1', 50052) side of the tunnel: open new channel ssh error: ChannelException(2, 'Connect failed')
ERROR | 2023-08-06 22:55:27,376 | Could not establish connection from local ('127.0.0.1', 50053) to remote ('127.0.0.1', 50052) side of the tunnel: open new channel ssh error: ChannelException(2, 'Connect failed')
INFO | 2023-08-06 22:55:32,385 | Checking server antbit-ray-cluster again [2/5].
2023-08-06 22:55:32,620| ERROR | Secsh channel 2 open FAILED: Connection refused: Connect failed
ERROR | 2023-08-06 22:55:32,620 | Secsh channel 2 open FAILED: Connection refused: Connect failed
2023-08-06 22:55:32,623| ERROR | Could not establish connection from local ('127.0.0.1', 50053) to remote ('127.0.0.1', 50052) side of the tunnel: open new channel ssh error: ChannelException(2, 'Connect failed')
ERROR | 2023-08-06 22:55:32,623 | Could not establish connection from local ('127.0.0.1', 50053) to remote ('127.0.0.1', 50052) side of the tunnel: open new channel ssh error: ChannelException(2, 'Connect failed')
INFO | 2023-08-06 22:55:37,633 | Checking server antbit-ray-cluster again [3/5].
2023-08-06 22:55:37,805| ERROR | Secsh channel 3 open FAILED: Connection refused: Connect failed
ERROR | 2023-08-06 22:55:37,805 | Secsh channel 3 open FAILED: Connection refused: Connect failed
2023-08-06 22:55:37,809| ERROR | Could not establish connection from local ('127.0.0.1', 50053) to remote ('127.0.0.1', 50052) side of the tunnel: open new channel ssh error: ChannelException(2, 'Connect failed')
ERROR | 2023-08-06 22:55:37,809 | Could not establish connection from local ('127.0.0.1', 50053) to remote ('127.0.0.1', 50052) side of the tunnel: open new channel ssh error: ChannelException(2, 'Connect failed')
INFO | 2023-08-06 22:55:42,820 | Checking server antbit-ray-cluster again [4/5].
2023-08-06 22:55:43,052| ERROR | Secsh channel 4 open FAILED: Connection refused: Connect failed
ERROR | 2023-08-06 22:55:43,052 | Secsh channel 4 open FAILED: Connection refused: Connect failed
2023-08-06 22:55:43,056| ERROR | Could not establish connection from local ('127.0.0.1', 50053) to remote ('127.0.0.1', 50052) side of the tunnel: open new channel ssh error: ChannelException(2, 'Connect failed')
ERROR | 2023-08-06 22:55:43,056 | Could not establish connection from local ('127.0.0.1', 50053) to remote ('127.0.0.1', 50052) side of the tunnel: open new channel ssh error: ChannelException(2, 'Connect failed')
INFO | 2023-08-06 22:55:48,062 | Checking server antbit-ray-cluster again [5/5].
2023-08-06 22:55:48,303| ERROR | Secsh channel 5 open FAILED: Connection refused: Connect failed
ERROR | 2023-08-06 22:55:48,303 | Secsh channel 5 open FAILED: Connection refused: Connect failed
2023-08-06 22:55:48,307| ERROR | Could not establish connection from local ('127.0.0.1', 50053) to remote ('127.0.0.1', 50052) side of the tunnel: open new channel ssh error: ChannelException(2, 'Connect failed')
ERROR | 2023-08-06 22:55:48,307 | Could not establish connection from local ('127.0.0.1', 50053) to remote ('127.0.0.1', 50052) side of the tunnel: open new channel ssh error: ChannelException(2, 'Connect failed')
@dongreenberg please let me know if you agree with my hypothesis. I would be happy to raise a PR to improve the error message for this. Thanks for building this awesome piece of software :smile:
Hey @Ankit-Dhankhar , that's a good catch and I agree with your hypothesis. We should definitely amend it to python3 or even find the interpreter path. A PR would be super helpful, are you thinking to switch it to python3?
Hi @dongreenberg , I'm considering implementing a hot fix in the following manner:
import shutil

possible_interpreters = ['python', 'python3']
for interpreter in possible_interpreters:
    # shutil.which returns the full path of the command, or None if it is not on PATH
    executable_path = shutil.which(interpreter)
    if executable_path:
        # Execute runhouse.servers.http.http_server using the selected Python interpreter
        break
Should neither of the possible interpreters work, an exception will be raised indicating that deployment has failed because no Python interpreter is reachable via the python or python3 command. This approach aims to give users clearer insight into the cause of the failure.
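To make the idea concrete, here is a minimal sketch of the lookup plus the exception. This is not code from runhouse. The function name find_python_interpreter, the which parameter (there only so the lookup can be swapped out in tests), and the error message wording are all assumptions.

```python
import shutil
from typing import Callable, Optional, Sequence


def find_python_interpreter(
    candidates: Sequence[str] = ("python", "python3"),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return the path of the first candidate found on PATH, or raise."""
    for name in candidates:
        path = which(name)
        if path:
            return path
    # Hypothetical message; the point is to name the commands that were tried.
    raise RuntimeError(
        "Deployment failed: no Python interpreter found via "
        + " or ".join(candidates)
    )
```

Note that shutil.which inspects the PATH of the machine it runs on, so this only helps if it executes on the machine where the server is started.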
@Ankit-Dhankhar , Thank you for the tip but it did not work in my case.
Here is what I get:
INFO | 2023-08-07 14:31:26,788 | No auth token provided, so not using RNS API to save and load configs
2023-08-07 14:31:27,469| INF | MainThrea/1060@sshtunnel | 2 keys loaded from agent
INFO | 2023-08-07 14:31:27,469 | 2 keys loaded from agent
2023-08-07 14:31:27,469| INF | MainThrea/1117@sshtunnel | 2 key(s) loaded
INFO | 2023-08-07 14:31:27,469 | 2 key(s) loaded
2023-08-07 14:31:27,470| ERR | MainThrea/1314@sshtunnel | Password is required for key /export/lab/.ssh/mlw01.key
ERROR | 2023-08-07 14:31:27,470 | Password is required for key /export/lab/.ssh/mlw01.key
2023-08-07 14:31:27,470| INF | MainThrea/0978@sshtunnel | Connecting to gateway: 172.17.10.110:22 as user 'lab'
INFO | 2023-08-07 14:31:27,470 | Connecting to gateway: 172.17.10.110:22 as user 'lab'
2023-08-07 14:31:27,470| DEB | MainThrea/0983@sshtunnel | Concurrent connections allowed: True
2023-08-07 14:31:27,470| DEB | MainThrea/1400@sshtunnel | Trying to log in with key: b'a79afb48fad738bfb80ee026219dcdea'
2023-08-07 14:31:27,606| DEB | MainThrea/1204@sshtunnel | Transport socket info: (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 0), timeout=0.1
2023-08-07 14:31:27,634| INF | Thread-1/1893@transport | Connected (version 2.0, client OpenSSH_7.6p1)
INFO | 2023-08-07 14:31:27,634 | Connected (version 2.0, client OpenSSH_7.6p1)
2023-08-07 14:31:28,070| INF | Thread-1/1893@transport | Authentication (publickey) failed.
INFO | 2023-08-07 14:31:28,070 | Authentication (publickey) failed.
2023-08-07 14:31:28,071| DEB | MainThrea/1410@sshtunnel | Authentication error
2023-08-07 14:31:28,071| WAR | MainThrea/1450@sshtunnel | Tunnels are not started. Please .start() first!
WARNING | 2023-08-07 14:31:28,071 | Tunnels are not started. Please .start() first!
2023-08-07 14:31:28,071| INF | MainThrea/1474@sshtunnel | Closing ssh transport
INFO | 2023-08-07 14:31:28,071 | Closing ssh transport
2023-08-07 14:31:28,071| DEB | MainThrea/1477@sshtunnel | Transport is closed
2023-08-07 14:31:28,072| DEB | MainThrea/1400@sshtunnel | Trying to log in with key: b'463095aa1803da78647cd548f37173ef'
2023-08-07 14:31:28,209| DEB | MainThrea/1204@sshtunnel | Transport socket info: (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 0), timeout=0.1
2023-08-07 14:31:28,240| INF | Thread-3/1893@transport | Connected (version 2.0, client OpenSSH_7.6p1)
INFO | 2023-08-07 14:31:28,240 | Connected (version 2.0, client OpenSSH_7.6p1)
2023-08-07 14:31:32,198| INF | Thread-3/1893@transport | Authentication (publickey) successful!
INFO | 2023-08-07 14:31:32,198 | Authentication (publickey) successful!
2023-08-07 14:31:32,200| INF | Srv-50052/1433@sshtunnel | Opening tunnel: 0.0.0.0:50052 <> 127.0.0.1:50052
INFO | 2023-08-07 14:31:32,200 | Opening tunnel: 0.0.0.0:50052 <> 127.0.0.1:50052
INFO | 2023-08-07 14:31:32,200 | Checking server mlw-cluster
2023-08-07 14:31:32,713| ERR | Thread-3/1893@transport | Secsh channel 0 open FAILED: Connection refused: Connect failed
ERROR | 2023-08-07 14:31:32,713 | Secsh channel 0 open FAILED: Connection refused: Connect failed
2023-08-07 14:31:32,713| TRA | Thread-5 /0357@sshtunnel | #1 <-- ('127.0.0.1', 36196) open new channel ssh error: ChannelException(2, 'Connect failed')
2023-08-07 14:31:32,714| ERR | Thread-5 /0394@sshtunnel | Could not establish connection from local ('127.0.0.1', 50052) to remote ('127.0.0.1', 50052) side of the tunnel: open new channel ssh error: ChannelException(2, 'Connect failed')
ERROR | 2023-08-07 14:31:32,714 | Could not establish connection from local ('127.0.0.1', 50052) to remote ('127.0.0.1', 50052) side of the tunnel: open new channel ssh error: ChannelException(2, 'Connect failed')
INFO | 2023-08-07 14:31:32,714 | Server mlw-cluster is up, but the HTTP server may not be up.
INFO | 2023-08-07 14:31:32,715 | Restarting HTTP server on mlw-cluster.
INFO | 2023-08-07 14:31:32,715 | Running command on mlw-cluster: pkill -f "python -m runhouse.servers.http.http_server"
Warning: Permanently added '172.17.10.110' (ED25519) to the list of known hosts.
INFO | 2023-08-07 14:31:33,912 | Running command on mlw-cluster: screen -dm bash -c 'python -m runhouse.servers.http.http_server |& tee -a ~/.rh/cluster_server_mlw-cluster.log 2>&1'
INFO | 2023-08-07 14:31:39,627 | Checking server mlw-cluster again.
2023-08-07 14:31:39,706| ERR | Thread-3/1893@transport | Secsh channel 1 open FAILED: Connection refused: Connect failed
ERROR | 2023-08-07 14:31:39,706 | Secsh channel 1 open FAILED: Connection refused: Connect failed
2023-08-07 14:31:39,706| TRA | Thread-14/0357@sshtunnel | #2 <-- ('127.0.0.1', 35038) open new channel ssh error: ChannelException(2, 'Connect failed')
2023-08-07 14:31:39,707| ERR | Thread-14/0394@sshtunnel | Could not establish connection from local ('127.0.0.1', 50052) to remote ('127.0.0.1', 50052) side of the tunnel: open new channel ssh error: ChannelException(2, 'Connect failed')
ERROR | 2023-08-07 14:31:39,707 | Could not establish connection from local ('127.0.0.1', 50052) to remote ('127.0.0.1', 50052) side of the tunnel: open new channel ssh error: ChannelException(2, 'Connect failed')
Traceback (most recent call last):
File "/export/lab/anaconda3/envs/runhouse/lib/python3.10/site-packages/urllib3/connectionpool.py", line 714, in urlopen
httplib_response = self._make_request(
File "/export/lab/anaconda3/envs/runhouse/lib/python3.10/site-packages/urllib3/connectionpool.py", line 466, in _make_request
six.raise_from(e, None)
File "
During handling of the above exception, another exception occurred:
Hey @Ankit-Dhankhar, that sounds like a solid approach, but I'll note that the python -m line you're referring to is generated on the user's local box but run remotely, so which wouldn't be meaningful there. Maybe you can add it inside the runhouse start command in main.py, and then change the usage of "python -m runhouse.servers.http.http_server" in cluster.py to run "runhouse start" instead?
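As a rough illustration of why moving the lookup to the remote side helps: code that already runs on the cluster can use sys.executable, which is the interpreter running that code, so no python versus python3 guessing is needed. This is only a sketch under that assumption, not the actual contents of main.py. The helper name server_start_command is hypothetical, and the module path is the one shown in the logs above.

```python
import shlex
import sys

# Module path as it appears in the logs in this thread.
SERVER_MODULE = "runhouse.servers.http.http_server"


def server_start_command(interpreter: str = sys.executable) -> list:
    """Build the argv for starting the server with a known interpreter."""
    return [interpreter, "-m", SERVER_MODULE]


# Print the command as a shell-quoted string for inspection.
print(shlex.join(server_start_command()))
```

The returned list could be passed to subprocess as-is, which also avoids depending on how the remote shell resolves python.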
Describe the bug
Hi, I'm trying to use a GPU system on our local network, but I'm running into issues. Basic question: does the runhouse package need to be installed on the remote GPU system? I couldn't figure this out from the documentation.
Here is the snippet of code I'm trying to run:
I get the following error when creating the cluster: