runpod / runpod-python

🐍 | Python library for RunPod API and serverless worker SDK.
https://pypi.org/project/runpod/
MIT License

Cannot Connect to Pod's Exposed Public IP & Port from Pod within same Region #337

Open cblmemo opened 2 months ago

cblmemo commented 2 months ago

Describe the bug

Ports exposed through a pod's TCP public IP cannot be reached from inside another pod in the same region.

To Reproduce

1. Use this script to create two pods in the same region:

    import runpod
    import base64
    import os
    from rich import print


    def create(name: str, region: str):
        with open(os.path.expanduser('~/.ssh/id_rsa.pub'), 'r', encoding='utf-8') as f:
            public_key = f.read().strip()
        setup_cmd = (
            # Setting up SSH here
            'prefix_cmd() '
            '{ if [ $(id -u) -ne 0 ]; then echo "sudo"; else echo ""; fi; }; '
            '$(prefix_cmd) apt update;'
            'export DEBIAN_FRONTEND=noninteractive;'
            '$(prefix_cmd) apt install openssh-server rsync curl patch -y;'
            '$(prefix_cmd) mkdir -p /var/run/sshd; '
            '$(prefix_cmd) '
            'sed -i "s/PermitRootLogin prohibit-password/PermitRootLogin yes/" '
            '/etc/ssh/sshd_config; '
            '$(prefix_cmd) sed '
            '"s@session\\s*required\\s*pam_loginuid.so@session optional '
            'pam_loginuid.so@g" -i /etc/pam.d/sshd; '
            'cd /etc/ssh/ && $(prefix_cmd) ssh-keygen -A; '
            '$(prefix_cmd) mkdir -p ~/.ssh; '
            '$(prefix_cmd) chown -R $(whoami) ~/.ssh;'
            '$(prefix_cmd) chmod 700 ~/.ssh; '
            f'$(prefix_cmd) echo "{public_key}" >> ~/.ssh/authorized_keys; '
            '$(prefix_cmd) chmod 644 ~/.ssh/authorized_keys; '
            '$(prefix_cmd) service ssh restart; '
            '[ $(id -u) -eq 0 ] && echo alias sudo="" >> ~/.bashrc;'
            # Starting a test HTTP server
            'python3 -m http.server 9000'
        )
        encoded = base64.b64encode(setup_cmd.encode('utf-8')).decode('utf-8')
        pod = runpod.create_pod(
            name=name,
            image_name="runpod/base:0.0.2",
            gpu_type_id="NVIDIA RTX A4000",
            country_code=region,
            ports="22/tcp,9000/tcp",
            support_public_ip=True,
            docker_args=f'bash -c \'echo {encoded} | base64 --decode > init.sh; bash init.sh\''
        )
        return pod['id']


    rp1_id = create("rp1", "CA")
    rp2_id = create("rp2", "CA")

    print(f"rp1_id = '{rp1_id}'")
    print(f"rp2_id = '{rp2_id}'")


2. Use this script to get test commands:
```python
import runpod


def get_cmd(pod_id: str):
    pod_stat = runpod.get_pod(pod_id)
    runtime = pod_stat.get('runtime') or {}
    ports_info = runtime.get('ports', [])
    if not ports_info:
        raise ValueError(f"Pod {pod_id} is not ready.")
    ssh_cmd = None
    curl_cmd = None
    for p in ports_info:
        if p['isIpPublic']:
            if p['privatePort'] == 22:
                ssh_cmd = f'ssh -i ~/.ssh/id_rsa -p {p["publicPort"]} root@{p["ip"]}'
            if p['privatePort'] == 9000:
                curl_cmd = f'curl http://{p["ip"]}:{p["publicPort"]}'
    assert ssh_cmd is not None and curl_cmd is not None, f"Pod {pod_id} is not ready."
    return ssh_cmd, curl_cmd

# Fill in the pod IDs printed by the previous script
rp1_id = 'qi5a6pnu01x2zl'
rp2_id = '3k3hy87mtr2old'

rp1_ssh, rp1_curl = get_cmd(rp1_id)
rp2_ssh, rp2_curl = get_cmd(rp2_id)

print(rp1_curl)
print(rp2_curl)

print(f'{rp1_ssh} {rp2_curl}')
print(f'{rp2_ssh} {rp1_curl}')
```

Example output:

    curl http://69.30.85.69:22145
    curl http://69.30.85.69:22186
    ssh -i ~/.ssh/id_rsa -p 22144 root@69.30.85.69 curl http://69.30.85.69:22186
    ssh -i ~/.ssh/id_rsa -p 22185 root@69.30.85.69 curl http://69.30.85.69:22145
3. Run the four commands printed by the script. The first two (run from the laptop that makes the RunPod API calls) succeed, but the third and fourth (which run curl inside a pod) fail:

    $ curl http://69.30.85.69:22145
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
    ...

    $ curl http://69.30.85.69:22186
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
    ...

    $ ssh -i ~/.ssh/id_rsa -p 22144 root@69.30.85.69 curl http://69.30.85.69:22186
    The authenticity of host '[69.30.85.69]:22144 ([69.30.85.69]:22144)' can't be established.
    ECDSA key fingerprint is SHA256:8wlRef+5KXU62d7TkPvMan6bkdkyUgPxt4qP4WyWFrw.
    Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
    Warning: Permanently added '[69.30.85.69]:22144' (ECDSA) to the list of known hosts.
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
      0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
    curl: (7) Failed to connect to 69.30.85.69 port 22186: Connection refused

    $ ssh -i ~/.ssh/id_rsa -p 22185 root@69.30.85.69 curl http://69.30.85.69:22145
    The authenticity of host '[69.30.85.69]:22185 ([69.30.85.69]:22185)' can't be established.
    ECDSA key fingerprint is SHA256:8wlRef+5KXU62d7TkPvMan6bkdkyUgPxt4qP4WyWFrw.
    Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
    Warning: Permanently added '[69.30.85.69]:22185' (ECDSA) to the list of known hosts.
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
      0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
    curl: (7) Failed to connect to 69.30.85.69 port 22145: Connection refused

4. The same commands work when the two pods are in different regions (tested with CA and SE).
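
To narrow down where the connection is dropped, it may help to probe each exposed endpoint from every vantage point (the laptop and each pod) and record whether the port is open, refused, or silently dropped. The following is a minimal sketch using only the Python standard library. `probe`, `report`, and `TARGETS` are hypothetical names chosen for illustration, and the addresses are copied from the example output above.

```python
import errno
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP connect and classify the outcome.

    Returns "open", "refused", "timeout", or "error: <reason>".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer (or something in front of it) actively rejected the SYN.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout, e.g. packets dropped.
        return "timeout"
    except OSError as exc:
        return f"error: {errno.errorcode.get(exc.errno, exc)}"


def report(targets):
    """Probe every (host, port) pair and return a {"host:port": outcome} dict."""
    return {f"{host}:{port}": probe(host, port) for host, port in targets}


# Hypothetical target list: endpoints copied from the example output.
# Replace with the mappings of your own pods.
TARGETS = [("69.30.85.69", 22145), ("69.30.85.69", 22186)]
```

Calling `report(TARGETS)` once on the laptop and once inside each pod gives a small reachability matrix. Based on the logs above, one would expect "open" from the laptop and "refused" from inside the pods, but that is an expectation to verify, not a guaranteed result.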

Expected behavior

The exposed endpoint should be reachable from anywhere, including from other pods started by RunPod.

Screenshots

Please see the console logs above.

Desktop (please complete the following information):

    $ lsb_release -a
    No LSB modules are available.
    Distributor ID: Debian
    Description:    Debian GNU/Linux 11 (bullseye)
    Release:        11
    Codename:       bullseye
    $ pip show runpod
    Name: runpod
    Version: 1.7.0
    Summary: 🐍 | Python library for RunPod API and serverless worker SDK.
    Home-page: https://runpod.io
    Author: RunPod
    Author-email: RunPod <engineer@runpod.io>, Justin Merrell <justin.merrell@runpod.io>
    License: MIT License
    Location: /home/memory/install/miniconda3/envs/sky/lib/python3.9/site-packages
    Requires: aiohttp, aiohttp-retry, backoff, boto3, click, colorama, cryptography, fastapi, inquirerpy, paramiko, prettytable, py-cpuinfo, requests, tomli, tomlkit, tqdm-loggable, urllib3, watchdog
    Required-by:

Additional context

None

keyboardAnt commented 1 month ago

I also need help connecting to pods. Connecting via "Basic SSH Terminal" works, but "SSH over exposed TCP" doesn't. I checked the ~/.ssh/authorized_keys file on the pod, and it contains the public key matching the private key I use for SSH. The error I receive is:

    ssh: connect to host 213.173.108.100 port 12157: Connection refused
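
One thing that may be worth ruling out for a "Connection refused" on SSH over exposed TCP is a mismatch between the port being dialed and the public mapping for private port 22. Below is a minimal sketch that looks up that mapping in the dictionary returned by `runpod.get_pod`, assuming the `runtime.ports` entries carry the `isIpPublic`, `privatePort`, `publicPort`, and `ip` fields used in the script earlier in this thread. `find_public_mapping` is a hypothetical helper, not part of the runpod library, and `sample` is a hand-written stand-in for a real response.

```python
from typing import Optional, Tuple


def find_public_mapping(pod_stat: dict, private_port: int) -> Optional[Tuple[str, int]]:
    """Return (ip, public_port) for a publicly exposed private port, or None."""
    runtime = pod_stat.get("runtime") or {}
    for entry in runtime.get("ports") or []:
        if entry.get("isIpPublic") and entry.get("privatePort") == private_port:
            return entry["ip"], entry["publicPort"]
    return None


# Hand-written dictionary shaped like the get_pod response (an assumption);
# the values are taken from the example output for the first pod.
sample = {
    "runtime": {
        "ports": [
            {"ip": "69.30.85.69", "isIpPublic": True, "privatePort": 22, "publicPort": 22144},
            {"ip": "69.30.85.69", "isIpPublic": True, "privatePort": 9000, "publicPort": 22145},
        ]
    }
}

print(find_public_mapping(sample, 22))  # prints ('69.30.85.69', 22144)
```

In real use, `pod_stat` would come from `runpod.get_pod(pod_id)`. A `None` result would suggest that the pod has no public mapping for port 22 yet, for example because `22/tcp` was not requested or the runtime is not ready. If the mapping matches the port being dialed, the refusal more likely originates inside the pod (for instance, no SSH daemon listening) or in the network path.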