run-house / runhouse

Dispatch and distribute your ML training to "serverless" clusters in Python, like PyTorch for ML infra. Iterable, debuggable, multi-cloud/on-prem, identical across research and production.
https://run.house
Apache License 2.0

Error when starting with the '--screen' option #203

Closed xiaoFine closed 5 months ago

xiaoFine commented 9 months ago

When I run runhouse start --screen, it fails with the following error:

python3 command was not found. Make sure you have python3 installed.

Running the same command without --screen works fine.

Versions Please run the following and paste the output below.

wget https://raw.githubusercontent.com/run-house/runhouse/main/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
Python Platform: Linux-3.10.0-957.21.3.el7.x86_64-x86_64-with-glibc2.17
Python Version: 3.11.4 (main, Jul  5 2023, 13:45:01) [GCC 11.2.0]

Relevant packages: 
boto3==1.33.11
fastapi==0.103.1
fsspec==2023.5.0
pyarrow==13.0.0
rich==13.5.2
runhouse==0.0.13
skypilot==0.4.0
sshfs==2023.10.0
sshtunnel==0.4.0
typer==0.9.0
uvicorn==0.23.2
wheel==0.38.4

Checking credentials to enable clouds for SkyPilot.
  AWS: disabled                              
    Reason: AWS credentials are not set. Run the following commands:
      $ pip install boto3
      $ aws configure
      $ aws configure list  # Ensure that this shows identity is set.
    For more info: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html
    Details: `aws sts get-caller-identity` failed with error: [botocore.exceptions.NoCredentialsError] Unable to locate credentials.
  Azure: disabled                              
    Reason: ~/.azure/msal_token_cache.json does not exist. Run the following commands:
      $ az login
      $ az account set -s <subscription_id>
    For more info: https://docs.microsoft.com/en-us/cli/azure/get-started-with-azure-cli
  GCP: disabled                              
    Reason: GCP tools are not installed. Run the following commands:
      $ pip install google-api-python-client
      $ conda install -c conda-forge google-cloud-sdk -y
    Credentials may also need to be set. Run the following commands:
      $ gcloud init
      $ gcloud auth application-default login
    For more info: https://skypilot.readthedocs.io/en/latest/getting-started/installation.html#google-cloud-platform-gcp
    Details: [builtins.ModuleNotFoundError] No module named 'googleapiclient'
  IBM: disabled                              
    Reason: Missing credential file at /home/admins/.ibm/credentials.yaml.
    Store your API key and Resource Group id in ~/.ibm/credentials.yaml in the following format:
      iam_api_key: <IAM_API_KEY>
      resource_group_id: <RESOURCE_GROUP_ID>
  Kubernetes: disabled                              
    Reason: Credentials not found - check if ~/.kube/config exists.
  Lambda: disabled                              
    Reason: Failed to access Lambda Cloud with credentials. To configure credentials, go to:
      https://cloud.lambdalabs.com/api-keys
    to generate API key and add the line
      api_key = [YOUR API KEY]
    to ~/.lambda_cloud/lambda_keys
  OCI: disabled                              
    Reason: `oci` is not installed. Install it with: pip install oci
    For more details, refer to: https://skypilot.readthedocs.io/en/latest/getting-started/installation.html#oracle-cloud-infrastructure-oci
  SCP: disabled                              
    Reason: Failed to access SCP with credentials. To configure credentials, see: https://cloud.samsungsds.com/openapiguide
    Generate API key and add the following line to ~/.scp/scp_credential:
      access_key = [YOUR API ACCESS KEY]
      secret_key = [YOUR API SECRET KEY]
      project_id = [YOUR PROJECT ID]
  Cloudflare (for R2 object store): disabled                              
    Reason: [r2] profile is not set in ~/.cloudflare/r2.credentials. Additionally, Account ID from R2 dashboard is not set. Run the following commands:
      $ pip install boto3
      $ AWS_SHARED_CREDENTIALS_FILE=~/.cloudflare/r2.credentials aws configure --profile r2
      $ mkdir -p ~/.cloudflare
      $ echo <YOUR_ACCOUNT_ID_HERE> > ~/.cloudflare/accountid
    For more info: https://skypilot.readthedocs.io/en/latest/getting-started/installation.html#cloudflare-r2

SkyPilot will use only the enabled clouds to run tasks. To change this, configure cloud credentials, and run sky check.
If any problems remain, please file an issue at https://github.com/skypilot-org/skypilot/issues/new
Clusters
No existing clusters.

Managed spot jobs
No in progress jobs. (See: sky spot -h)

Additional context Full logs:

 runhouse start --port 2222
INFO | 2023-12-11 02:29:30.713426 | NumExpr defaulting to 8 threads.
INFO | 2023-12-11 02:29:32.342877 | Using port: 2222.
INFO | 2023-12-11 02:29:32.343102 | Starting API server using the following command: /home/admins/miniconda3/bin/python3 -m runhouse.servers.http.http_server.
Executing `/home/admins/miniconda3/bin/python3 -m runhouse.servers.http.http_server --port 2222`
INFO | 2023-12-11 02:29:34.061997 | NumExpr defaulting to 8 threads.
INFO | 2023-12-11 02:29:36.233910 | Launching HTTP server on port: 2222.
INFO | 2023-12-11 02:29:36.234118 | Launching Runhouse API server with den_auth=False and use_local_telemetry=False on host: 0.0.0.0 and port: 32300
INFO:     Started server process [15764]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:32300 (Press CTRL+C to quit)
^CINFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [15764]

runhouse start --port 2222 --screen
INFO | 2023-12-11 02:29:45.997178 | NumExpr defaulting to 8 threads.
INFO | 2023-12-11 02:29:46.455935 | Using port: 2222.
INFO | 2023-12-11 02:29:46.456143 | Starting API server using the following command: /home/admins/miniconda3/bin/python3 -m runhouse.servers.http.http_server.
Executing `screen -dm bash -c "/home/admins/miniconda3/bin/python3 -m runhouse.servers.http.http_server --port 2222 2>&1 | tee -a '/home/admins/.rh/server.log' 2>&1"`
python3 command was not found. Make sure you have python3 installed.

dongreenberg commented 8 months ago

Oh, that's interesting. It seems like your /home/admins/miniconda3/bin/python3 isn't available in bash. Could that be because a conda environment needs to be activated first for it to be available (the activation may be in .bashrc but not in .bash_profile)?
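
One way to narrow this down is to check which programs in the logged `screen -dm bash -c "..."` command actually resolve on the machine. The snippet below is only a diagnostic sketch, not runhouse code: the helper names (`first_executable`, `resolves_on_path`, `visible_in_bash`) are hypothetical, and the command string is copied from the log above. It reports whether the first program in the command line (here `screen`) is on PATH, and whether a non-interactive `bash -c` can see the interpreter path from the log.

```python
import shlex
import shutil
import subprocess
from typing import Optional

# Command line copied from the log output in this issue.
LOGGED_COMMAND = (
    'screen -dm bash -c "/home/admins/miniconda3/bin/python3 '
    "-m runhouse.servers.http.http_server --port 2222 2>&1 "
    "| tee -a '/home/admins/.rh/server.log' 2>&1\""
)
# Interpreter path copied from the same log line.
LOGGED_PYTHON = "/home/admins/miniconda3/bin/python3"


def first_executable(command: str) -> str:
    """Return the program that a shell-style command line would launch first."""
    return shlex.split(command)[0]


def resolves_on_path(program: str) -> bool:
    """True if `program` is an executable file path or can be found on PATH."""
    return shutil.which(program) is not None


def visible_in_bash(program: str) -> Optional[bool]:
    """Check whether a non-interactive `bash -c` can resolve `program`.

    Returns None when bash itself cannot be found.
    """
    bash = shutil.which("bash")
    if bash is None:
        return None
    result = subprocess.run(
        [bash, "-c", f"command -v {shlex.quote(program)}"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    launcher = first_executable(LOGGED_COMMAND)
    print(f"{launcher} on PATH: {resolves_on_path(launcher)}")
    print(f"bash on PATH: {resolves_on_path('bash')}")
    print(f"{LOGGED_PYTHON} visible in bash -c: {visible_in_bash(LOGGED_PYTHON)}")
```

If the first line prints False, the `screen` binary itself is missing. If only the last line prints False, the interpreter is not reachable from a non-interactive bash, which would fit the shell-startup-file explanation suggested above.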