Closed tonychenxyz closed 1 month ago
Hi there, I'm getting the same error as above.
I think it can be traced to a PyYAML issue: https://github.com/yaml/pyyaml/issues/724
A related suggestion on Stack Overflow is to update awscli, since it has a PyYAML issue: https://stackoverflow.com/questions/76868274/build-failed-with-aws-ebcli-on-python-3-11-4
https://github.com/aws/aws-cli/issues/8036#issuecomment-1638544754
But that didn't fix things for me.
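For anyone debugging the same thing, a quick way to see which PyYAML version actually ended up in the environment is a small check like the one below. This is only a sketch: the 6.0.1 threshold is my reading of the linked PyYAML issue (it is also the version pinned in the config that follows), and the helper names `parse_version`, `installed_pyyaml_version`, and `is_new_enough` are made up for illustration.

```python
from importlib import metadata

# Assumption: 6.0.1 is the first PyYAML release without the build problem
# discussed in the linked issue. Adjust if that turns out to be wrong.
MIN_PYYAML = (6, 0, 1)


def parse_version(text):
    """Turn '6.0.1' into (6, 0, 1); parsing stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def installed_pyyaml_version():
    """Return the installed PyYAML version string, or None if it is not installed."""
    try:
        return metadata.version("PyYAML")
    except metadata.PackageNotFoundError:
        return None


def is_new_enough(version_text, minimum=MIN_PYYAML):
    """Compare a version string against the minimum as integer tuples."""
    return parse_version(version_text) >= minimum


if __name__ == "__main__":
    found = installed_pyyaml_version()
    if found is None:
        print("PyYAML is not installed in this environment")
    elif is_new_enough(found):
        print(f"PyYAML {found} is at or above the assumed minimum")
    else:
        print(f"PyYAML {found} is older than the assumed minimum 6.0.1")
```

Running this on the head node after the setup commands finish should show whether the pinned reinstall took effect or whether another package pulled in a different version afterwards.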
Here is my ray config file:
cluster_name: andrew2-cluster
min_workers: 1
max_workers: 10
upscaling_speed: 1.0
docker:
    image: "rayproject/ray:latest"
    container_name: "ray_container"
    pull_before_run: True
setup_commands:
- sudo apt update
- sudo apt install cmake build-essential
- sudo apt install g++-9
- sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-9 90
- wget https://repo.anaconda.com/miniconda/Miniconda3-py310_23.3.1-0-Linux-x86_64.sh -O miniconda.sh
- bash ~/miniconda.sh -f -b -p /tmp/miniconda3/
- echo 'export PATH="/tmp/miniconda3/bin/:$PATH"' >> ~/.bashrc
- pip install --upgrade pip setuptools wheel
- pip install --force-reinstall -v "PyYAML==6.0.1" --no-build-isolation
- pip install awscli --no-build-isolation
- pip install -U "ray[default] @ https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-3.0.0.dev0-cp310-cp310-manylinux2014_x86_64.whl"
- pip install boto3==1.26.90
- pip install s3fs==2022.11.0
- pip install psutil
- pip install pyarrow
- pip install 'pandas==2.1.4'
- pip install fasttext
- pip install git+https://github.com/mlfoundations/open_lm.git
- git clone https://github.com/mlfoundations/dclm.git
provider:
    type: aws
    region: us-west-2
    cache_stopped_nodes: False
Hi @tonychenxyz, @andrewsiah,
I looked into this, made some modifications to the YAML file, and found a few variants in which the packages install properly. Here is the config that I used. Can you try it after making the account-specific edits that I marked in the comments?
cluster_name: test-processing
max_workers: 2
upscaling_speed: 1.0
available_node_types:
    ray.head.default:
        resources: {}
        node_config:
            ImageId: ami-0c5cce1d70efb41f5
            InstanceType: i4i.4xlarge
            IamInstanceProfile:
                # Replace 000000000000 with your IAM account 12-digit ID
                Arn: arn:aws:iam::000000000000:instance-profile/ray-autoscaler-v1
    ray.worker.default:
        min_workers: 2
        max_workers: 2
        node_config:
            ImageId: ami-0c5cce1d70efb41f5
            InstanceType: i4i.4xlarge
            IamInstanceProfile:
                # Replace 000000000000 with your IAM account 12-digit ID
                Arn: arn:aws:iam::000000000000:instance-profile/ray-autoscaler-v1
provider:
    type: aws
    region: us-west-2
    cache_stopped_nodes: False
setup_commands:
- sudo mkfs -t xfs /dev/nvme1n1
- sudo mount /dev/nvme1n1 /tmp
- sudo chown -R $USER /tmp
- sudo chmod -R 777 /tmp
- wget https://repo.anaconda.com/miniconda/Miniconda3-py310_23.3.1-0-Linux-x86_64.sh -O miniconda.sh
- bash ~/miniconda.sh -f -b -p /tmp/miniconda3/
- echo 'export PATH="/tmp/miniconda3/bin/:$PATH"' >> ~/.bashrc
# Include your AWS CREDS here
- echo 'export AWS_ACCESS_KEY_ID=' >> ~/.bashrc
- echo 'export AWS_SECRET_ACCESS_KEY=' >> ~/.bashrc
- pip install --upgrade pip setuptools wheel
- pip install -U "ray[default] @ https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-3.0.0.dev0-cp310-cp310-manylinux2014_x86_64.whl"
- pip install boto3==1.26.90
- pip install s3fs==2022.11.0
- pip install psutil
- pip install pysimdjson
- pip install pyarrow
- git clone https://github.com/mlfoundations/dclm.git
- pip install -r dclm/requirements.txt
- cd dclm && python3 setup.py install
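Since the config above needs two account-specific edits (the 12-digit account ID in the instance profile ARNs and the AWS credential exports), a small helper can fill in the account ID and flag anything still left blank before running ray up. This is a rough sketch rather than part of the repo: the function names `fill_account_id` and `unfilled_placeholders` are hypothetical, and it treats the config as plain text instead of parsing the YAML.

```python
import re

# Placeholder account ID used in the config above.
PLACEHOLDER_ACCOUNT_ID = "000000000000"
PLACEHOLDER_ARN_PREFIX = f"arn:aws:iam::{PLACEHOLDER_ACCOUNT_ID}:"

# Matches an export of either credential variable with nothing after the '='.
EMPTY_CREDENTIAL = re.compile(
    r"export AWS_(?:ACCESS_KEY_ID|SECRET_ACCESS_KEY)=(?=['\"\s]|$)"
)


def fill_account_id(config_text, account_id):
    """Replace the placeholder account ID inside IAM ARNs with a real one."""
    if not re.fullmatch(r"[0-9]{12}", account_id):
        raise ValueError("expected a 12-digit AWS account ID")
    return config_text.replace(
        PLACEHOLDER_ARN_PREFIX, f"arn:aws:iam::{account_id}:"
    )


def unfilled_placeholders(config_text):
    """Return (line_number, reason) pairs for edits that still need to be made."""
    problems = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # comments only describe the edit, skip them
        if PLACEHOLDER_ARN_PREFIX in line:
            problems.append((number, "placeholder account ID"))
        if EMPTY_CREDENTIAL.search(line):
            problems.append((number, "empty AWS credential"))
    return problems


if __name__ == "__main__":
    sample = (
        "Arn: arn:aws:iam::000000000000:instance-profile/ray-autoscaler-v1\n"
        "- echo 'export AWS_ACCESS_KEY_ID=' >> ~/.bashrc\n"
    )
    filled = fill_account_id(sample, "123456789012")  # made-up account ID
    for number, reason in unfilled_placeholders(filled):
        print(f"line {number}: {reason}")
```

Working on the raw text keeps the comments and layout of the file intact, which a YAML load and dump round trip would not.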
Hi @tonychenxyz, @andrewsiah, just checking in: were you able to resolve your issue?
Hey, yep, thanks for the help!
Previously, in issue #69, I was able to run ray up with the following config YAML.
But now running ray up with the same script gives an error.