Open outdoteth opened 3 years ago
Fixed by adding the IAMReadOnlyAccess policy to the ray-autoscaler-v1 role.
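For anyone who wants to script that fix, here is a minimal boto3 sketch. It assumes the role is named ray-autoscaler-v1 as in this thread and that the AWS managed policy lives at arn:aws:iam::aws:policy/IAMReadOnlyAccess. The helper name attach_iam_read_only is made up for illustration, and the credentials used must be allowed to modify the role.

```python
# ARN of the AWS managed policy that grants read-only access to IAM.
IAM_READ_ONLY_ARN = "arn:aws:iam::aws:policy/IAMReadOnlyAccess"


def attach_iam_read_only(iam_client, role_name="ray-autoscaler-v1"):
    """Attach IAMReadOnlyAccess to the given role (hypothetical helper).

    iam_client is expected to be a boto3 IAM client; the default role
    name is the one mentioned in this thread.
    """
    iam_client.attach_role_policy(RoleName=role_name, PolicyArn=IAM_READ_ONLY_ARN)
    return IAM_READ_ONLY_ARN


# Usage (needs boto3 and AWS credentials that may edit the role):
#   import boto3
#   attach_iam_read_only(boto3.client("iam"))
```

IAMReadOnlyAccess is broader than the single iam:GetInstanceProfile action named in the error, so a narrower inline policy may be preferable in locked-down accounts.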
I am still seeing this issue in ray 1.8.0 when I run ray exec cluster.yaml --start --stop 'echo "hello world"'.
Failed to fetch IAM instance profile data for ray-autoscaler-v1 from AWS.
Error code: AccessDenied
!!! Boto3 error:
An error occurred (AccessDenied) when calling the GetInstanceProfile operation: User: arn:aws:sts::<number>:assumed-role/ray-autoscaler-v1/i-<number> is not authorized to perform: iam:GetInstanceProfile on resource: instance profile ray-autoscaler-v1
!!!
The error seems to occur when ray tries to tear down the cluster. Indeed, if I remove the --stop then there's no error. Shutting down manually with ray down cluster.yaml works fine.
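In case it helps others, the workaround can be sketched as a small Python wrapper around the same two CLI calls: ray exec without --stop, followed by a manual ray down. The function names are hypothetical, and ray down may ask for interactive confirmation when it runs.

```python
import subprocess


def workaround_commands(cluster_yaml, remote_cmd):
    """Return the two CLI invocations: exec without --stop, then ray down."""
    return [
        ["ray", "exec", cluster_yaml, "--start", remote_cmd],
        ["ray", "down", cluster_yaml],
    ]


def run_workaround(cluster_yaml, remote_cmd):
    """Run both commands in order.

    check=True raises on the first failure, so if ray exec fails the
    cluster is left running and ray down has to be run by hand.
    """
    for cmd in workaround_commands(cluster_yaml, remote_cmd):
        subprocess.run(cmd, check=True)


# Usage (needs the ray CLI on PATH and working AWS credentials):
#   run_workaround("cluster.yaml", 'echo "hello world"')
```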
It seems possible that the permissions for the default autoscaler role are not configured correctly. Reopening this to remind us to look into it.
Curious if any progress has been made on this. If it helps, ray's own example-full.yaml works to reproduce.
Still running into this in ray 2.7.
Recent enough to garner looking into; @jjyao let's look into it as part of next week's regular weekly core GH triage.
Hi @vladfi1, I tried ray up python/ray/autoscaler/aws/example-full.yaml with the latest Ray and it worked for me. Could you try running that command with the latest Ray and see what errors you get?
@jjyao The issue for me is with --stop, for example ray exec example-full.yaml --start --stop 'echo "hello world"'
@vladfi1 I see. We will look into this, but ray exec is currently low priority for us, so it might take a while to fix. In the meantime, we recommend using Ray job submission or SSHing to the node directly instead of ray exec.
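A rough sketch of the job submission route using the Python SDK, for comparison with the ray exec command above. It assumes a Ray version that ships ray.job_submission and a dashboard reachable at the address shown in the comment, which is an assumption about your setup. submit_entrypoint is a hypothetical helper name.

```python
def submit_entrypoint(client, entrypoint='echo "hello world"'):
    """Submit a shell entrypoint as a Ray job and return its job id.

    client is expected to be a ray.job_submission.JobSubmissionClient.
    """
    return client.submit_job(entrypoint=entrypoint)


# Usage (the dashboard address is an assumption; adjust for your cluster):
#   from ray.job_submission import JobSubmissionClient
#   client = JobSubmissionClient("http://127.0.0.1:8265")
#   print(submit_entrypoint(client))
```

Unlike ray exec --start --stop, this does not start or tear down the cluster, so it avoids the GetInstanceProfile call on shutdown only because shutdown is handled separately.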
When I run the following command:
I get this error:
I have put my credentials in the ~/.aws/credentials file and they are completely valid, so I'm not sure why I'm getting this error. My credentials file looks like this (but with valid credentials):