Closed Taytay closed 1 year ago
Hello @Taytay! Great catch. I think hints like this probably need to be updated to aws2, as well as our requirements of aws-cli.
Do you know if there are any backward-incompatible changes when upgrading from V1 to V2? If not, I think we should go ahead and do it.
By the way, AWS SSO has been working for us; let us know if that's the case for you.
Thanks for the quick response! I'm relieved to hear that you are using SSO too. That means I'm not swimming upstream too much. I literally just set up AWS SSO today, but so far it's a breath of fresh air compared to the quagmire that was our (very old) way of doing things.
I'm not sure about the incompatible changes from v1, but V2 has been out for years now, so it feels odd to have this "enforce" the older version of the CLI. But I'm new to this ecosystem. (I do realize that you can't pip install v2.)
The documentation does mention SSO, but says:
"Note: If you are using AWS IAM Identity Center (AWS SSO), you will need pip install awscli>=1.27.10. See here for instructions on how to configure AWS SSO."
That AWS documentation tells you to run commands that v1 doesn't have, and that was why I was so confused!
I also ran into another surprising issue while trying to launch the hello_sky demo, which I think is related to this: my "PowerUser" account didn't have sufficient permissions to create accounts, which it needs in the SSO case:
cloud_vm_ray_backend.py:759] botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the GetInstanceProfile operation: User: arn:aws:sts::blah-blah is not authorized to perform: iam:GetInstanceProfile on resource: instance profile skypilot-v1 because no identity-based policy allows the iam:GetInstanceProfile action
I was looking for a list of required permissions somewhere, but didn't see it. I'm running from the admin account for now to see if that does what it needs to do on first launch, but I'm not sure if I'll need to run this way every time.
You’re totally right that we should fix that doc. In terms of permissions required, we have an in-progress PR: https://github.com/skypilot-org/skypilot/blob/2dc13d57ac16092bf151fa259c5abc12c4f60951/docs/source/reference/cloud-permissions.rst#aws
Do let us know if this doc helps!
Oh that PR documentation helps a ton! I'll make sure I've got that extra iam permission and try again.
If you wanted to go even bigger, you/we could write the extra permissions check into 'sky check'. That way, I'd know the cause of the problem beforehand.
And if the service account isn't created yet, you could have a way to create it from the cli. Then the startup steps would be something like: "run this command as an admin (or someone with create user rights) to 'provision' your aws account", and then have a way to check the permissions to actually "run" sky. I say this because I was initially afraid I'd have to be signed in to a more powerful account to just launch sky instances, but that was only the case for the first one.
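As a starting point for that kind of check, the missing action can be pulled straight out of the AccessDenied message. This is a minimal sketch, assuming the message keeps the wording shown in the traceback earlier in this thread; the function name `missing_action` is hypothetical and is not part of SkyPilot.

```python
import re
from typing import Optional

# Matches the "service:Action" token that follows the phrase AWS uses
# in the AccessDenied message quoted earlier in this thread.
_DENIED_ACTION = re.compile(
    r"is not authorized to perform: ([A-Za-z0-9-]+:[A-Za-z0-9*]+)"
)


def missing_action(error_message: str) -> Optional[str]:
    """Return the denied IAM action named in the message, or None."""
    match = _DENIED_ACTION.search(error_message)
    return match.group(1) if match else None
```

A check command could then print something like "missing permission: iam:GetInstanceProfile" instead of the full traceback.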
Great feedback @Taytay. So far just been swamped with tons of priorities for the "ML/Cloud user", but we do want to start making such "cloud administration"/initial setup tasks easier and smoother. Would appreciate any help from the community :)
Sounds like you're set to start using sky?
Yes I am! My quota increase was just approved by AWS too.
I got my remote notebook spun up today and it was great.
Now my mind goes to: This seems like it could "easily" be almost a drop-in substitute for GitHub CodeSpaces with GPU support...
Thank you and the other contributors for your quick responses and all of your work!
@Taytay We've merged #1888 and now we have https://skypilot.readthedocs.io/en/latest/reference/cloud-administration.html. Let us know if it's okay to close this issue.
This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 10 days.
Closing this issue now, as it is solved by the doc, which works with both v1 and v2. Please feel free to file another issue if there is still any confusion. ;)
I'm new to a lot of this, so the error is likely on me, but it seems like SkyPilot is pip installing aws-cli, which is v1 of the CLI. We use AWS SSO, which SkyPilot's code appears to be aware of, but that necessitates us using v2 of the CLI (I think?) in order to get good developer ergonomics, and to follow the comments in the SkyPilot code that tell me to run SSO-related aws commands. So far I am getting this to work by running pip uninstall aws-cli after installing skypilot (because I already have v2 of the aws cli installed). Is there a better way I should be handling this?
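To see which CLI the shell actually resolves after a change like this, a small check can help. This is a rough sketch, assuming `aws --version` prints a string beginning with `aws-cli/<major>.<minor>...`; both function names are hypothetical and are not part of SkyPilot or the AWS CLI.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_aws_cli_major(version_output: str) -> Optional[int]:
    """Return the major version found in `aws --version` output, or None."""
    match = re.search(r"aws-cli/(\d+)\.", version_output)
    return int(match.group(1)) if match else None


def installed_aws_cli_major() -> Optional[int]:
    """Run whichever `aws` is first on PATH and report its major version."""
    if shutil.which("aws") is None:
        return None  # no aws executable on PATH
    result = subprocess.run(
        ["aws", "--version"], capture_output=True, text=True, check=False
    )
    # Some v1 releases print the version to stderr, so check both streams.
    return parse_aws_cli_major(result.stdout + result.stderr)
```

If `installed_aws_cli_major()` returns 1 after installing skypilot, the pip-installed v1 is shadowing the system v2 on PATH.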