aculich closed this 6 years ago
Thanks @rdodev. You are helpful as always :-)
Nicely done @yuvipanda @aculich and @choldgraf.
Thanks for the feedback @willingc !
@rdodev I noticed that some of the Amazon machine types aren't available in the drop-down list. Specifically I was looking for r4.2xlarge and couldn't find any of the r4 series in there. Is that an intentional Heptio decision? Or an AWS thing?
@choldgraf since the main goal of AWS QS is evaluation and testing of K8s, we tried to keep the tested machine types to a reasonable subset of machines that are good for that purpose. Machines not in that list haven't been tested by us; however, you could modify the template, add the type manually and then launch the cluster manually.
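A rough sketch of what "add the type manually" could look like, assuming the template has been loaded into a Python dict and that the instance type is constrained through a CloudFormation parameter's `AllowedValues` list. The parameter name `InstanceType` and the toy template below are hypothetical; check the actual Quick Start template for the real names.

```python
import copy


def allow_instance_type(template, instance_type, param_name="InstanceType"):
    """Return a copy of a CloudFormation template dict that also allows instance_type.

    `param_name` is a hypothetical parameter name, not taken from the real template.
    """
    updated = copy.deepcopy(template)  # leave the caller's template untouched
    param = updated["Parameters"][param_name]
    allowed = param.setdefault("AllowedValues", [])
    if instance_type not in allowed:
        allowed.append(instance_type)
    return updated


# Toy stand-in for the real template, which has many more parameters and resources.
template = {
    "Parameters": {
        "InstanceType": {
            "Type": "String",
            "Default": "m4.large",
            "AllowedValues": ["t2.medium", "m4.large"],
        }
    }
}

patched = allow_instance_type(template, "r3.large")
print(patched["Parameters"]["InstanceType"]["AllowedValues"])
```

The patched template would then be saved and launched manually, with the caveat from above that types outside the tested list are untested.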
ah ok - that makes sense! Along those lines...I just tried creating a cluster of seven r3.large's, and they failed to be created. It looks like 3 of the 7 didn't give a success message to AWS and so it rolled back the whole deployment. Have you guys encountered instability with certain machine types?
pinging you @rdodev in case you're only paying attention to parts of this thread in which you're mentioned ;-)
@choldgraf no, never seen consistent failures w/ any type of instance. Those types of errors are usually on AWS' side.
ok, I'll give it a shot again...
hmmm...I got the same failure to create + rollback. @aculich have you experienced any issues like this on AWS before?
Strange. Are you trying to launch into an existing VPC? What are the exact errors you're seeing?
nope - I'm creating a new one (the button on the left in the guide). It was hard to pin down a specific error message, but it seemed like a subset of the machines being requested didn't succeed (like 3 out of 7) so the whole thing failed and rolled back...
One theory is that this is related to some kind of limit on my AWS account...not sure how to test that out though. This works fine for all the tN machines
@choldgraf A lot of people run into this issue:
https://aws.amazon.com/ec2/faqs/#How_many_instances_can_I_run_in_Amazon_EC2
hmm - we were requesting r3.large, which isn't listed on that page, so not sure what kind of limits it has. :-/
@choldgraf "All Other Instance Types | 20". That is the total per region, so any other instances you have deployed in a different AZ will count against the quota.
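To make the arithmetic concrete, here is a minimal sketch of the per-region check, assuming the limit of 20 quoted above and made-up instance counts per AZ. The function name and the numbers are illustrative only; actual limits vary by account and instance type.

```python
def fits_region_quota(running_by_az, requested, limit=20):
    """Return True if `requested` new instances fit under a per-region limit.

    Instances in every AZ of the region count toward the same total.
    """
    in_use = sum(running_by_az.values())
    return in_use + requested <= limit


# Hypothetical counts: 15 instances already running across two AZs in one region.
busy_region = {"us-west-2a": 10, "us-west-2b": 5}
print(fits_region_quota(busy_region, 7))  # 15 + 7 = 22, over the limit of 20
print(fits_region_quota({}, 7))           # empty region, 7 fits
```

With nothing else running in the region, a request for 7 instances is well under the limit, so a quota failure would point to instances elsewhere in the region.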
Gotcha - yeah we were only requesting 7 so I guess this isn't the issue...hmmm, I can try and ask someone in a different part of the country to deploy w/ heptio and the same computational config
Let me give it a try :D
:-)
Spinning up a cluster with 7 x r3.larges as we speak. Will update when done (or error).
@choldgraf
Region: Oregon (us-west-2)
damnit!
I mean.....that's great! :-)
hmmm, OK I can give it another shot with us-west-2b. This makes me wonder if it is something with my account...
If your account is a child/sub account, it's possible that other users under the same umbrella account have VMs running in that region that are invisible to you but still count against the quota.
well either way, that's good news - let me send these instructions to another guy we're working with at UW and see if he can get the machines set up...I'm trying to do this so that we can use AWS + JupyterHub for a training camp in early September...so really it just needs to work for him :-)
@choldgraf so it worked, I presume? Please ping me if need be. Though I'm on Eastern time so probably won't check until tomorrow morning.
I still haven't got it working with r3 but it's working with the two machines... I'll let you know if my colleague can get it working. Thanks so much for your help! I'll report back w an update but either way I owe ya a :beer: or two!
hey @rdodev - I wonder if you're still around for a quick question!
First off - the AWS deployments are working quite well, I think...thanks so much for the great guide/template and all the help!
A question: somebody is asking about how to rescale their AWS cluster after deploying (specifically the "1-20" nodes). I looked through the guide but couldn't find a clear way to do this. Do you have any intuition for how to do this?
ping @arokem since he's interested in this
@choldgraf looking into this. Give me 1/2 hour or so to test solution.
The most graceful way is:
//cc @choldgraf
Thanks! I will give this a try later today. I assume that other parameters can also be changed? For example, instance type, etc.?
@arokem it is possible, but that's a bit more complicated since changing instance type will nuke existing nodes and any data or workloads therein will be lost.
Hey all - as we now have more mature docs for a number of providers, I'm going to close this. If people would like to re-open, please feel free to do so! Though I think it'll be more useful if we have issues for specific cloud providers we haven't supported, rather than one catch-all (especially since this one is quite long already!)
If you're interested in support for this software on AWS, Jetstream, or other cloud providers, please let us know here... or even better, send us a Pull Request with your contributions to getting the code working on your desired cloud provider!
We so far have heard interest in supporting Jetstream using the OpenStack Magnum API, as well as using kubeadm.
We also have heard interest in supporting AWS. Here are some links provided to us by our AWS reps:
https://kubernetes.io/docs/getting-started-guides/aws/ https://aws.amazon.com/quickstart/architecture/heptio-kubernetes/