jupyterhub / zero-to-jupyterhub-k8s

Helm Chart & Documentation for deploying JupyterHub on Kubernetes
https://zero-to-jupyterhub.readthedocs.io

Add support for additional cloud providers #88

Closed aculich closed 6 years ago

aculich commented 7 years ago

If you're interested in support for this software on AWS, Jetstream, or other cloud providers, please let us know here... or even better, send us a Pull Request with your contributions to getting the code working on your desired cloud provider!

So far we have heard interest in supporting Jetstream using the OpenStack Magnum API, as well as using kubeadm.

We also have heard interest in supporting AWS. Here are some links provided to us by our AWS reps:

https://kubernetes.io/docs/getting-started-guides/aws/
https://aws.amazon.com/quickstart/architecture/heptio-kubernetes/

willingc commented 7 years ago

Thanks @rdodev. You are helpful as always :-)

Nicely done @yuvipanda @aculich and @choldgraf.

choldgraf commented 7 years ago

Thanks for the feedback @willingc !

@rdodev I noticed that some of the Amazon machine types aren't available in the drop-down list. Specifically, I was looking for r4.2xlarge and couldn't find any of the r4 series in there. Is that an intentional Heptio decision, or an AWS thing?

rdodev commented 7 years ago

@choldgraf since the main goal of the AWS Quick Start (QS) is evaluation and testing of K8s, we tried to keep the tested machine types to a reasonable subset of machines that are good for that purpose. Machines not in that list haven't been tested by us; however, you could modify the template, add the type manually, and then launch the cluster from the modified template.
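A minimal sketch of "modify the template, add the type manually", assuming a JSON copy of the Quick Start template whose instance-type parameter restricts choices via `AllowedValues`. The parameter name `InstanceType` and the helper `allow_instance_type` are hypothetical; check the real parameter name in the template you download. As noted above, a type added this way has not been tested by the template authors.

```python
import json


def allow_instance_type(template: dict, param_name: str, instance_type: str) -> dict:
    """Return a copy of a CloudFormation template whose parameter also allows instance_type."""
    # JSON round-trip is a simple deep copy for JSON-shaped data.
    updated = json.loads(json.dumps(template))
    param = updated["Parameters"][param_name]
    allowed = param.get("AllowedValues")
    # If there is no AllowedValues list, the parameter is unrestricted; leave it alone.
    if allowed is not None and instance_type not in allowed:
        allowed.append(instance_type)
    return updated


if __name__ == "__main__":
    # Hypothetical template fragment; the real Quick Start template is much larger.
    template = {
        "Parameters": {
            "InstanceType": {"Type": "String", "AllowedValues": ["t2.medium", "m4.large"]}
        }
    }
    patched = allow_instance_type(template, "InstanceType", "r4.2xlarge")
    print(patched["Parameters"]["InstanceType"]["AllowedValues"])
```

The patched template could then be written out with `json.dump` and supplied through the CloudFormation console's template upload option.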

choldgraf commented 7 years ago

ah ok - that makes sense! Along those lines... I just tried creating a cluster of seven r3.large instances, and creation failed. It looks like 3 of the 7 didn't send a success message to AWS, so it rolled back the whole deployment. Have you guys encountered instability with certain machine types?

choldgraf commented 7 years ago

pinging you @rdodev in case you're only paying attention to parts of this thread in which you're mentioned ;-)

rdodev commented 7 years ago

@choldgraf no, never seen consistent failures w/ any type of instance. Those types of errors are usually on AWS' side.

choldgraf commented 7 years ago

ok, I'll give it a shot again...

choldgraf commented 7 years ago

hmmm...I got the same failure to create + rollback. @aculich have you experienced any issues like this on AWS before?

rdodev commented 7 years ago

Strange. Are you trying to launch into an existing VPC? What are the exact errors you're seeing?

choldgraf commented 7 years ago

nope - I'm creating a new one (the button on the left in the guide). It was hard to pin down a specific error message, but it seemed like a subset of the machines being requested didn't succeed (like 3 out of 7) so the whole thing failed and rolled back...

One theory is that this is related to some kind of limit on my AWS account... not sure how to test that out, though. This works fine for all the t-series (tN) machines.
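When a stack rolls back, the individual failures are usually recorded in the CloudFormation stack events, which can help pin down an error message. A minimal sketch for extracting only the failed events, assuming boto3 is installed and credentials are configured; `failure_reasons` and `fetch_events` are hypothetical helper names. `describe_stack_events` returns events newest first, so the list is reversed to put the earliest failure (often the root cause) at the top. Failures inside a nested stack may only show up in that nested stack's own events.

```python
from typing import Iterable, List, Mapping

FAILED_STATUSES = {"CREATE_FAILED", "UPDATE_FAILED"}


def failure_reasons(events: Iterable[Mapping]) -> List[str]:
    """Return 'LogicalResourceId: reason' for each failed event, oldest first."""
    failed = [e for e in events if e.get("ResourceStatus") in FAILED_STATUSES]
    # The API lists events newest first; reverse so the earliest failure leads.
    return [f"{e['LogicalResourceId']}: {e.get('ResourceStatusReason', '')}" for e in reversed(failed)]


def fetch_events(stack_name: str) -> list:
    """Collect every event for a stack (name or stack ID) via boto3."""
    import boto3  # imported lazily so failure_reasons works without boto3

    cfn = boto3.client("cloudformation")
    events = []
    for page in cfn.get_paginator("describe_stack_events").paginate(StackName=stack_name):
        events.extend(page["StackEvents"])
    return events
```

Usage would look like `for line in failure_reasons(fetch_events("my-stack")): print(line)`, where `"my-stack"` stands in for your stack's name.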

rdodev commented 7 years ago

@choldgraf A lot of people bump on this issue:

https://aws.amazon.com/ec2/faqs/#How_many_instances_can_I_run_in_Amazon_EC2

choldgraf commented 7 years ago

hmm - we were requesting r3.large, which isn't listed on that page, so not sure what kind of limits it has. :-/

rdodev commented 7 years ago

@choldgraf "All Other Instance Types | 20": this is the total per region, so any other instances you have deployed in a different AZ will count against the quota.
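The quota arithmetic can be sketched as follows, assuming the default limit of 20 quoted above applies to your account (actual limits vary per account and can be raised). Because the limit is per region, running instances in every Availability Zone of that region are summed against a single number. The helpers are hypothetical and only illustrate the calculation.

```python
from typing import Mapping

# Default "All Other Instance Types" limit quoted in the thread; an assumption,
# since your account's actual limit may differ.
REGIONAL_LIMIT = 20


def remaining_capacity(running_per_az: Mapping[str, int], limit: int = REGIONAL_LIMIT) -> int:
    """Instances still available in the region; every AZ counts against one limit."""
    used = sum(running_per_az.values())
    return max(limit - used, 0)


def can_launch(requested: int, running_per_az: Mapping[str, int], limit: int = REGIONAL_LIMIT) -> bool:
    """True if `requested` new instances fit under the regional limit."""
    return requested <= remaining_capacity(running_per_az, limit)


if __name__ == "__main__":
    print(can_launch(7, {}))  # True: nothing running yet
    print(can_launch(7, {"us-west-2a": 10, "us-west-2c": 5}))  # False: only 5 left
```

For example, with 10 instances already running in us-west-2a and 5 in us-west-2c, a request for 7 more would not fit, even though no single AZ looks full.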

choldgraf commented 7 years ago

Gotcha - yeah, we were only requesting 7, so I guess this isn't the issue... hmmm, I can try asking someone in a different part of the country to deploy with Heptio and the same computational config.

rdodev commented 7 years ago

Let me give it a try :D

choldgraf commented 7 years ago

:-)

rdodev commented 7 years ago

Spinning up a cluster with 7 x r3.larges as we speak. Will update when done (or error).

rdodev commented 7 years ago

@choldgraf [screenshots: awsparams, awsresult]

Region: Oregon (us-west-2)

choldgraf commented 7 years ago

damnit!

choldgraf commented 7 years ago

I mean.....that's great! :-)

hmmm, OK I can give it another shot with us-west-2b. This makes me wonder if it is something with my account...

rdodev commented 7 years ago

If your account is a child/sub account it's possible other users under the same umbrella account have VMs running in that region and are invisible to you (thus bumping on the quota).

choldgraf commented 7 years ago

well either way, that's good news - let me send these instructions to another guy we're working with at UW and see if he can get the machines set up...I'm trying to do this so that we can use AWS + JupyterHub for a training camp in early September...so really it just needs to work for him :-)

rdodev commented 7 years ago

@choldgraf so it worked, I presume? Please ping me if need be. Though I'm on Eastern time so probably won't check until tomorrow morning.

choldgraf commented 7 years ago

I still haven't got it working with r3, but it's working with the two machines... I'll let you know if my colleague can get it working. Thanks so much for your help! I'll report back with an update, but either way I owe ya a :beer: or two!

choldgraf commented 7 years ago

hey @rdodev - I wonder if you're still around for a quick question!

First off - the AWS deployments are working quite well, I think...thanks so much for the great guide/template and all the help!

A question: somebody is asking how to rescale their AWS cluster after deploying (specifically the "1-20" nodes). I looked through the guide but couldn't find a clear way to do this. Do you have any intuition for how to do it?

choldgraf commented 7 years ago

ping @arokem since he's interested in this

rdodev commented 7 years ago

@choldgraf looking into this. Give me 1/2 hour or so to test solution.

rdodev commented 7 years ago

The most graceful way is:

  1. log into aws console and go to CloudFormation
  2. Find the stack that you want to scale out (its name ends in a 12-character uppercase alphanumeric string; both stacks share the same name prefix)
  3. Select above mentioned stack, then from Actions menu, click on Update Stack
  4. Click Next
  5. In parameters, change value of Node Capacity to desired value.
  6. Click Next twice
  7. Confirm change and click on Update.
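The console steps above can be approximated with boto3, as a minimal sketch assuming boto3 is installed and credentials are configured. The parameter key behind the console label "Node Capacity" is assumed to be `K8sNodeCapacity` (check the stack's Parameters tab), and `find_scalable_stack`, `capacity_update_params`, and `scale_cluster` are hypothetical helpers. The sketch keeps the existing template (`UsePreviousTemplate`) and every other parameter value (`UsePreviousValue`), changing only the node count as in step 5.

```python
import re
from typing import Iterable, List, Optional

# Step 2: the stack to update ends in a 12-character uppercase alphanumeric suffix.
_SUFFIX = re.compile(r"-[A-Z0-9]{12}$")


def find_scalable_stack(stack_names: Iterable[str], prefix: str) -> Optional[str]:
    """Return the single stack named prefix...-XXXXXXXXXXXX, or None if absent/ambiguous."""
    matches = [n for n in stack_names if n.startswith(prefix) and _SUFFIX.search(n)]
    return matches[0] if len(matches) == 1 else None


def capacity_update_params(param_keys: Iterable[str], capacity_key: str, new_capacity: int) -> List[dict]:
    """Step 5: change only the node capacity and keep every other parameter as-is."""
    params = []
    for key in param_keys:
        if key == capacity_key:
            params.append({"ParameterKey": key, "ParameterValue": str(new_capacity)})
        else:
            params.append({"ParameterKey": key, "UsePreviousValue": True})
    return params


def scale_cluster(stack_name: str, new_capacity: int, capacity_key: str = "K8sNodeCapacity") -> None:
    """Steps 3-7 as one API call. capacity_key is an assumed parameter name."""
    import boto3  # imported lazily so the helpers above work without boto3

    cfn = boto3.client("cloudformation")
    stack = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]
    keys = [p["ParameterKey"] for p in stack.get("Parameters", [])]
    if capacity_key not in keys:
        raise ValueError(f"{capacity_key!r} is not a parameter of {stack_name}")
    kwargs = {
        "StackName": stack_name,
        "UsePreviousTemplate": True,  # same template; only one parameter changes
        "Parameters": capacity_update_params(keys, capacity_key, new_capacity),
    }
    if stack.get("Capabilities"):
        # Re-acknowledge whatever capabilities (e.g. IAM) the stack was created with.
        kwargs["Capabilities"] = stack["Capabilities"]
    cfn.update_stack(**kwargs)
```

As with the console route, this only changes the node count; changing other parameters such as the instance type can replace existing nodes.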

//cc @choldgraf

arokem commented 7 years ago

Thanks! I will give this a try later today. I assume that other parameters can also be changed? For example, instance type, etc.?

rdodev commented 7 years ago

@arokem it is possible, but that's a bit more complicated since changing instance type will nuke existing nodes and any data or workloads therein will be lost.

choldgraf commented 6 years ago

Hey all - as we now have more mature docs for a number of providers, I'm going to close this. If people would like to re-open, please feel free to do so! Though I think it'll be more useful if we have issues for specific cloud providers we haven't supported, rather than one catch-all (especially since this one is quite long already!)