pangeo-data / pangeo

Pangeo website + discussion of general issues related to the project.
http://pangeo.io

Jetstream Deployment #72

Closed rabernat closed 4 years ago

rabernat commented 6 years ago

It would be great to deploy on Jetstream via openstack. This would allow us to evaluate the platform on an NSF-sponsored resource.

@jmunroe: it sounds like you might be able to help with this. In order to get started, please go to https://portal.xsede.org and create an account (if you don't already have one). I will then grant you access to our allocation.

rabernat commented 6 years ago

One thing to be aware of is that the allocation expires at the end of the month. So there is some level of urgency.

yuvipanda commented 6 years ago

https://zonca.github.io/2017/12/scalable-jupyterhub-kubernetes-jetstream.html is a quick tutorial on this. I'm happy to help debug and answer questions if they come up :) How big is your allocation, @rabernat?

yuvipanda commented 6 years ago

For now, I would recommend using NFS rather than the rook setup mentioned in the article, mostly because I have far more experience with NFS than with rook (even though I'm not a fan of NFS either). I believe that is generally true of everyone.

yuvipanda commented 6 years ago

/cc @aculich from Berkeley who has been my entry point to all things JetStream :)

jmunroe commented 6 years ago

@rabernat Thanks. I've created an account on the portal: jmunroe

rabernat commented 6 years ago

Great! You should now have full access to the Jetstream allocation. Anything you can make happen with this would be deeply appreciated!

rsignell-usgs commented 6 years ago

@rabernat, ask @jreadey for a username/password for his HSDS endpoint on XSEDE. It will let you store your NetCDF files as chunks on S3, with many TB of available space and scalable access. You will burn some of your JetStream account just converting to HSDS! :smile_cat:

rsignell-usgs commented 6 years ago

@rabernat, when John implemented HSDS on jetstream, he was the first user of S3, and they didn't know how to establish limits. So apparently he has unlimited access to S3 on jetstream (but of course there must be some limit -- I just don't know what it is).

HSDS is a service, so when you write from the machine under your account, it's actually getting stored on S3 which is under John's account. So whatever storage you have allocated on Jetstream is unaffected when you save your data to HSDS.
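To make "stored as chunks" concrete, here is a minimal sketch of how an N-dimensional array can be split into independently addressable objects, which is the general idea behind chunked object storage. The key naming (`prefix/i.j.k`) and the `chunk_keys` helper are assumptions made up for illustration; they are not HSDS's actual storage layout or API.

```python
from itertools import product
from math import ceil


def chunk_keys(shape, chunks, prefix="mydata"):
    """Yield one hypothetical object key per chunk of an N-dimensional array."""
    # Number of chunks along each dimension; the last chunk may be partial.
    counts = [ceil(n / c) for n, c in zip(shape, chunks)]
    # Every combination of per-dimension chunk indices is one stored object.
    for index in product(*(range(k) for k in counts)):
        yield prefix + "/" + ".".join(str(i) for i in index)


# A (time, lat, lon) array of shape 10 x 100 x 100 in 5 x 50 x 50 chunks.
keys = list(chunk_keys(shape=(10, 100, 100), chunks=(5, 50, 50)))
print(len(keys), keys[0], keys[-1])
```

Because each chunk is a separate object, many workers can read or write different chunks at the same time, which is where the scalable access comes from.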

rabernat commented 6 years ago

@jmunroe is leading the charge on the jetstream deployment. Hopefully he will get in touch with @jreadey to give the HSDS approach a try.

jreadey commented 6 years ago

I was using the S3 API for the JetStream Ceph storage, and yes it doesn't seem like it is very heavily used.

I'm running HSDS on a single host (as a bunch of Docker containers), but would be very interested in deploying to Kubernetes on multiple hosts as long as I didn't need to set up Kubernetes myself.

@jmunroe - would it be possible to try out HSDS on the Kubernetes cluster you are setting up?

zonca commented 6 years ago

I am now working with @rsignell-usgs to test Pangeo on Jetstream. I started from https://zonca.github.io/2017/12/scalable-jupyterhub-kubernetes-jetstream.html and then deployed dask with Helm; I can connect to the scheduler and launch jobs on the workers. I'm now starting to play with daskernetes, and I'll let you know how it goes.

mrocklin commented 6 years ago

Very cool. I'm glad to see this happen. I'm guessing/hoping that you're running from http://dask.pydata.org/en/latest/setup/kubernetes-helm.html

Note that the Dask helm chart is more of a standalone thing. This works great for individual users (and maybe this is what you're targeting).

If you want to get JupyterHub running, you'll likely want a separate helm chart. Current instructions for GCE are here: https://github.com/pangeo-data/pangeo/tree/master/gce

My hope is that they would need only a small modification to run elsewhere.

rsignell-usgs commented 6 years ago

@zonca has also made progress with Zarr on Jetstream: https://zonca.github.io/2018/03/zarr-on-jetstream.html

Ping @julienchastang

rsignell-usgs commented 6 years ago

@zonca has deployed Pangeo on Jetstream: https://zonca.github.io/2018/06/private-dask-kubernetes-jetstream.html

Next he will be working on persistent storage and looking into cluster autoscaling for OpenStack.

stale[bot] commented 6 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 6 years ago

This issue has been automatically closed because it had not seen recent activity. The issue can always be reopened at a later date.

rsignell-usgs commented 5 years ago

I was talking with @zonca about what the holdups are for getting a fully functional Pangeo instance going on Jetstream, and he said the big problem is basically the lack of cluster autoscaling.

Although there are Kubernetes cluster autoscalers for many cloud providers (and we use the AWS one on pangeo.esipfed.org), there is only a partially completed PR for OpenStack.

I communicated with @dankohn from the Cloud Native Computing Foundation and he pointed me toward the Kubernetes Cluster API work, which seems a more forward-looking approach. There is also an implementation for OpenStack, which we will be exploring in the coming months.

Encouraging!

julienchastang commented 5 years ago

Is there any way to get this issue re-opened? I believe this is still a desired goal, albeit a long-term one.

rsignell-usgs commented 5 years ago

@zonca do you have cluster autoscaling working on jetstream?

zonca commented 5 years ago

@rsignell-usgs no, I don't. I am now trying again with Magnum, because they released an OpenStack plugin for "Cluster autoscaler" based on Magnum, see https://github.com/zonca/jupyterhub-deploy-kubernetes-jetstream/issues/15

dankohn commented 5 years ago

@zonca @rsignell-usgs Can I just confirm that you're aiming to deploy on top of an existing OpenStack cloud? If so, a Magnum-based option is your best bet, and I could connect you with a Magnum expert if you're running into issues.

If you're working with bare metal resources, you may be better off using KubeSpray to deploy Kubernetes directly onto the bare metal.

I presume you've also seen https://github.com/jupyterhub/zero-to-jupyterhub-k8s.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 5 years ago

This issue has been automatically closed because it had not seen recent activity. The issue can always be reopened at a later date.

zonca commented 5 years ago

OK, this now works; see my tutorial at https://zonca.github.io/2019/06/kubernetes-jupyterhub-jetstream-magnum.html

Next I'll investigate if I can configure cluster autoscaler to automatically create/destroy Jetstream instances based on load.
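As a rough illustration of what "create/destroy instances based on load" means, here is a toy sketch of the sizing decision. It is not the Kubernetes Cluster Autoscaler's actual algorithm (the real one reacts to pods that cannot be scheduled and simulates scheduling before adding or removing nodes); the function name, parameters, and CPU-only accounting are all assumptions made for this example.

```python
from math import ceil


def desired_node_count(pending_cpu, used_cpu, cpu_per_node,
                       min_nodes=1, max_nodes=10):
    """Return how many nodes are needed to fit running plus pending CPU requests.

    Hypothetical helper: a real autoscaler also considers memory, pod
    affinity, and cooldown periods before scaling down.
    """
    # Total CPU requested by running pods and by pods waiting for capacity.
    total_cpu = used_cpu + pending_cpu
    # Round up: a partially filled node still has to exist.
    needed = ceil(total_cpu / cpu_per_node)
    # Never go below the configured minimum or above the allocation's maximum.
    return max(min_nodes, min(max_nodes, needed))


# 4 CPUs in use, 6 CPUs requested by pending workers, 4-CPU instances.
print(desired_node_count(pending_cpu=6, used_cpu=4, cpu_per_node=4))
```

The `max_nodes` cap matters on an allocation-based resource like Jetstream, since every extra instance spends allocation.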

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

zonca commented 4 years ago

I also got the autoscaler working: https://zonca.github.io/2019/09/kubernetes-jetstream-autoscaler.html

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 4 years ago

This issue has been automatically closed because it had not seen recent activity. The issue can always be reopened at a later date.