mrocklin closed this issue 11 months ago
I used binder.pangeo.io to present a tutorial at PyData NYC on 2018-10-22. My apologies for including this late. Requested information:
I would like to use binder.pangeo.io to present a Dask tutorial at PyData DC on 2018-11-16. Requested information:
I would like to use binder.pangeo.io to present a Dask tutorial at PyCon.DE on 2018-10-24. Requested information:
Thanks for doing this Matt!
I would like to use binder.pangeo.io to present a Dask tutorial at ODSC West on 2018-11-01. Requested information:
I'm also happy to set up the infrastructure separately from the pangeo binder. This would mostly just be if you want additional testers of the binder infrastructure to work out issues.
During @xhochy's tutorial we again ran into resourcing problems. This time the worker node pool did not expand to its full capacity; I'm not sure why. To resolve this I allocated all of the non-preemptible nodes I could (around 500 cores) for the duration of the tutorial.
There was also some concern when first starting up notebooks and clusters. It's awkward to have the notebook sit idle waiting for workers for a few minutes while VMs start up. Educators might want to ask for many VMs just before class by creating a very large cluster (for example a cluster with 1000 workers). We intentionally make this difficult to do by adding a limit to dask-cluster size. You can override this limit with the following code:
import dask

with dask.config.set({'kubernetes.count.max': 1000}):
    cluster.scale(1000)
@rabernat, @mrocklin would you appreciate additional beta testers of this setup? If so I'll start to ensure my materials for the tutorial next week work on the binder infrastructure.
The tutorial is actually titled "Cloud Native Data Science with Dask", so I plan to spend about 20 minutes walking through how the clusters were actually deployed for the attendees.
@TomAugspurger - Please go ahead as planned and use binder.pangeo.io. We're still learning how our binderhub handles these events, so if you don't mind being a beta tester, we'd appreciate the pressure. I think @dsludwig and I will be working on the deployment a bit over the next week, but we'll keep things in a stable state for the day of your tutorial.
I have an event coming up
@TomAugspurger , how did things go?
Pretty much perfectly. It was about 40 people, and the Dask workers came up instantly for them.
I didn't watch to see if / how scaling down went.
For background, just before this workshop happened I logged in and scaled up to 1000 workers, and then back down again. To do this I had to break the built-in worker limit by setting a config parameter:

import dask

dask.config.set({'kubernetes.count.max': 1000})
cluster.scale(1000)
# wait a bit
cluster.scale(0)
This forces the worker node pool to grow, and then those workers stick around for a bit.
I also had to force things to scale down manually. We still have the problem of fluentd, prometheus, and some other small pods keeping the worker-pool nodes awake.
I would like to use binder.pangeo.io to present a Dask tutorial at Capital One: C4ML on 12/13/18. Requested information:
I would like to use binder.pangeo.io to present a Dask tutorial at PyParis 2018 on 2018-11-15. Requested information:
Thanks for the information @lesteve . I recommend that you follow the procedure in https://github.com/pangeo-data/pangeo/issues/440#issuecomment-435213971 to ensure that there are some VMs provisioned when your students arrive. I'll make sure that things spin down afterwards.
We (@martindurant and myself) would like to use binder.pangeo.io to present a Dask tutorial at PyData DC on Friday (Nov 16th) from 11:00 am - 12:30 pm EST. We plan to use these materials: https://github.com/mrocklin/pydata-nyc-2018-tutorial. I am unsure of the number of attendees, I have not been provided this information.
@jcrist @martindurant Just to give you a ballpark estimate: the PyData DC tutorials are sold out at 150, and there is only one other tutorial going on at that time. I would plan for around 75.
Just some quick feedback after the Dask tutorial yesterday at PyParis. Running the tutorial through the pangeo binder setup went flawlessly, which was really impressive!
Following https://github.com/pangeo-data/pangeo/issues/440#issuecomment-435213971:
There were around 40 people and they were able to get their 20 workers instantly during the tutorial.
I plan to use binder.pangeo.io to present a Dask tutorial at http://www.irt-saintexupery.com/. Requested information:
@rabernat, @scottyhq and I will be giving a Pangeo tutorial on 12/12 at the 2018 AGU Fall Meeting. (xref: #468).
I'll be giving a short (30-minute) tutorial during the AMS meeting. I don't expect very many (read: really, anyone) people to follow along, but it's still possible, so I want to record things here:
- Date and time: January 7, 2019 at 3-3:30 PM MDT
- Information about the event: I will give a short live demo of scaling an analysis from a test dataset on my laptop to running on a larger one (~25 GB) on the cloud
- Link to materials: TBD, plan on posting tomorrow
- Number of attendees and resources per attendee: I expect there to be ~25-50 people in the audience, and possibly 10 people who actively try to follow along. I will ask people not to use more than 10 cores.
Users should feel free to use as many cores as they like (I think we cap them at something like 50 or 100). With an audience that small I wouldn't worry too much.
You may want to go through the procedure mentioned above, where you allocate a large cluster just before class to make sure that there are some workers around. If you don't do this then users may have to wait a few minutes before things spin up, but that should be ok too if you inform them that we're waiting for a few VMs to show up from Google.
FYI I will be doing an impromptu demo at Oxford in a few hours using Pangeo binder. Expect some traffic on the cluster, not sure how much.
I would like to use binder.pangeo.io to present a Dask tutorial for the Advanced Scientific Programming in Python, Asia-Pacific Summer School.
Requested information:
- Date and time: 2019-01-25 between 09:00 and 13:00 AEDT (i.e. 2019-01-24 between 17:00-21:00 EST).
- Information about the event: This will be an introduction to Dask for the Advanced Scientific Programming in Python, Asia-Pacific Summer School.
- Link to materials: TBD, will add a link later today once determined.
- Number of attendees and resources per attendee: Around 50 attendees. Their exercises are designed to use a handful of cores per user. Memory usage should be pretty low. Just something to give them an idea of how Dask works.
Cool. @jakirkham are you comfortable with the process in https://github.com/pangeo-data/pangeo/issues/440#issuecomment-435213971 ?
Also, feel free to have them use more than just a few cores. We're happy to spend our free compute credits for education and evangelism.
Was just looking at that. SGTM. Thanks. Should we add it to the OP?
Great, thanks @mrocklin. I think the students will find this really cool. :)
> Was just looking at that. SGTM. Thanks. Should we add it to the OP?
Good idea. Done!
I'd like to use binder.pangeo.io to present a Dask tutorial for the Observatoire Midi-Pyrénées lab.
I'd like to use binder.pangeo.io to present a Dask tutorial on April 3rd from 1:30 to 5:00 PM Central time at AnacondaCon. There will be ~100 attendees. I'm not quite done with materials yet, but these will be some combination of our PyCon tutorial (https://github.com/TomAugspurger/dask-tutorial-pycon-2018) and the PyData NYC tutorial (https://github.com/mrocklin/pydata-nyc-2018-tutorial). I'll update this comment with a link when the materials are finished.
Edit: materials are here: https://github.com/jcrist/anacondacon-2019-tutorial
@jcrist - 100 attendees each with their own KubeCluster could get pretty big! We have recently become more conscious of our cloud burn rate, which was unsustainably high for a while.
Please go ahead with your tutorial. We want this resource to be used, especially for educational purposes. Just try to be conscientious about scale when you have 100 simultaneous users.
Thanks @rabernat. I've scaled down the cluster size to a max of 10 workers each (the default for previous tutorials was 20), with fewer for simpler notebooks, to try to combat this. I'm willing to go smaller to help conserve resources; I don't want to strain the cloud resources of such a useful community project.
FWIW I think that these tutorials are valuable for drumming up interest, and my guess (though not very well informed at the moment) is that their cost is small relative to general use, particularly because they're one-off rather than continuous. I don't know though.
To be clear, I am 100% 👍 on the tutorial. I agree they have very high value.
It might be useful to use this as an opportunity to figure out how much these tutorials cost. People frequently ask me that, and I don't have an answer beyond "not very much."
Dask workers go into a node pool of n1-highmem-32 preemptible instances. These have 32 vCPUs and 208 GB of memory and cost $0.40 per hour. We could make a back-of-the-envelope estimate and then verify from the logs after the tutorial.
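For what it's worth, here is a purely illustrative sketch of that back-of-the-envelope estimate; the cores-per-worker and session-length figures are my own assumptions, not actual deployment settings:

# Illustrative only: cores_per_worker and tutorial_hours are guesses.
attendees = 100
workers_per_attendee = 10   # the reduced per-user cap mentioned above
cores_per_worker = 2        # assumption
node_cores = 32             # n1-highmem-32
node_price_per_hour = 0.40  # preemptible price quoted above
tutorial_hours = 3.5        # 1:30-5:00 PM session

nodes = attendees * workers_per_attendee * cores_per_worker / node_cores
cost = nodes * node_price_per_hour * tutorial_hours
print(f"~{nodes:.0f} nodes, roughly ${cost:.0f} for the session")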
btw, @jcrist - you might want to pop in on https://github.com/pangeo-data/pangeo-binder/issues/37 - @guillaumeeb is reporting that users at his tutorial are losing their notebooks. Could be because we are using preemptible node pools.
This could affect your tutorial tomorrow.
Hi everyone, my tutorial today went pretty well, with an audience drawn from governmental agencies (CNES, Ifremer), the space industry (CLS, which works on altimetry products), and labs (CESBIO for spatial imagery, Legos for ocean science, GET for earth science...).
I've encountered some small issues:
I think these tutorial repositories are great, and we should find a way to maintain them. This is probably linked to #575. I'm talking about https://github.com/pangeo-data/pangeo-tutorial-agu-2018 and https://github.com/mrocklin/pydata-nyc-2018-tutorial.
@guillaumeeb would it be worth adding the mrocklin pydata tutorial to the list of tutorials at https://github.com/pangeo-data/awesome-open-climate-science#tutorials or was that mostly eclipsed by the AGU tutorial?
We don't want to bombard people with too many similar tutorials (or worse, just old versions of basically the same tutorial), but if it takes a different or complementary approach, that should be summarized and included in the awesome list, right?
@rsignell-usgs just added it yesterday to https://github.com/pangeo-data/education-material. However, I don't think it is appropriate for a climate science use cases list.
But maybe these two lists are overlapping...
And I fully agree that we don't want too many similar tutorials, but I think we currently do.
It looks like binderhub has a config option that limits the number of users per repository.
binderhub:
  config:
    per_repo_quota: 100
Larger tutorials might run into this. We might want to change this number (or not) at some point (though preferably not within the next few hours).
@mrocklin - I'd be fine increasing this limit. If you or @jcrist can open an issue on pangeo-binder, we can discuss more there.
I'd like to use the binder deployment for two tutorials on the 1st and 2nd of May (a weekend). Will likely be around 10am GMT on each day for a couple of hours to an audience of 10-20 people. I'm intending on using the material @jcrist prepared for AnacondaCon.
Thanks for the tip to: https://github.com/jcrist/anacondacon-2019-tutorial
I'd like to use binder.pangeo.io to present a tutorial at Northwest Data Science Summit.
@robfatland - sounds good. See @mrocklin's comments earlier in this thread for some tips on helping scale up/down your cluster efficiently.
I'm doing a day-long workshop / training on Pangeo as part of C3DIS (Canberra). I've got a Pangeo / BinderHub deployment set up on AWS but I would like to reserve binder.pangeo.io as a fallback in case my AWS deployment turns out to not scale.
- Date and time: 2019-05-09 between 9 am and 5 pm AEST (after @robfatland's training!)
- Information about the event: Pangeo Tutorial at C3DIS, Canberra
- Link to materials: We plan to use these materials: https://github.com/jmunroe/pangeo-tutorial-c3dis-2019 (rebranded from the AGU 2018 tutorials)
- Number of attendees and resources per attendee: 25 participants x 20 workers each.
As noted also on Slack: our practice run hit a snag; here are the details:
Ryan's sea level notebook running in binder is hanging fire on the cell below "Visually Examine Some Of The Data". The prior "Initialize Dataset" cell ran fine. The KubeCluster cell runs but gives this error:
/srv/conda/lib/python3.6/site-packages/distributed/bokeh/core.py:57: UserWarning:
Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the diagnostics dashboard on a random port instead.
warnings.warn('\n' + msg)
The dask task stream remains empty. Shutting down and trying again produces this error from the same (cluster) cell:
/srv/conda/lib/python3.6/site-packages/dask_kubernetes/config.py:13: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
defaults = yaml.load(f)
As before that "sanity check" cell hangs.
Should I be setting something in the KubeCluster widget? I tried Manual Scaling with 7 workers, to no avail.
@robfatland, your output traces are just warnings, not errors; they should not prevent the notebooks from working fine.
The first one says the scheduler's diagnostics dashboard started on another port because the default one was already in use (which is to be expected during a tutorial). Have you tried opening the dashboard in another window?
The second is just a deprecation warning and should have no impact.
Does the cluster widget show you allocated cores?
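If it helps, a minimal way to find the relocated dashboard address, assuming the standard dask.distributed client API and that cluster is the KubeCluster created in the notebook:

from dask.distributed import Client

client = Client(cluster)      # cluster is the KubeCluster from the notebook
print(client.dashboard_link)  # open this URL in a separate browser tab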
Tried again in an anonymous browser; this is working. Initially I get workers: 0, cores: 0, memory: 0. Thinking this was incorrect I set Manual Scaling to 10, but later realized this is an unnecessary step if one is happy with the default 20 workers. Anyway, now the first user gets a pause while things fire up and then everything goes. My second user seems to fire up faster.
And by the way this is just friggin' awesome.
@robfatland I recommend priming the cluster with some VMs. You may want to read the edit in the top post of this issue.
I think having some visualization of what the cluster is doing would go a long way towards alleviating user / instructor anxiety during these tutorials. I'm talking about a basic visual representation of the node / pod information that kubectl can provide. Ideally this could be a tab in Jupyterlab, much like the dask extension.
This is motivated by our experience with the dask dashboard. Users are happy to wait for things if they have feedback about what the computers are doing. But just waiting with no info / progress causes anxiety and confusion.
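As a rough sketch of the kind of information such a view could poll, here is what this might look like with the official kubernetes Python client; the "pangeo" namespace below is a placeholder, not the actual deployment value:

from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running inside the cluster
v1 = client.CoreV1Api()

# List pods in a namespace and show their phase (Pending/Running/etc.)
for pod in v1.list_namespaced_pod(namespace="pangeo").items:
    print(pod.metadata.name, pod.status.phase)

# List nodes and whether they are marked unschedulable
for node in v1.list_node().items:
    print(node.metadata.name, node.spec.unschedulable)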
Hello educator!
We're pleased to hear that you're interested in using Pangeo's cloud deployments for an educational event. To make sure that things run smoothly we ask that you post the following information here before your event:
- Date and time
- Information about the event
- Link to materials
- Number of attendees and resources per attendee
This helps us both by ensuring that the cluster is sufficiently large during your event (otherwise your students may not get as many cores as they expect) and by providing us information to give back to our funding agencies about how their resources benefit the public.
Edit
For educators wishing to use this cluster, you may want to pre-allocate a bunch of VMs before your students arrive. This will make sure that VMs are around when they try to log on. Otherwise they might have to wait a few minutes while Google gives us machines.
Typically I do this by logging into a Jupyter notebook, and then allocating a fairly large cluster. To do this I need to overwrite the default maximum number of allowed workers.
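For reference, this is the snippet used earlier in this thread; the figure of 1000 workers is just an example, and cluster is the dask_kubernetes KubeCluster created in your notebook:

import dask

# Raise the built-in dask-kubernetes worker limit, then pre-warm the node pool
dask.config.set({'kubernetes.count.max': 1000})
cluster.scale(1000)
# ...wait for the VMs to arrive, then release the workers shortly before the tutorial...
cluster.scale(0)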
This forces the worker node pool to grow, and then those workers stick around for a bit. It may take a while for the cloud to give us enough machines. I would do this at least 30 minutes before the tutorial starts, and possibly an hour before. You can track progress by watching the IPython widget output of KubeCluster, which should update live. You definitely want to release the pods back to the wild before the tutorial starts, but not too far in advance, otherwise the cloud provider will clean up the VMs. Maybe run scale(0) a minute before things start off (but in practice you should have a 10-20 minute grace period here).