pangeo-data / pangeo

Pangeo website + discussion of general issues related to the project.
http://pangeo.io

A pangeo education working group? #575

Closed rabernat closed 5 years ago

rabernat commented 5 years ago

It has come up at many recent weekly check-in meetings that we need to be putting more effort into education and outreach. The tools we are building have been moving very fast; in order to have the biggest impact, we need to find a more coordinated way to bring the broader community along.

Perhaps we need an education working group?

@kmpaul - any thoughts on how this might align with NCAR's ongoing efforts, the NCL transition, etc.?

Any volunteers to lead such a working group?

rabernat commented 5 years ago

Was discussed in https://github.com/pangeo-data/pangeo/issues/518, where @ktyle and @brian-rose also expressed interest.

rabernat commented 5 years ago

Also discussed extensively in https://github.com/pangeo-data/pangeo/issues/411 (this issue is basically a duplicate of that), where @DamienIrving, @robfatland, @amanda-tan, @marylhaley and @venitahagerty all expressed enthusiasm.

What is needed is a way to focus and coordinate the disparate efforts going on here.

kmpaul commented 5 years ago

This is great! I'm entirely in favor of an education working group.

At NCAR, we have a few irons in the fire, but they definitely need more work. We are developing a larger tutorial (like ~2 days) for later this summer that will be targeted primarily to NCAR/Cheyenne users. And we have small tutorials that we are putting together for smaller outreach efforts to research groups here at NCAR that are (primarily) concerned about what happens in light of the NCL Pivot to Python. These tutorials are looking mostly like short sessions designed to answer simple questions like "How do I do X in Python, when I used to do it easily with NCL?"

All of this has been ad hoc up to this point, but I think it needs to build out. I would very much like to work with people on this. In addition to the people already mentioned, @lheagy had some excellent ideas.

rabernat commented 5 years ago

Thanks for the reply, Kevin. It's clear there are lots of good pangeo-related educational materials out there, and I won't try to enumerate them all right now. What I think is needed is someone to make an effort to catalog these for our website, keep them up to date, and identify what new materials need to be generated. A particular gap is related to what I'll call "cloud-native" data analysis. There are lots of guides about how to use xarray, metpy, etc. But there is no guide that explains how to use a pangeo cloud-based jupyterhub, interact with cloud storage, scale up dask clusters, etc.

kmpaul commented 5 years ago

Ah! Yes. That's a very good point. Ana Privette at AWS has expressed an interest in developing a "Pangeo on AWS" tutorial that might be a perfect platform for exactly that.

kmpaul commented 5 years ago

...Should this start by creating an "Educational Materials" section on pangeo.io? Or should this content go into the existing "Guide for Scientists" section?

It would be very cool to binderize some of our existing tutorials.

jmunroe commented 5 years ago

I am currently preparing to deliver a day-long training session on Pangeo (as part of C3DIS, Canberra, May 2019). The anticipated plan is to use AWS for the deployment. I will try to incorporate some cloud-native material with this issue in mind.

rabernat commented 5 years ago

FWIW, I continuously receive requests for training in "pangeo." If we were able to offer some sort of software-carpentry-style workshop that was just ready to go, even if people had to pay to support the costs, I think there would be lots of interest from institutions.

Currently none of our funding sources has a budget that permits us to offer such training on demand.

amanda-tan commented 5 years ago

I think as part of the Pangeo ACCESS proposal, we had talks about developing SC-style building blocks for on-boarding a more general audience. It might behoove us to work together with the eScience crowd in developing some of these tutorials in conjunction with the hackweeks. +@jhamman

@kmpaul I would be interested in discussing putting in a proposal to the next round of the NSF CyberTraining RFP (due in Jan. 2020) but never too early to start.

kmpaul commented 5 years ago

@amanda-tan Yes! I'd love that! We started thinking about a CyberTraining proposal this year, but the timeframe was too tight. I think that we all (@brian-rose, @ktyle, @lheagy, @jhamman) agreed we'd like to pick that up for the next round.

lheagy commented 5 years ago

Would creating a github repo with just a readme where we can collect a list of pangeo educational material be a useful starting point? I think to @rabernat's comment, first just getting a picture of what material has already been generated would provide some clarity on where to go next.

robfatland commented 5 years ago

Looks great, +1. I'm really interested in three related questions

As a colleague of mine points out: It is not necessarily 'learning pangeo' that is the central challenge. My recent struggles with getting the hang of xarray incline me towards a path from Python to numpy to pandas to <useful ancillary packages> to xarray to dask to <building my own package>; all just in order to be ready to make productive use of pangeo.

I also can't agree enough with learning it 'well' (e.g. the SC ideas) so as to have the right framework for the inevitable developments to come.

I'll mention some notebook repos that would be candidates for @lheagy's idea.
These are mostly oceanography with a bit of glaciology over in one corner.

dopplershift commented 5 years ago

We at Unidata are happy to help out where we can. We've been teaching our python workshop with regularity. It features a few pangeo components (though not dask yet), but is largely focused on meteorology.

brian-rose commented 5 years ago

+1 for some community organization around educational materials.

I have a little bit of xarray and metpy stuff integrated into my climate modeling lecture notes but would like to dig deeper.

The discussion in #518 led to some great ideas for an NSF CyberTraining proposal. I am definitely game to pick this back up for the January 2020 call.

jwagemann commented 5 years ago

I am just about to start doing some tests with Copernicus open data from ECMWF, but I am happy to contribute to training material if needed.

kmpaul commented 5 years ago

This is actually a really exciting response from everyone! Thanks, @rabernat, for starting this thread.

There have been so many great replies with excellent material, I went ahead and implemented @lheagy's suggestion and created pangeo-data/education-material as a starting point. I culled the list above and added links and descriptions to the material already provided in this thread.

I think we need to identify the topics that we want for education material and organize this material into those topics, so we can see where we are light and/or heavy on material.

mrocklin commented 5 years ago

Putting on my for-profit hat, trainings are also something that companies frequently request. I would not be surprised if a company like Anaconda (cc @jbednar) or QuanSight (cc @scopatz) would be interested in getting involved.

jbednar commented 5 years ago

We at Anaconda have created the earthml.pyviz.org site, which we've given as a day-long tutorial at NASA Goddard as part of our project with them. It focuses mainly on viz tools and on preparing data for ML tools, and doesn't cover JupyterHub or distributed computation. We'd be happy to prepare and maintain additional publicly available training materials as part of our Pangeo or NASA collaborations, but as a company Anaconda is not currently in the business of selling general Python training or domain-specific training. We'd be happy to help advise Quansight or anyone else if they want to train using our materials.

daxsoule commented 5 years ago

+1 for educational materials. I am building my third "generation" of research students at Queens College, and each year we do a little better at curating the materials and identifying a pathway that helps them go from zero to research. I think this will be very helpful.

robfatland commented 5 years ago

Ok with all of this enthusiasm going allow me to suggest

LejoFlores commented 5 years ago

Hey all! @jhamman pointed me to this issue and I'm psyched to get engaged. This summer my group and I will be working on some AWS-based dask+xarray workflows for analyzing WRF data that I plan on using in my grad research computing class this fall. I'll definitely post links to the class repo. Also, one of my students is taking an online class offered by UMBC that was developed through an NSF CyberTraining grant that is actively using dask and xarray. I don't know if anyone from that class is on here, but it might be good to reach out... he says the class is awesome.

jhamman commented 5 years ago

Hi @LejoFlores, welcome! If you can point us to the NSF CyberTraining course, I think we'd be quite interested to see what it includes.

robfatland commented 5 years ago

Ok I'm slow in setting up a call to chat on the pangeo Ed WG -- busy times -- but I would like to suggest a 30 minute Zoom call next Tuesday at noon PDT / 3pm EDT (during the pangeo gap). I'll put an announcement in slack; I think that's protocol?

amanda-tan commented 5 years ago

@robfatland I would suggest creating an issue here with Date/Time and tagging @pangeo-data/pangeo

If you could also outline an agenda that would be great.

scopatz commented 5 years ago

CC @teoliphant

daxsoule commented 5 years ago

Thanks Rob! I would like to participate, but I have another meeting at that time.

Best,

Dax

ktyle commented 5 years ago

I would love to participate as well, but will be at a workshop next Mon-Thu and then on vacation until the following Thursday.

rsignell-usgs commented 5 years ago

@LejoFlores, this is the UMBC course you referred to, right? http://cybertraining.umbc.edu/docs/UMBC_CyberTraining_Spring_2019.pdf

robfatland commented 5 years ago

@amanda-tan ok will do; still pending as I am getting over a bad cold

robfatland commented 5 years ago

@pangeo-data/pangeo

Let's do a call this Wed or Thu or Fri; here is a Doodle poll link:

https://doodle.com/poll/a7xd8ciap375wdsi

robfatland commented 5 years ago

@pangeo-data/pangeo My apologies; various factors require me to reschedule this pangeo Education Working Group call to the first week of June (3/4/5/6). Here is the revised Doodle poll:

https://doodle.com/poll/a7xd8ciap375wdsi

Hoping this will enable more folks to jump in. tx -r

robfatland commented 5 years ago

@jhamman @arokem @aaarendt @amanda-tan @ktyle @dopplershift @scottyhq @rabernat @kmpaul @jmunroe @DamienIrving @lheagy @jfburkhart @brian-rose @jwagemann @mrocklin @jbednar @daxsoule @LejoFlores @pangeo-data/pangeo @teoliphant @mariusvniekerk @scopatz @rsignell-usgs

Pardon the wide net; tried to snag everyone who has chimed in on a pangeo Education working group topic. The first WG call will be tomorrow immediately after the Noon-PDT pangeo call: June 4 2019, 1 pm PDT, via appear.in/pangeo; note this is 4pm EDT.

Details...

phaustin commented 5 years ago

@robfatland -- for the Who's Who: I'm Philip Austin (@phaustin) chair of the Atmospheric Sciences Programme in the UBC Department of Earth, Ocean and Atmospheric Sciences -- we're currently deploying pangeo on a couple of clusters at UBC.

robfatland commented 5 years ago

First pangeo ed working group call went well, thanks to all and I will follow up (this week / next) with a list of actions. If you just want to get rolling on pangeo education then allow me to suggest: Pick a learning objective near and dear, create something learnable, let us know so we can kick the tires.

daxsoule commented 5 years ago

I think the first step here is to define an extensive set of learning objectives and curate a list that points to all the various repos that might be useful to someone working through the process. If we can help people understand the pathway and the dependencies that one might need to consider to get from novice to data scientist, then we will have made a big step forward.

After that, I think that @rabernat's suggestion that we consider self-publishing a book is something that would be very useful.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

rbavery commented 5 years ago

Hi all, before this discussion gets too stale, I just saw Ryan's State of Pangeo slides and noted that there is a lot of interest in improving documentation and education of the Pangeo stack, and partnership with the Carpentries was specifically mentioned.

I recommend The Carpentries' lesson template as a model of how to maintain educational material, for lessons that are deemed to be worth building off of and maintaining. Having lessons that are outlined with challenges to reinforce the lessons and can run on a laptop with some setup is valuable for getting folks like me up to speed with using tools like xarray in my day to day work.

I think Binder notebooks are a great tool for getting your feet wet with an example. But as a beginner who is familiar with xarray and dask and is now looking to use them on my own projects, I only get value from repeating someone else's analysis by executing code cells if the example happens to line up with what I'm trying to do at the moment (and if the Binder notebook is still functional). I think there's a need for a grand tour of xarray (and dask?) for the geosciences in the format of a Carpentries workshop.

We have started a discussion at this Carpentries repo on drafting a Geospatial Python lesson, which I and others would like to see cover xarray and possibly other tools in the pangeo stack. We're tracking this discussion here: https://github.com/carpentries-incubator/geospatial-python/issues/1 This workshop would be geared toward folks with Python experience and an understanding of geospatial fundamentals, but no geospatial Python experience.

Are any folks in the Pangeo community interested in contributing to a Geospatial Python Carpentries lesson? I myself intend to contribute example lessons on reading and plotting time series of geotiffs using Data Carpentry's Geospatial R lesson as a template, but using xarray DataArrays and Datasets.

amanda-tan commented 5 years ago

@rbavery Do you know of Geohackweek? https://github.com/geohackweek/tutorial_contents has many lessons that would be useful for those looking to get started. Damien Irving's Python in AOS is also extremely useful. That said, an official Data Carpentry lesson would be a great place to point people to as well.

rabernat commented 5 years ago

This issue was moved to https://discourse.pangeo.io/t/a-pangeo-education-working-group/43