swcarpentry / hpc-novice

Novice introduction to high performance computing
Other

SC16 Workshop Call #4

Closed: jduckles closed this issue 8 years ago

jduckles commented 8 years ago

Moving discussion from this [Discuss] thread to GitHub.

Here is the call from Paul Wilson: sc16-tutorials-call.pdf

It seems there is some momentum to put a tutorial together and do some lesson-development sprinting over the summer. From the [Discuss] thread, it looks like Ashwin Trikuta, Dana Brunson, and Kate Hertweck have thought about HPC and done some work in the context of SWC pedagogy and workshop methods. There are lots of others with material for particular systems, so the first thing to decide is probably what to include and what to leave out, given the length of time allotted for SC tutorials.

If we're going to pull off an hpc-carpentry workshop for SC16, I suggest we use this thread to form the team, then start opening issues in this repository and get hacking.

justbennet commented 8 years ago

I took the liberty of dicing the index.md into the three broad topics, as a basis for continuing more focused discussion on each of them separately, and issuing a pull request.

hneeman commented 8 years ago

Responding to Dana's point, I agree with her completely:

What we've seen, over and over, at site after site, is that HPC usage is overwhelmingly done using community and/or commercial software. The users of these software packages typically know little or (more commonly) nothing about programming, and most of them, when they start using HPC, know little or (more commonly) nothing about command lines, batch computing, etc. A much smaller subset of users runs their own homebrew codes, and a vanishingly small subset contribute to community code development (at OU, out of over a thousand HPC users, perhaps 20 do that).

I was at a talk by the director of one of the biggest national HPC centers last semester, where he pointed out that half their usage is roughly 40 codes, and the other half is roughly 4000 codes. This gives a good sense of the scale of the issue.

In any case, having participants who know little or nothing about command lines and Linux, but need to learn both very quickly, is right up Software Carpentry's alley.

As for batch computing, a key issue is that (a) every batch system is different and (b) every installation of every batch system is different; this stuff is highly idiosyncratic. So it's crucial to stress, at the event, that the exercises provided show just one particular system's way of doing things, and that attendees should work with their local HPC staff to get up and running on their local resource (or with national staff on national resources, etc.).

I encourage y'all to take a look at examples of how institutions are addressing this issue. Here's mine:

http://www.oscer.ou.edu/Workshops/Overview/sipe_exercise_01_learningbatch_boomer_20130123.pdf
http://www.oscer.ou.edu/Workshops/Overview/sipe_exercise_01_learningbatch_boomer_20130123.docx

As others have recommended in this thread, their first exposure to batch computing is a toy code that isn't parallelized, so that the only new knowledge they focus on is the combination of command line and batch.

Note that we also provide a package that other institutions can tune to their local conditions:

http://www.oscer.ou.edu/Workshops/Overview/SIPE2013_exercises.zip

This is because there's minimal hope that a version tuned for our institution will work at theirs, even if they run the exact same batch system as we do (which they probably don't; several batch systems are in common use in academia).

But, as others in this thread have mentioned, shortly after this we move on to running parallel codes (usually MPI), because many (perhaps most) HPC users run parallel codes at least part of the time (otherwise, in many if not most cases, they wouldn't go to the trouble of using HPC, which, let's face it, is a pain in the behind). See the attached exercises: rmacc2015_exercise_mpihelloworld_boomer_20150811.docx and rmacc2015_exercise_mpigreetings_boomer_20140811.docx

These parallel codes are toys too, again just to illustrate how MPI codes behave.

hneeman commented 8 years ago

I should add one more thing:

Performance generally comes down to two issues: the storage hierarchy, and parallelism.

If an HPC discussion focuses only on parallelism but not on the storage hierarchy, my experience has been that it can be extremely difficult for attendees to grasp why parallel performance typically is substantially worse than linear speedup.
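For reference, here are the standard textbook definitions behind "linear speedup" (general definitions, not taken from the SIPE slides). With $T(1)$ the run time on one core and $T(p)$ the run time of the same problem on $p$ cores:

```latex
S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p}
```

Linear speedup means $S(p) = p$, i.e. efficiency $E(p) = 1$. Effects such as contention for shared levels of the storage hierarchy push $E(p)$ below 1 as $p$ grows, which is the gap attendees tend to find puzzling.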

I would strongly advocate teaching the storage hierarchy. In my experience, this can be done very quickly. See:

http://www.oscer.ou.edu/Workshops/Overview/sipe_overview_20150120.pptx

especially slides #37-43.

That's 7 slides to cover everything they need to know about the storage hierarchy at that stage in their learning about HPC.

dbrunson commented 8 years ago

FYI, y'all: Henry (@hneeman) is not only my partner in crime here in Oklahoma and in XSEDE Campus Engagement, he is also the creator of the "Supercomputing in Plain English" series. You can see the intro talk here: https://www.youtube.com/watch?v=rB4UI_WODL0 The second and following talks cover great material for people who want to learn parallel programming. I don't think we need to reinvent this particular wheel, which Henry already does so well. You can see the full spectrum of topics here: http://www.oscer.ou.edu/education.php I think the HPC Carpentry target audience is people totally new to the command line, remote computing, etc., who have not written code and most likely never will. If they want to write code, there are many great materials, including SIPE, already available.

hneeman commented 8 years ago

I agree with Dana (@dbrunson).

It'd be ideal for HPC Carpentry to find its own niche, and the organization already excels at rapidly making people who are sophisticated about their own discipline, but inexperienced with scientific computing (let alone advanced computing), highly productive. So I think playing to that strength is a good idea.

ChristinaLK commented 8 years ago

It looks like we're all in agreement that we shouldn't cover programming in OpenMP and MPI. :)

I think the point I'd want to emphasize is a variety of parallelization strategies (multicore, MPI, running multiple parallel jobs), but always in the context of a problem to be solved. So not just saying "here's the concept of threads + multi-server work", but "this is why you'd be using this approach in this case". I think it would be great to avoid "labeling" approaches and just present 2-3 computing problems where we solve them using different flavors of parallelization and explain the underlying concepts (and pros and cons!) as we go. Something like: "this works well as separate jobs, this works well as MPI, this works well in MPI but shouldn't be scaled past this point because you don't get any benefit." Etc.
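A minimal sketch of the "shouldn't be scaled past this point" check, assuming you have already measured wall-clock times for the same problem at several core counts. The function names and the 0.7 efficiency threshold are hypothetical choices for illustration, not a standard rule or an existing tool.

```python
def parallel_efficiency(t_serial: float, t_parallel: float, cores: int) -> float:
    """Efficiency E(p) = speedup / p, where speedup = T(1) / T(p)."""
    speedup = t_serial / t_parallel
    return speedup / cores


def largest_efficient_core_count(timings: dict[int, float], threshold: float = 0.7) -> int:
    """Return the largest core count whose efficiency is still >= threshold.

    `timings` maps core count -> measured wall-clock seconds and must
    include a 1-core run as the baseline. The 0.7 default is an arbitrary,
    illustrative cutoff (hypothetical, not a recommended value).
    """
    if 1 not in timings:
        raise ValueError("timings must include a 1-core baseline run")
    t_serial = timings[1]
    best = 1
    for cores in sorted(timings):
        if parallel_efficiency(t_serial, timings[cores], cores) >= threshold:
            best = cores
        else:
            # Stop at the first core count that falls below the threshold:
            # requesting more cores than this mostly wastes allocation.
            break
    return best
```

Stopping at the first core count below the threshold reflects the usual pattern that efficiency falls as cores are added; with noisy measurements you might prefer to check every entry instead.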

In terms of course outline, this would fit into the third document that @justbennet posted here: https://github.com/swcarpentry/hpc-novice/pull/6 We could expand that third topic beyond MPI alone, with different examples of problems and software packages, how to solve or use them most effectively, and the underlying concepts and tools that let us make those smart decisions.
(Of course, it might be that the solution isn't even parallelization, but optimizing code (as @gvwilson mentioned above); in an HPC-related tutorial, that might be a little out of scope...)

If someone merges @justbennet's PR, I can add some of this as a PR to the third piece of the outline for further discussion. ;)

shwina commented 8 years ago

As part of the submission of this proposal, we need to provide names and CVs for those who will be presenting the workshop. This list of presenters should be considered final and may be changed only under extreme circumstances.

So, if you're sure about going to Supercomputing this year, and would be interested in teaching HPC Carpentry, please send me a message along with your CV. You can be put down as either "primary" or "secondary" presenter - please see the call for proposals for details about this.

The deadline for submitting proposals is April 24th, so please let me know as soon as possible. Thanks, and apologies for the short notice.

shwina commented 8 years ago

+1 to @ChristinaLK's approach.

dbrunson commented 8 years ago

@shwina Are you up for this too? https://sites.google.com/a/lbl.gov/hpc-training-best-practices/home/news/sc16hpctrainingworkshopcallforsubmissions-duemay1

shwina commented 8 years ago

@dbrunson What do you have in mind? To me, it seems like SWC would have a much stronger presentation at "Best Practices for HPC Training" after HPC Carpentry has been taught a few times.

hneeman commented 8 years ago

So did it get submitted?

shwina commented 8 years ago

Yes, the final submission can be viewed here. Everyone, thanks for your contributions and inputs :-)

dbrunson commented 8 years ago

@shwina I agree it'd be good to do HPC Carpentry first.

gvwilson commented 8 years ago

can this one now be closed?

dbrunson commented 8 years ago

Yes, I suppose so! But progress toward HPC Carpentry continues. I talked about it at a BoF at XSEDE16 last week, and there was a lot of enthusiasm.