jduckles commented 8 years ago

Moving Discussion from this [Discuss] thread to GitHub

Here is the call from Paul Wilson: sc16-tutorials-call.pdf

It seems there is some momentum to put a tutorial together and do some lesson development sprinting over the summer. From the [Discuss] thread it looks like Ashwin Trikuta, Dana Brunson, and Kate Hertweck seem to have thought about HPC and done some work in the context SWC pedagogy and workshop methods. There are lots of others with material for particular systems, so the first thing to decide is probably what should be included and what should be left out within the length of time allotted at SC tutorials.

If we're going to pull off an hpc-carpnetry workshop for SC16, I suggest we use this thread to form the team, then start opening issues in this repository and get hacking.

justbennet commented 8 years ago

It might be good to decide who the target audience is. Is this intended as a workshop for novice HPC users, as might be taught at a site with an HPC cluster, or is it a workshop for support staff at HPC sites who might want to offer it?

A workshop that assumes knowledge comparable to the SWC bash workshop and covers only a gross outline of what an HPC cluster is (hardware, scheduling software, batch manager, modules) and how to submit serial jobs typically takes us about 4 hours. We cover creating a runnable program script (e.g., a bash script that runs a program), converting that to a batch script that can run from the command line. Then we explain the requirements for batch submission script and how to convert the batch script into the submission script. We do that first for the echo command (as does Dana at OK State). We then give people a python script, show them how to run it, then have them convert that into a job submission script. We repeat that for an R script. We cover useful environment variables created by the batch system. We also show them commands to check queue status, how to cancel jobs, and how to copy results and generated data from the cluster (including unix2dos and dos2unix).

At the end of the workshop, participants have three tested and working templates of job scripts that they can use for future projects on our system.

This is for Linux, Moab scheduler, Torque PBS batch manager.

davidhenty commented 8 years ago

I've not been involved in this discussion before, but my colleague Mike Jackson told me about it (he ran SWC workshops in the UK for several years) and I am very interested in being involved. I am in charge of training at Edinburgh Parallel Computing Centre - we have run an MSc in HPC for over 15 years, and recently launched an online Practical Introduction to HPC course (as part of the University of Edinburgh's online offerings) aimed at novice users - so I have given quite a bit of thought to this area in recent years.

As someone has already commented, it is crucial to identify the target audience. For example, whether we expect attendees to have programming experience.

ocaisa commented 8 years ago

I wonder if introducing site-specific schedulers is really the best use of the training opportunity. There's a Python package called GC3Pie that can handle multiple schedulers. If we went the route of introducing MPI through mpi4py, we could use that to interface to the system. The Parallel Tool Platform plugin for Eclipse can also handle all schedulers and already has been set up for a large number of HPC systems.

I think there will be many people that really need to be introduced to core concepts in HPC architectures (distributed/shared memory, the memory hierarchy, HW threads, accelerators,...) and a lot of that can be done through an introduction to a combination of MPI, OpenMP and performance analysis.

shwina commented 8 years ago

Is this intended as a workshop for novice HPC users, as might be taught at a site with an HPC cluster, or is it a workshop for support staff at HPC sites who might want to offer it?

The role of the *Carpentries has been to cater to the former group than the latter. But given the expertise of trainers at SWC in delivering workshop material, maybe it would be more appropriate to teach a "HPC training best practices" workshop.

Another focus area might be "best practices" in HPC, i.e., data management, debugging, testing, version control, etc., in the context of HPC.

Any thoughts on which of these would be appropriate for SC16?

justbennet commented 8 years ago

I'll copy in a portion of a reply from Alan O,Cais (see the archive for full text) because it is pertinent to the above comment:

Alan O'Cais a.ocais@fz-juelich.de Fri, Apr 8, 2016 at 7:28 AM

It should also be noted that people involved in HPC training are already in discussion about creating collaborative training content since SC14/SC15 (and they have monthly TelCos on this). Their discussion site is at https://sites.google.com/a/lbl.gov/hpc-training-best-practices/home

Perhaps we can contribute to the workshop that they already have planned (and approved) for SC16? I would definitely like to ensure the Software Carpentry methodologies are considered there.

shwina commented 8 years ago

Thanks, @justbennet

justbennet commented 8 years ago

@davidhenty As you may be able to tell from my initial post, we consider novice to be someone who may not have much programming, if any, and is getting started in cluster computing because of data file size, memory requirements, or the sheer number of things that need to be run. The topics that are of most people in those categories aren't going to include things like MPI programming, but they might include how to run an MPI-enabled program that someone else has written and installed.

I suggested what I thought were the baseline skills that a novice from, say biology or psychology or music or any number of other fields, would need in order to be able to run a job for which they already have a program on a cluster. What do they need to know about batch computing? Scheduling? Data transfer?

For truly novice users, that could easily take a 1/2 day.

For people who want to do an introduction to MPI, perhaps you have a URL for your web site? TACC had what I thought was a nice, if a bit rushed, presentation on both python multiprocessing and mpi4py. The video is about 4 hours, and is from Mar 4, 2016.

https://www.youtube.com/watch?v=bCTzcwv9EDw

I have asked about slide availability.

I think there is room for both, but maybe not in the same room at the same time. Perhaps as another lesson in a multiday workshop?

https://www.youtube.com/watch?v=bCTzcwv9EDw

davidhenty commented 8 years ago

@justbennet We recently designed a 2-day "Hands-on Intro to HPC" course that was targeted at such novice users. We cover general concepts (this material should generally stay the same regardless of target platform) and illustrate them by supplying parallel programs that attendees can compile and run straight out of the box, using them to look at performance issues etc. I always like programs that produce graphical output, e.g. image-processing examples. You need different Makefiles and submission scripts for different platforms, but that's pretty minor as the code stays the same.

Our goal was that non-programmers would learn how to use a system through practical experiment, but that programmers could also look at the examples and learn a bit more if they wanted.

I also have some thought-experiments to illustrate distributed-memory (MPI) and shared-memory (OpenMP) parallelisation without needing to do any coding.

I'm actually much more interested in these aspects than doing more MPI courses, although I have some ad-hoc recordings of our MPI material at https://www.youtube.com/watch?v=_h55hwpLwoE&list=PL1b57Q937PoslcIozHf7UMIkfMHQQLVQe

It was actually this "Hands-on" course that was the basis for the new accredited University online course at https://www.epcc.ed.ac.uk/online-courses but that material isn't visible as it is fee-paying. The "Hands-on" course is available at, eg, https://www.archer.ac.uk/training/course-material/2015/07/intro_epcc/

justbennet commented 8 years ago

@davidhenty Thanks a ton for the extra thoughts and links!

shwina commented 8 years ago

@davidhenty - these are great resources. Thanks for sharing!

jduckles commented 8 years ago

@shwina has offered to host a meeting. Can everyone in this thread please fill out this "when is good" for the next two weeks to get folks together in the context of the SC16 Tutorial call:

http://whenisgood.net/ntgih55

Once a meeting time looks good, I can provide a BlueJeans video conference room for the meeting.

@shwina has suggested the following agenda:

What should we teach? Interacting with HPC systems? How to teach HPC? mpi4py? At this point there are several opinions.
What existing materials exist? Do we want to develop from scratch?
Putting together a proposal (github.com/swcarpentry/sc2016-proposal?)
Who will lead the effort?

iamc commented 8 years ago

I totally agree with @davidhenty approach. ++1

justbennet commented 8 years ago

I had this reply from TACC about slides for their Parallel Processing with Python workshop. In case anyone is interested in these materials on python multiprocessing and mpi4py.

Bennet,

As long as TACC gets credit for the material and it is being used by non-profit organizations then its ok to share.

The parallel processing with python slides can be found at the link below. If you use the Download All button the site will not require a TACC Portal signin to access the files.

https://portal.tacc.utexas.edu/training#/session/17

Jason Allison - User Services Texas Advanced Computing Center The University of Texas at Austin

mboisson commented 8 years ago

Hi, Before people think of teaching MPI, I believe everyone should read this article: http://www.dursi.ca/hpc-is-dying-and-mpi-is-killing-it/

I agree with most of what is reported in this article. MPI is still relevant to existing codes, but it is very rarely the best approach for new developments. Assuming that the Carpentries are targeting novice audiences, it would be best to focus on more interesting technologies.

IF one wants to teach MPI and wants to do so in a compiled language, I would highly recommend to look at Boost::MPI (http://www.boost.org/doc/libs/1_58_0/doc/html/mpi.html). This is what the C++ MPI API should have been before it was deprecated. It makes using MPI much easier. Dealing with complex structures becomes trivial with Boost MPI.

omsai commented 8 years ago

What do they need to know about batch computing? Scheduling? Data transfer? [...] We cover general concepts [...] and illustrate them by supplying parallel programs that attendees can compile and run straight out of the box, using them to look at performance issues etc.

What would be our ideal student outcome? For example:

Lesson Objectives

Become familiar with HPC concepts including parallelism, the scheduler and basic shell commands:

Comprehend the strengths and limitations of using a cluster.
- By solving challenges during the workshop.
Apply the scheduler to submit jobs.
- Taught with hands-on exercises.
Analyze and report problems with useful details for community help and HPC administrators.

Lesson Goals

Given a written description of an HPC software scenario, the student will be able to:
- identify whether the software "can run on the cluster"
- recall their resource limits
- suggest how well software should perform
- identify login and compute nodes, and look up node capabilities
Given a computation job, the student will be able to:
- use modules and environmental variables to setup their software environment
- transfer files
- create batch files and use the scheduler
- inspect job results
Given an underperforming program, the student will be able to:
- translate problems to computer peripheral: CPU, RAM, disk, or software
- propose where the bottleneck is
- use the commandline to locate the bottleneck
Given an error message, the student will be able to:
- run basic diagnostics to scope the problem and speculate several explanations
- express fluency with formal vocabulary
- find resources on the web using formal vocabulary and domain specific keywords
- communicate clearly about issues, in an organized format, with relevant details, including citing troubleshooting done, to file reports with HPC administrators, forums, etc

SWC modules

We might consider 3 areas or modules:

UNIX command-line (per Jonah's point, how much do we reuse of our shell-novice module?)
- Include housekeeping of data transfer using e.g. FileZilla?
- Maybe also include GNU screen or tmux since these lessons of working on clusters requires constant internet access; I've need to include this due to spotty wireless internet at previous trainings.
The scheduler and environment modules
- Login and compute nodes, and using the scheduler
- module [avail|load|list|show]
Parallel performance; troubleshooting bottlenecks and errors; best practices; some sort of capstone project?
- tail -f /path/to/file to watch job output
- ssh into the compute node, htop to see CPU and memory usage
- Communicating issues like a pro: job #(s), user login, all steps to reproduce problem

Our goal was that non-programmers would learn how to use a system through practical experiment, but that programmers could also look at the examples and learn a bit more if they wanted.

A lot of new users also ask for example programs, so having examples on hand would be really nice to include with the repository / instructional material. Since we have R and Python lessons, we could include those.

I am also interested in helping create a Julia lesson, since parallel computing is a first citizen in the language. Having a Julia lesson has been discussed on the mailing list recently, and I think the use case of HPC, makes the language a useful medium for teaching parallel programming concepts since the user is abstracted away from MPI. Are any of you coming to JuliaCon in June? :)

we consider novice to be someone who may not have much programming, if any, and is getting started in cluster computing because of data file size, memory requirements, or the sheer number of things that need to be run. The topics that are of most people in those categories aren't going to include things like MPI programming, but they might include how to run an MPI-enabled program that someone else has written and installed. [...] I also have some thought-experiments to illustrate distributed-memory (MPI) and shared-memory (OpenMP) parallelisation without needing to do any coding. [...] Before people think of teaching MPI, I believe everyone should read this article: http://www.dursi.ca/hpc-is-dying-and-mpi-is-killing-it/ MPI is still relevant to existing codes, but it is very rarely the best approach for new developments.

It seems that there's some agreement not to much spend workshop time on MPI coding. As @justbennet mentions, typical cluster users interact with readily available MPI software and instead need to troubleshoot job failures and performance issues.

omsai commented 8 years ago

Another module idea (which might be more of an "advanced" candidate?)

It would be cool if we can create a lesson on using Linux Containers to empower users to be in more control of the software the run, run across multiple platforms / clusters, and have better reproducibility. Sometimes using containers is essential, e.g. if the software requires a different version of glibc than what's on the compute nodes, or if the software is complex. Google uses Linux Containers extensively so that they don't have to care about what hardware their jobs run on.

psteinb commented 8 years ago

Hi, I think this is an interesting discussion. I also feel it might be based (mostly towards the end) less on numbers/data and rather on local usage patterns. The comment about the Dursi blog was readily made and indicates how deep the HPC community is a "transition period". Just look at last year's ISC conference that was split into 2 parts (one for HPC and one for big data). So, in order to come up with a curriculum, all these local specializations need to be abstracted away and key concepts should be identified that are helpful for the majority of novice HPC users. For example, I think @omsai has made good point to include how to document performance problems/bugs for HPC admins and the community behind a specific package. On the other hand, if one e.g. uses too high an abstraction for scheduler, the tendency is to teach this abstracted library which in the end requires you to understand the underlying batch system anyway.

So I would propose to come up with some data first (feel free to reference any existing collections)!

what batch systems are you using?
what portion of total CPU time per year of your local cluster is spent on:
MPI jobs
SMP jobs
embarrassingly parallel batch jobs (1 input file per core)
your idea goes here! In order to guide this, I think @omsai's idea needs to be taken further and we should start writing a design document to report what this workshop is targeted at.

mboisson commented 8 years ago

I don't think "proportion of total CPU time per year" is the best metric to use if you want to find what is most useful to users. "number of different users" would probably be more appropriate. It is very likely that the bulk of the CPU time is used by a few very large users, who definitely don't need our help.

psteinb commented 8 years ago

Very well spotted, however, in my experience, a non-negligable portion of users that consume the most CPUhours, would require a decent training as well. ;)

Anyhow, the point I was trying to make is to survey what is needed first, before darting off and composing material (or merging in existing material). So I guess, you buy into that.

justbennet commented 8 years ago

@psteinb We have for the last several years offered something roughly equivalent to the Bash lesson, one on the fundamental mechanics of using our cluster, and one that is an introduction to the basic concepts of parallel computing (architecture, types of parallelism, examples). In both Fall and Winter semesters, we've seen enrollments of 30-90 for the Bash lesson, 20-50 for the cluster use, and 10-20 for the basic concepts of parallel computing with examples. Just in terms of self-selection, the demand seems to be higher from the least experienced people for the most basic material. We've had poorly attended MPI workshops for more than a decade, and those tend to generate comments at the ends of the spectrum (too hard or too easy, no middle).

People from those workshops ask us most for a follow-on workshop (or workshops) on managing data and processing for multistep pipelines. For example, taking data from an MRI scanner and running a preprocessing stream on it to generate images for QA and data suitable for subject-level and group analysis. That's a five step process involving three or four different pieces of software, none of which is parallel (except insofar as BLAS is implicitly parallel in two of them). I gather that the people in epigenetics have similar, multistage pipelines.

A workshop on pipelines and/or workflow management seems to me to blend well into the existing curriculum, it's going to be pretty immediately useful to many people who are using clusters largely as a serial batch queue (is that HTP?), and there isn't a significant gap between the topics and the level of at which the topics are taught and the previous SWC workshops. I think that keeps the progression from one set of basic skills to another cognitively smooth, which is important, as it allows people to build some competence and earn themselves some confidence more readily. It's also going to be solid preparation for any subsequent training that might utilize a cluster.

Perhaps it would be best to calve my idea off to another venue, if I'm off track here. Given the short timeline for the SC proposal, I think it most important that some consensus about what's going to be offered be arrived at soon, and it seems like there are more people interested in the traditional HPC material than the serial/sequential job analysis track.

apawlik commented 8 years ago

Martin Callaghan (University of Leeds UK) suggested this curriculum https://github.com/hpccarpentry/organisation/blob/master/hpcc-lessons.md

There has been a lot of work done by the community here https://github.com/datacarpentry/hpc-carpentry

At the moment it would be good to pull everything together to avoid redundancy.

Maybe it's an idea to list all existing freely available training material, try to cluster it - see what topics come up, see how that could be put together into a curriculum and identify gaps for material that needs to be created.

shwina commented 8 years ago

All, some really good points have been raised here. As @justbennet mentions in an above comment, it's important we arrive at a consensus that gives us direction for a draft proposal for SC'16. For this, I propose that we meet online - I'll summarize the points so far to facilitate discussion. Many of you have indicated when is good:

http://whenisgood.net/ntgih55

Results are here:

http://whenisgood.net/ntgih55/results/ihsxse7

May I propose Wednesday at ~~8:00 AM EST~~ 9:00 AM EST (sorry for the confusion) as a meeting time? @justbennet - I know this is not a good time for you. If you would like to propose another time, please feel free to do so.

justbennet commented 8 years ago

I can miss the net meeting. I'll be at SWC instructor training. :-) I've tried to give a good sense of where my interest is, and if that's where others go, cool, but if not, equally cool. As I will have limited time between now and August to work on this, I think others should set the agenda, and I'll contribute appropriately to it. I know something good will come of this. Thanks for wrangling the logistics!

ocaisa commented 8 years ago

@apawlik There's a project in the US that was trying to cluster (and review/rank) HPC training materials http://hpcuniversity.org/trainingMaterials/

mboisson commented 8 years ago

Is this interesting for the HPC part ? http://calculquebec.github.io/cq-formation-advanced-python/ul-20160216/index.html

shwina commented 8 years ago

@mboisson - good stuff! Thanks for sharing. A combination of shell/Python will probably be the platform for the first iteration of HPCCarpentry, so this is a valuable resource. I'm also very interested in it personally for my own teaching.

justbennet commented 8 years ago

@mboisson I agree with Ashwin. Thanks for posting the link.

shwina commented 8 years ago

All interested parties, please note that we are meeting online April 13th @ 9:00 AM EST to discuss putting together a proposal for SC '16. Here's the link to the BlueJeans meeting room (thanks @jduckles! ):

https://bluejeans.com/329525866

And etherpad:

http://pad.software-carpentry.org/hpccarpentry-2017-04-13

I'll put up an agenda and a summary of discussion so far on the etherpad shortly. Thank you!

gvwilson commented 8 years ago

Isn't that after the deadline for submitting tutorial proposals?

jduckles commented 8 years ago

Tutorials are due 4/17 with an automatic one week extension.

raynamharris commented 8 years ago

Can I put this on the SWC community calendar so that more "interested parties" can join?

shwina commented 8 years ago

@raynamharris Yes please, I didn't want to spam the discuss list.

shwina commented 8 years ago

Summary of online meeting on Apr 13

Here's a summary of the meeting yesterday, where we discussed what could be covered in a 6-hour workshop, talked about some logistical details, and decided on a path to proposal submission. Thanks everyone who took part, and to everyone who provided inputs so far - it really helped the discussion have direction. Feedback on yesterday's summary is appreciated!

Note: deadline (Apr 17 with a one week extension) is fast-approaching. We need volunteers who can provide access to a computing resource. And we need another volunteer who can help sort out registration and submission details (see below).

Attendees

Alan O'Cais : HPC training interest in Europe Dana Brunson: Director of OK State HPC center, best training practices with Scott and others. XSEDE campus champion organizaiton Pariksheet Nanda: U Conn. Sys admin Rayna Harris: U of Texas at Austin, grad student, training with HPC Ashwin Srinath: Clemson University, PhD Student, Cyberinfrastructure Technology Integration group Aleksandra Pawlik: University of Manchester, SSI - soon moving to New Zealand NeSI (HPC infrastructure)

Agenda

Workshop contents
Available relevant materials
Logistical details
Writing proposal

Workshop contents

AM

Good for introducing basics of interacting with the cluster/resource

Logging in
Asking for an interactive session/job*
Submitting batch jobs
Querying job status and effectively reporting errors with jobs

*Alan makes the point that maybe we shouldn't advocate interactive sessions

site specific
bad practice (learners may choose interactive over batch jobs)

PM

Don't teach parallel programming Use parallel programs in examples Stress on workflows and best practices. Two workflows:

"Embarrassingly" parallel workflow, several files (inputs), one serial program
"Real" parallel workflow, program that is [OpenMP, MPI] enabled

"Develop" these workflows starting from serial programs

A note on site-specific schedulers, resources, policies, etc.: obviously, these differences are important, and in some cases can be a harsh barrier (in much the same way that the difference between git/svn can be). The real question is, are the differences harsh enough that they make it impossible for learners to translate the knowledge gained from the workshop to their own sites? One solution is to point out frequently in the workshop where differences may arise, and maybe provide examples from different schedulers/sites. Further, allow access to the computing resource used during the workshop for a while longer so that learners can come back to experiment.

Available relevant materials

Dana Brunson’s materials:

https://github.com/dbrunson/hpc-novice
See also: https://github.com/dbrunson/hpc-novice/issues/1 (great metaphor; introduces super-important “HPC etiquette”)

Ashwin Srinath’s materials:

http://clemsoncoe.github.io/hpc-workshop/index.html

Ecology data set (thanks to Aleksandra for pointing out):

https://github.com/datacarpentry/ecology-workshop/blob/master/data.md
Easily comprehended by learners across variety of disciplines
Good target for “embarrassingly” parallel workflow

Proposal writing and lesson development

A GitHub repository is available here: https://github.com/swcarpentry/sc16-tutorial-proposal Ashwin Srinath has volunteered to draft a proposal outline (proposal.md) as a starting point for contributing, which will be available shortly.

We also need some other details (any volunteers for this?)

How many registrants/how to register as a group?
What details are required from registrants?
Who will be teaching at SC ’16, and how to decide?

Reach out to other groups:

International HPC training consortium are interested in developing HPC lesson - Dana Brunson
ACI-REFs?

Logistical details

Need cluster access for the workshop, and for a duration after the workshop (suggestions/volunteers?)

Dana Brunson may be able to offer this.
Ashwin Srinath: proposes cloudycluster as an alternative: http://www.cloudycluster.com/#intro (we've got some credit)
Alan O'Cais: Created training cluster with 8 nodes, and web terminal. (http://supercomputing.cyi.ac.cy/)

mboisson commented 8 years ago

For the part about submission of jobs, we might want to have different lessons for different schedulers, the same way SWC has Intro to programming with Python, R or Matlab, or Source code management with Git or Mercurial.

shwina commented 8 years ago

Please note that the proposal repo (https://github.com/swcarpentry/sc16-tutorial-proposal) is open for business :-)

shwina commented 8 years ago

Proposal writing has begun: https://github.com/swcarpentry/sc16-tutorial-proposal/pulls. Please get involved by raising issues/submitting PRs/commenting :-)

omsai commented 8 years ago

It might be important to differentiate our proposed tutorial from a similar "Parallel Computing 101" tutorial that's been offered every year for the past few years (here is the 2015 video abstract and 2013 course materials). It looks like the 101 tutorial is more of a "big picture" conceptual introduction with a large breath of material; so I particularly like how our abstract currently relates to SC's mission of "basic lab skills for research computing".
We should revise our abstract to be more user facing than for the proposal reviewers, since it seems the organizing committee uses the proposal abstract in the conference program page (looking at other tutorial programs from previous years). In some cases, other program abstracts mention their target audience, and so we could say our tutorial would be primarily of interest for teachers developing or improving their instructional material for novice users by sitting on the opposite side of instructional experience.

I'll look more closely at wording and make suggestions tomorrow. (edit: typo)

omsai commented 8 years ago

@jduckles Do we have data of past offerings of swcarpentry/hpc-novice?

For our proposal description we need to explain:

If your tutorial has been presented previously, a list of when and where it has been presented and how it will be updated for SC16

shwina commented 8 years ago

@omsai: very good points. Maybe make it explicit that this will not be an introduction to MPI or OpenMP. Also, thanks for pointing out what the abstract should look like. Maybe the "vision for HPC Carpentry" could be moved to the "detailed description" section.

jduckles commented 8 years ago

@omsai We have never offered it, the lesson hasn't ever existed...until now 😀

The repo was created awhile back as a placeholder.

lmichael107 commented 8 years ago

Jonah is correct. The hpc-novice repo was created as a placeholder after a train-the-trainers for ACI-REFs (an NSF-funded network of "facilitators" who support users of campus research computing centers).

'Sorry I'm late to the game, btw! Work like this is my job and my passion for making researchers more effective in computing.

I am a strong supporter of not including an introduction to MPI or OpenMP, especially because these are not the only research-applicable forms of parallelization. I know the proposal is effectively complete, but I would even say that we really only need to go as far as running multiple single-core jobs, maybe each with a different input file to process in "high-throughput" (which is not truly "serial" as some would call it). Beyond that, any type(s) of parallelization we use as an example, in effect, might mis-lead learners to believe that the presented method is the only/right method. In reality, the right form of parallelization depends on the nature of the computational work. For example, OpenMP restricts the extent of parallelization to the number of cores on a single node, which is less helpful when needing to process thousands of images that might each take hours. Even established software like blast (genetic sequence mapping) will be severely limited in its multi-threading option if the size of the reference and/or input file are sufficiently large (and they don't even have to be that large), where it's better to just break the input file into pieces and run separate, single-threaded blast executions.

gvwilson commented 8 years ago

I also think that this tutorial should not include OpenMP or MPI; what I'd very much like to see instead is a solid chunk (30 minutes plus exercises) on how to figure out what the performance bottlenecks of your code really are. @callaghanmt started putting notes together in https://github.com/callaghanmt/sc-leeds-profiling last fall - the idea was to introduce people to profilers (which most scientists have never heard of), and show them that changes to sequential code can yield substantial performance improvements before they tackle multi-anything. It may be too large an example for this tutorial, but the final episode of the invasion percolation lesson from Version 4 of Software Carpentry (see http://v4.software-carpentry.org/invperc/tuning.html) shows how to switch from a naive grid-and-sweep approach to incremental update, thereby reducing runtime by several orders of magnitude.

shwina commented 8 years ago

@lmichael107 @gvwilson: we definitely don't want to introduce OpenMP, MPI, CUDA, Hadoop, or any particular parallel/distributed computing platform. But I do think we should demonstrate the usage of a shared-memory and distributed-memory parallel enabled program. Without talking about specific details of the platform, we can still talk about cores, nodes, threads, how a problem is distributed among them, and how to choose resources effectively. Not doing this has its own dangers: for example, it leaves researchers in e.g., CFD, simulation, without a mental model for their problems.

Yes, I agree strongly that profiling serial code and using the right algorithm is far more important, and maybe that should be the first thing we teach. Fix your serial code before running it in parallel. But working this into the lesson can be a problem.

ocaisa commented 8 years ago

I would strongly agree with @shwina, we shouldn't give a preference for high throughput computing just because the concepts behind high performance computing are perhaps more difficult to grasp. The mental model that @shwina mentions is something I think we can get across in an afternoon and that has real value to both communities.

dbrunson commented 8 years ago

The vast majority of the HPC users here at Oklahoma State don't write code at all. They typically use code that we install for them. There's a mix of serial, shared memory and distributed memory parallel applications. The users need to know enough about them to know how many nodes/processors to request in the scheduler. They aren't interested in profiling these codes or learning the code. Or even if they are interested, they are in a hurry to get their research done and don't have time. Most of them have never used the command line before and it's a big leap just to run their job on a cluster.

I think these people are the target audience for the initial swipe at HPC Carpentry materials.

psteinb commented 8 years ago

While I think Greg is right across the board, I agree that this material should not cover them. I opened 2 issues under the python-intermediate-mosquitoes material: https://github.com/swcarpentry/python-intermediate-mosquitoes/issues/12 https://github.com/swcarpentry/python-intermediate-mosquitoes/issues/13

justbennet commented 8 years ago

There has been a lot of discussion around the topic, but @omsai's outline has not got much comment on a point--by-point basis. It's a pretty good starting point, I think, for a more focused and directed discussion. He divides things into three 'modules',

Basic Linux (and issues with remote computing)
Batch computing and scheduling
Performance, profiling, and parallel computing

I suggest that each of those is a lesson of about three to four hours, and that each should get an outline. The drafts of the actual topics (i.e., 3-10 minute lesson segments, presentation and exercise) and materials, with time estimates for each topic, and for each topic an explanation of a) the prerequisites for proper grappling with the material and b) the objective of presenting the material should be put into the repository.

This discussion should divide among those lessons and topics. That way discussion about how each of those should be taught can start.

The first lesson is essentially the Bash lesson with modifications for remote computing, ssh, scp, and an eye toward maintaining job control scripts.

The second lesson builds on the first and presumes that learners can create files and directories, use nano, can ssh to a remote linux machine, and can copy files to/from the remote linux machine. It will cover basic layout of a cluster (bunch of PCs connected by network), what a batch manager is and does, what a scheduler is and does, what the parameters of a batch job are, how to run a job, check its status, and delete it.

The third lesson builds on the second and....

I think there are several people who might like to work on the first two, and there are quite a few who are more interested in the third. Forking the discussion might make us all more focused and productive?

gvwilson commented 8 years ago

This is where I put up my hand and say, "An overview concept map for each of those three sections would probably help people converge/understand each other".

:-)

dbrunson commented 8 years ago

Hi folks, the call for presenters for the 3rd annual HPC workshop on Best Practices for HPC Training at SC16 is out: https://sites.google.com/a/lbl.gov/hpc-training-best-practices/home/news/sc16hpctrainingworkshopcallforsubmissions-duemay1 and due May 1.

ChristinaLK commented 8 years ago

+1 :)

swcarpentry / hpc-novice

SC16 Workshop Call #4

Lesson Objectives

Lesson Goals

SWC modules

Summary of online meeting on Apr 13

Attendees

Agenda

Workshop contents

AM

PM

Available relevant materials

Proposal writing and lesson development

Logistical details