Audience Selection

reid-a commented 3 years ago

A good strategy for resolving some of our ongoing discussions about what level of prior expertise to require or assume is to make an explicit decision about which audience we want to address, and build the lesson for them. This will trade some overall breadth and generality for focus. The major advantage will likely be getting us off some of our discussion treadmills, especially with respect to the parallel example codes.

Possible audiences we might address are:

Complete novices with no experience writing or running codes.
Learners who can run codes, but not write them, who are motivated to explore HPC resources for performance or scaling reasons.
Coders who can write and deploy serial codes on local resources and who want to scale up to HPC.

Possible criteria we might consider to help guide the decision:

Which of these audiences is larger?
Which of these audiences is under-served by existing resources?
What resources already exist? We may be able to add value by better integrating them, or by supplementing them.

aturner-epcc commented 3 years ago

@reid-a I have not been as involved with HPC Carpentry recently as I was in the past, but your learner use case:

Learners who can run codes, but not write them, who are motivated to explore HPC resources for performance or scaling reasons.

may be served by a lesson we developed under the ARCHER2 UK supercomputing service:

Understanding Package Performance

At the moment, some parts are specific to ARCHER2, but the lesson could easily be generalised (and I would be happy to do this) for inclusion in the HPC Carpentry set of lessons. What do people think? Would this be a useful/interesting lesson within HPC Carpentry?

This lesson was developed with the following researcher profile in mind:

I have used other remote HPC systems before
I want to use pre-installed simulation/modelling software packages rather than develop my own (I may also compile my own copy of software developed by someone else)
I want to be able to improve the efficiency of the use of HPC in my research workflow
I want to be able to optimise my use of resources by understanding how to use software packages efficiently

wirawan0 commented 3 years ago

THIS COMMENT IS WIP

I want to help address the question "Which of these audiences is under-served by existing [training] resources?" To do so, I propose that we survey the currently available training materials, both free and paid, to see where the gaps are. For this posting, let me define 3 terms, corresponding to the 3 types of learners mentioned by Andrew in the first post: Novice (no prior experience whatsoever), Users (no coding or light coding), and Coders (heavy coding). Under this taxonomy, people who do data science end up categorized as "Coders", although their level of competence is not necessarily that of power Coders who know exactly what the computer is doing.

I am aware of the following HPC-centric training materials; I'll do my best to describe which communities they serve:

Scaling to Petascale (co-sponsored by the Blue Waters project): This training seems geared toward Coders who already know HPC. It focuses on parallel computing techniques such as MPI, OpenMP, and OpenACC, as well as GPU computing, performance tuning, etc., aimed at large-scale (petascale) computing.
XSEDE monthly HPC workshops (by Pittsburgh Supercomputing Center): Introduces MPI, OpenMP, OpenACC, Spark, and TensorFlow to those who are new to these topics. The training is fast-paced and assumes programming skills (in C, Fortran, or Python, depending on the topic). Again, for Coders.
TACC Learning Portal has a lot of material geared toward those using TACC. The audience is clearly expected to already know programming: Coders. Also see the TACC Institute Series, whose for-fee courses are intended to be "immersive training in advanced computation".
SDSC Summer Institute "focuses on a broad spectrum of introductory-to-intermediate topics in High-Performance Computing and Data Science. The program is aimed at researchers in academia and industry, especially in domains not traditionally engaged in supercomputing, who have problems that cannot typically be solved using local computing resources." [source] I think it is also fair to say that this covers Coders for the most part.
ARCHER2 has a broad range of training materials geared toward "researchers" (non-coders or light coders), data scientists, and "RSEs" (heavy coders). In other words, it covers the whole spectrum of learners.
HPC University, a project of the Shodor Foundation, holds a large catalog of pointers to training materials. Some are domain-specific, and the target audience is not easy to judge (some are basic, many are advanced). This may cover all kinds of learners, but I am not sure.
Cornell Virtual Workshop covers a wide variety of topics: Fortran, C, C++, MPI, OpenMP, Python, R, MATLAB, even Singularity and SLURM. There is also basic material such as Intro to Linux. For the most part, this serves Coders.
Supercomputing MOOC offered by PRACE: Unlike many of the courses above, this is a very high-level, conceptual course about supercomputing. Good for outreach, but not for teaching someone to use a real supercomputer. Target: Novice.

Many universities and academic/research HPC centers have their own versions of "intro to HPC". There are too many to list here, so these are only a few representative examples:

Utah CHPC: (Getting started guide) (Tutorial videos)
OK State HPCC tutorials

It is fair to say that Coders are well represented in the training mentioned above. Users who do not need to code heavily have some representation; perhaps many more of them are trained through domain-specific materials (e.g., they can find what they need on their packages' websites), but the depth of the HPC introduction on those sites varies greatly. Novices are truly underserved, IMO.

hpc-carpentry / coordination

Audience Selection #54