Instructor notes - GitHub issues

tavareshugo commented 5 years ago

To make it easier for us, the exercises for the course have been compiled here.

Outline of things to cover:

Create an RStudio project (.Rproj) in a project folder with the sub-folders "data", "data_output" and "scripts"
- try to get students to use a "tidy" directory structure
Intro
- exercises 1.1 and 1.2
data.frames
- use read_csv() from the beginning to simplify things? (optional, but I find it useful to keep consistency across the course)
  - although it's not in the current materials, I think it's worth insisting on explicitly setting the na argument of read_csv() to state how missing values are encoded
- skip factors - I've found it more intuitive to cover them in the plotting lesson (see below)
- The main thing is to explain [rows, columns] for subsetting and $ for accessing a column. For example, I personally avoid showing the 4 different ways of accessing a column listed in the materials (it's usually quite confusing for beginners)
- exercise 1.3
dplyr
- skip spread/gather? (covered in the extra RNAseq lesson) - but if we're doing well on time we could cover it here and then repeat it in the extra lesson.
- exercises 2.1, 2.2, 2.3, 2.4
  - We often do exercise 2.4 together with the students to save time
ggplot2:
- skip themes and customisation - usually we just demonstrate a couple of things and then refer to the materials showing other things that can be done
- encourage learners to use web search when wanting to customise their ggplots. E.g. "how to change axis label orientation ggplot2".
- extra: see the note below on introducing factors
- exercises 3.1-3.4 (if time is short, do some exercises together)
  - I've changed some of the exercises relative to the materials, for example to include one where we use factors to improve the visualisation.
  - There are "extra" exercises for more advanced/faster learners.
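For the data.frames part of the outline, here is a minimal base-R sketch of what could be demonstrated: first stating explicitly how missing values are encoded, then subsetting with [rows, columns] and extracting a column with $. The sketch uses read.csv() with na.strings so that it runs without extra packages. In readr, the matching argument of read_csv() is na, whose default is c("", "NA"). The small CSV file and its values are hypothetical and are not the course data.

```r
# Hypothetical CSV (not the course data): the missing sex value is an
# empty field, and the missing hindfoot lengths are written as "" and "NA"
csv_file <- tempfile(fileext = ".csv")
writeLines(c("record_id,sex,hindfoot_length",
             "1,M,32",
             "2,F,",
             "3,,NA",
             "4,F,36"), csv_file)

# State explicitly which strings mean "missing"
surveys_small <- read.csv(csv_file, na.strings = c("", "NA"))

# [rows, columns]: first two rows, first and third columns
surveys_small[1:2, c(1, 3)]

# An empty index means "all": every row, only the sex column
surveys_small[, "sex"]

# $ extracts a single column as a vector
surveys_small$hindfoot_length

# Both together: rows with hindfoot_length above 31;
# which() drops NA comparisons instead of returning rows full of NA
surveys_small[which(surveys_small$hindfoot_length > 31), ]
```

With base read.csv()'s default na.strings = "NA", the empty sex field would be kept as "" rather than NA. This kind of silent difference is why it's worth being explicit about missing values.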

note: extra material for ggplot2 section

So that students intuitively understand factors, introduce them in the plotting section.

For example:

When doing this plot:

surveys_complete %>% 
  ggplot(aes(sex, hindfoot_length)) +
  geom_boxplot()

What if we want to change the order of the x-axis labels so that "M" comes first?

To do that, we need to learn about factors, which are R's special way of encoding categorical variables.

Let's look at factors using a simple example first. Then go through the example of the course materials here, but only the very first section of it.
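The following is a minimal base-R sketch of the kind of simple example that could come before the course materials. It is not the example used in the materials, and the vector of sex codes is hypothetical. It shows the default alphabetical level order, how to set the order explicitly, the integer codes underneath, and what happens to values that don't match any level.

```r
# Hypothetical vector of sex codes, as in the surveys data
sex <- c("F", "M", "M", "F", "M")

# By default, factor() sorts the levels alphabetically, so "F" comes first
sex_default <- factor(sex)
levels(sex_default)        # "F" "M"

# Setting levels explicitly controls the order (e.g. of x-axis labels)
sex_m_first <- factor(sex, levels = c("M", "F"))
levels(sex_m_first)        # "M" "F"

# Under the hood, a factor stores integer codes pointing to the levels
as.integer(sex_m_first)    # 2 1 1 2 1

# Values not among the levels silently become NA, so check spelling/case
factor(c("M", "f"), levels = c("M", "F"))   # M <NA>
```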

From there, jump back to the plotting problem and resolve it:

surveys_complete %>% 
  # make "M" the first level so it is plotted first on the x-axis
  mutate(sex = factor(sex, levels = c("M", "F"))) %>% 
  ggplot(aes(sex, hindfoot_length)) +
  geom_boxplot()

Exercise 3.4 applies this concept again.

tavareshugo commented 5 years ago

Day 1 and 2 - split materials:

Spreadsheets (Hugo)
Intro (Hugo/Georg)
dplyr (Florian/Thea)
ggplot2 (Florian/Thea)

Day 3 - split materials:

Intro + Variation within samples
Covariation + Properties of count data
PCA
Lesson under dev (Hugo)

tavareshugo commented 5 years ago

Extras at the end:

Demonstrate R Markdown (Rmd).

theavanrossum commented 5 years ago

Day 1

Intro - EMBL, sticky notes, etherpad, code of conduct

Spreadsheets (Hugo)

Intro (Hugo)

data.frame + select + filter (Georg)

dplyr starting from pipes (Florian) (no gather/spread)

Day 2

review sticky notes from yesterday

dplyr (Florian) (no gather/spread)

ggplot2 (Thea) + factors + Rmd

(gather/spread if there's time) (Hugo)

SQL (Hugo) - 1 hr

Day 3 - split materials:

review sticky notes from yesterday

Intro + Variation within samples (+ gather/spread) (Thea)

Covariation + Properties of count data (Florian)

PCA + limitations of biplot (Georg)

Clustering (Hugo)

debrief & survey (bus leaves for hdb at 5:30)

tavareshugo/2019-01-29-EMBL, Instructor notes #2