matloff / TidyverseSkeptic

An opinionated view of the Tidyverse "dialect" of the R language.

error in tapply() example #39

Open john-d-fox opened 2 years ago

john-d-fox commented 2 years ago

Dear Norm,

I've enjoyed the various versions of your tidyverse critique and largely agree with it. I noticed the following error in the current version, which I don't believe has been flagged before:

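Your tapply() example doesn't handle NAs consistently with the aggregate() example: na.omit(airquality) drops every row that has an NA in any column, not only the rows where Ozone is missing, so the means for months 5 and 8 differ.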

> aggregate(airquality[, "Ozone"], 
+             list(Month = airquality[, "Month"]), 
+             mean, na.rm = TRUE)
  Month        x
1     5 23.61538
2     6 29.44444
3     7 59.11538
4     8 59.96154
5     9 31.44828

> aq <- na.omit(airquality)
> tapply(aq$Ozone,aq$Month,mean)
       5        6        7        8        9 
24.12500 29.44444 59.11538 60.00000 31.44828 

The following would be consistent with the tidyverse solution and aggregate():

> tapply(airquality$Ozone, airquality$Month, mean, na.rm=TRUE)
       5        6        7        8        9 
23.61538 29.44444 59.11538 59.96154 31.44828 
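
A quick way to see where the discrepancy comes from is to count rows. The sketch below uses only the built-in airquality data; the variable names (n_complete, aq_ozone, and so on) are illustrative and not part of the original example.

```r
# Rows kept by na.omit() on the whole data frame vs. rows with Ozone observed
aq <- na.omit(airquality)
n_complete <- nrow(aq)                        # no NA in any column
n_ozone    <- sum(!is.na(airquality$Ozone))   # Ozone observed
lost       <- n_ozone - n_complete            # valid Ozone values discarded

# Dropping rows on Ozone alone reproduces the na.rm = TRUE result
aq_ozone  <- airquality[!is.na(airquality$Ozone), ]
by_subset <- tapply(aq_ozone$Ozone, aq_ozone$Month, mean)
by_narm   <- tapply(airquality$Ozone, airquality$Month, mean, na.rm = TRUE)
all.equal(by_subset, by_narm)
```

If lost is greater than zero, listwise deletion has thrown away usable Ozone readings, which is what shifts the monthly means.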

Actually, I'd prefer

> with(airquality, tapply(Ozone, Month, mean, na.rm=TRUE))
       5        6        7        8        9 
23.61538 29.44444 59.11538 59.96154 31.44828 

Though it requires more explanation, it encourages what I believe to be a better habit.

Best, John

dusadrian commented 2 years ago

I planned to write almost exactly the same thing. Although tapply() is very efficient, it can be quite cryptic for many users, especially when splitting by more than one factor, in which case the INDEX argument has to be a list. There is an alternative in the most recent version of package admisc (0.30), which I find a lot more intuitive and easier to remember:

using(airquality, mean(Ozone, na.rm = TRUE), split.by = Month)

   mean 
5 23.615
6 29.444
7 59.115
8 59.962
9 31.448
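
For comparison, the multi-factor case mentioned above looks like this in base R. This is a minimal sketch on the built-in mtcars data; choosing cyl and gear as the grouping variables is only for illustration.

```r
# Mean mpg for each combination of cylinders and gears.
# With more than one grouping variable, INDEX must be a list.
mpg_by <- with(mtcars, tapply(mpg, list(cyl = cyl, gear = gear), mean))
mpg_by   # a cyl-by-gear matrix; combinations with no cars are NA
```

Naming the list elements (cyl = cyl, gear = gear) is optional, but it labels the dimensions of the result.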

Additionally, instead of:

mtcars$gear_char <-
 ifelse(mtcars$gear == 3,
   "three",
   ifelse(mtcars$gear == 4,
   "four",
   "five")
 )

this is arguably also more intuitive:

mtcars$gear_char <- recode(mtcars$gear, "3 = three; 4 = four; 5 = five")
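
If no extra package is to be loaded, one base R way to avoid the nested ifelse() is a named lookup vector. This is a sketch, not something proposed in the original document; gear_names and gear_char are illustrative names.

```r
# Map each gear value to its label through a named character vector.
gear_names <- c("3" = "three", "4" = "four", "5" = "five")

# Index by the gear value converted to character; unname() drops the names
# that the lookup carries along.
gear_char <- unname(gear_names[as.character(mtcars$gear)])

mtcars$gear_char <- gear_char
head(mtcars[, c("gear", "gear_char")])
```

Unlike the nested ifelse(), a value with no entry in the lookup becomes NA rather than silently falling into the last branch.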

john-d-fox commented 2 years ago

I think the object was to do this without loading non-base-R packages. If that requirement is relaxed, there's also the Tapply() function in the car package, which provides a formula interface to tapply().
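
For readers who want a formula interface without leaving the standard distribution, aggregate() in the stats package has a formula method. The sketch below shows it next to a guarded call to car::Tapply(). The Tapply() call form is an assumption based on the car documentation, and it only runs if car is installed.

```r
# Base R (stats): formula method of aggregate().
# Its default na.action = na.omit drops NAs only in the variables named
# in the formula (Ozone and Month), not in the other columns.
agg <- aggregate(Ozone ~ Month, data = airquality, FUN = mean)
agg

# car::Tapply(), only if car is installed (call form is an assumption).
if (requireNamespace("car", quietly = TRUE)) {
  print(car::Tapply(Ozone ~ Month, mean, na.rm = TRUE, data = airquality))
}
```

Because the formula method builds its model frame from Ozone and Month only, it gives the same monthly means as tapply() with na.rm = TRUE.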

dusadrian commented 2 years ago

Indeed. To me, "base R" is anything not related to the tidyverse dialect, using classic, traditional R code. The base package surely cannot do everything, and comparing it (alone) with the whole tidyverse is more than unfair.

john-d-fox commented 2 years ago

It's my impression that 'base R' typically refers not just to the base package but to the R packages loaded by default at start-up or the packages in the standard R distribution.
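
One way to check this on a given installation is to ask R which packages it attaches by default and where the functions in this thread live. This is only a sketch; the list of default packages can be changed by the user's configuration.

```r
# Packages attached by default at start-up (base itself is always attached).
getOption("defaultPackages")

# Where the functions used in this thread live.
environmentName(environment(tapply))     # base
environmentName(environment(aggregate))  # stats
environmentName(environment(na.omit))    # stats
```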

dusadrian commented 2 years ago

You are correct, that should be the interpretation of 'base R'. But even so, the tidyverse is orders of magnitude bigger, so I believe the comparison is still unfair unless contributed packages written in standard R code are allowed. If my understanding is correct, the point of the TidyverseSkeptic is to make a fair comparison between 'traditional' R and the tidyverse dialect.

matloff commented 1 year ago

Great discussions, and again, sorry I'm late to it. I just today looked at the Issues posts.

Once again, though, my overriding goal is to make things easy for beginners. That excludes using other packages, for instance.

As to tapply(), I'm not offering it as a panacea, just something I think is easier for noncoders to learn and use.

If tapply() doesn't quite work, I recommend that beginners--the horror!--write a loop.
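
As an illustration of the loop approach, here is a minimal sketch that computes the monthly Ozone means from the thread's example. The variable names (months, means, in_month) are illustrative.

```r
# Monthly mean of Ozone, ignoring missing values, with an explicit loop.
months <- sort(unique(airquality$Month))
means <- numeric(length(months))
names(means) <- months

for (i in seq_along(months)) {
  in_month <- airquality$Month == months[i]                   # rows for this month
  means[i] <- mean(airquality$Ozone[in_month], na.rm = TRUE)  # skip NAs in Ozone
}

means
```

The result should match tapply(airquality$Ozone, airquality$Month, mean, na.rm = TRUE), at the cost of a few more lines.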