swcarpentry / DEPRECATED-bc

DEPRECATED: This repository is now frozen - please see individual lesson repositories.

R: mean() and median() have been deprecated for dataframes #859

Closed: chendaniely closed this issue 9 years ago

chendaniely commented 9 years ago

The R lesson uses mean() on a subset of a data frame, which returns NA along with the following warning:

Warning message:
In mean.default(inflam[1, ]) :
  argument is not numeric or logical: returning NA

You need to first convert the row to a matrix with as.matrix() before passing it to the function (see http://stackoverflow.com/questions/19697498/r-beginner-argument-is-not-numeric-or-logical-returning-na).
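A minimal sketch of the problem and the as.matrix() workaround. The data frame below is a small hypothetical stand-in for the lesson's inflammation data (one row per patient, one column per day); the real data set is larger. Only base R functions are used.

```r
# Hypothetical stand-in for the inflammation data: rows = patients, columns = days
inflam <- data.frame(V1 = c(0, 0, 0), V2 = c(0, 1, 1), V3 = c(1, 2, 1), V4 = c(3, 1, 2))

row1 <- inflam[1, ]          # a row subset is still a (one-row) data frame
class(row1)                  # "data.frame"

# mean(row1) warns "argument is not numeric or logical: returning NA",
# because a data frame is a list, not a numeric vector
res <- suppressWarnings(mean(row1))
is.na(res)                   # TRUE

# Converting the row to a matrix (or flattening it to a vector) first works
mean(as.matrix(row1))        # 1
mean(unlist(row1))           # 1
```

unlist() is an alternative to as.matrix() here; both work because every column is numeric.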

jdblischak commented 9 years ago

Could you please point out where in the R lessons you found this? I am unaware of any example (and also could not find one with a quick search). In the lessons, we take the mean and median of a column of a data frame. Selecting a single column of a data frame (e.g. inflam[, 1]) returns a vector, whereas selecting a row (e.g. inflam[1, ]) returns a one-row data frame, which produces the error you have found.
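A short sketch of the column/row difference, again using a small hypothetical data frame in place of the lesson data. By default, `[` on a data frame drops a single selected column to a plain vector, but a single selected row keeps the data frame structure, since its columns could have different types.

```r
# Hypothetical stand-in data: rows = patients, columns = days
inflam <- data.frame(V1 = c(0, 1, 2), V2 = c(1, 3, 2), V3 = c(2, 2, 5))

col1 <- inflam[, 1]   # single column: dropped to a numeric vector
row1 <- inflam[1, ]   # single row: still a data frame

is.vector(col1)       # TRUE
is.data.frame(row1)   # TRUE

# Summary statistics on a column work directly
mean(col1)            # 1
median(inflam[, 2])   # 2
```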

dlebauer commented 9 years ago

I just did a quick search of the repository for inflam and could not find any results. I also searched for mean and found no calls on data frame rows.

However, I can reproduce the error by trying

data(iris)
mean(iris[1,])

But this is caused by taking the row mean, which may not have been intended. The column mean, mean(iris[,1]), works.
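A sketch of the iris case. Note that iris[1, ] also contains the Species factor column, so as.matrix() on the whole row would give a character matrix and mean() would still return NA. The example below therefore keeps only the four numeric columns (1:4) before computing the row mean; rowMeans() accepts a data frame and converts it internally.

```r
data(iris)

# Column mean: iris[, 1] is a numeric vector, so this works
mean(iris[, 1])                       # 5.843333

# Row mean: drop the Species factor column first
rowMeans(iris[1, 1:4])                # 2.55

# Equivalent, converting the numeric part of the row explicitly
mean(as.matrix(iris[1, 1:4]))         # 2.55
```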

chendaniely commented 9 years ago

@jdblischak @dlebauer Apologies, I computed row means when I taught the lesson material, and the error showed up. I was very confused when it happened, and then saw that mean() on data frames has been deprecated from R 3.0.0 onward.

I'm not sure whether you want to edit the lesson notes for instructors to say that you need to convert to a matrix if you want to compute row means.

Sorry for the noise; closing the issue.
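For instructor notes, a possible sketch of computing the mean for every patient (row) at once. The data frame is a hypothetical stand-in for the inflammation data, and these are not necessarily the calls the lesson itself uses. Both rowMeans() and apply() coerce the data frame to a matrix before working on rows, which avoids the warning.

```r
# Hypothetical stand-in data: rows = patients, columns = days
inflam <- data.frame(V1 = c(0, 1, 2), V2 = c(1, 3, 2), V3 = c(2, 2, 5))

# Option 1: rowMeans() converts the data frame to a numeric matrix internally
patient_means <- rowMeans(inflam)

# Option 2: apply() also coerces to a matrix, then calls mean() on each row (MARGIN = 1)
patient_means_apply <- apply(inflam, 1, mean)

# Option 3: convert once, then index rows of the matrix (each row is a numeric vector)
m <- as.matrix(inflam)
mean(m[1, ])   # 1
```

rowMeans() is the more direct choice for means; apply() generalizes to other functions such as median().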

jdblischak commented 9 years ago

Ah, now I understand the problem. You found the error during a live workshop, not when you were reviewing the lesson material.

This error is really annoying, and was definitely a problem as we tried to translate the novice Python lessons to R. In order to mirror the intention of the Python lessons, which was to show some quick summary statistics without getting bogged down in the details of different data structures, we ended up performing the mean and median only on the columns of the data frame, which correspond to the days of the treatment regimen (for the discussion see #639).

I agree that we need to have a better guide for instructors for teaching this material since most will be unaware of these issues we purposely avoided. Unfortunately these are turbulent times for the lesson material. First, the novice material is being extracted into its own repo (see #880) just like all the other subjects. Second, we'll need to convert everything to the new lesson template (latest version). Then finally we could add this practical information on avoiding this error to the Instructor's guide.

Lastly, if you have time, it would be great if you could send a quick message to the r-discuss mailing list to share your experience on how the material worked for your audience and how far you were able to get through the lessons.