harrelfe / Hmisc

Harrell Miscellaneous

Median if labelled #140

Closed by zachariae 3 years ago

zachariae commented 3 years ago

Dear all,

Over the last few days I ran into a problem: I can no longer calculate the median of a column once it has been labelled using the Hmisc function.

Greetings, Silke

SamGG commented 3 years ago

Hi, from a base R point of view, if the "labelled" objects are the values of the column, then that column is a factor, and IIRC no numerical computation can be done on a factor. Do you have a short, reproducible example? Best.
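To illustrate the base R behaviour referred to here, a minimal sketch of what happens when median() is given a factor. It only shows how factors behave; it does not establish that an Hmisc-labelled column is a factor. The three values and the variable names are illustrative.

```r
# Three illustrative numeric values
ages <- c(40.9, 38.6, 33.4)

# The same values stored as a factor are no longer numeric
ages_factor <- factor(ages)
is.numeric(ages_factor)

# median() refuses a factor; capture the error message instead of stopping
factor_result <- tryCatch(
  median(ages_factor),
  error = function(e) conditionMessage(e)
)

# Converting via character recovers the original numbers
recovered_median <- median(as.numeric(as.character(ages_factor)))
```

Checking `class()` or `str()` on the affected column is a quick way to see whether this situation applies.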

zachariae commented 3 years ago

Hi, thank you for the clarification. Actually, the problem with calculating the median does not occur with a plain data.frame, but only when the data frame has additional classes such as tbl_df. So perhaps this is not the right place to report the problem?

Kind regards, Silke

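library(dplyr)  # provides %>% and select()

test_data <- data.frame(AGE = c(40.9, 38.6, 33.4, 25.0, 29.5, 31.5,
                                31.5, 25.9, 31.4, 27.6, 42.4, 29.8,
                                31.6, 43.4, 31.3, 35.7, 35.7, 29.4,
                                42.4, 47.7, 40.3, 45.2, 36.1, 38.6, 48.4, 28.1,
                                41.8, 41.0, 32.4, 34.8, 28.3))

str(test_data)

# select() returns a data.frame, so median() stops with "need numeric data"
test_median <- test_data %>%
  dplyr::select(AGE) %>%
  median()

# mean() on a data.frame returns NA with a warning instead of an error
test_mean <- test_data %>%
  dplyr::select(AGE) %>%
  mean()

SamGG commented 3 years ago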

IMHO the error relates to the use of dplyr, not Hmisc. select() returns a data.frame, not a vector, which is why median() is complaining. You should use summarise() to apply the median:

test_data %>%
  summarise(median(AGE))
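As a sketch of the difference described above, the snippet below compares select(), pull(), summarise(), and base R `$` on a small data frame. It reuses the first five AGE values from the example; the names `ages`, `median_age` and the result variables are illustrative, and dplyr is assumed to be installed.

```r
library(dplyr)

# First five AGE values from the example data
ages <- data.frame(AGE = c(40.9, 38.6, 33.4, 25.0, 29.5))

# select() keeps the data.frame wrapper, which median() cannot handle
selected <- ages %>% select(AGE)
is.data.frame(selected)

# pull() extracts the column as a plain numeric vector
median_pull <- ages %>% pull(AGE) %>% median()

# summarise() evaluates median() on the column and returns a one-row data frame
median_summarise <- ages %>% summarise(median_age = median(AGE))

# Base R equivalent: `$` also returns the column as a vector
median_base <- median(ages$AGE)
```

All three approaches hand median() a numeric vector rather than a data frame, which is the point of the fix.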

Answers at https://stackoverflow.com/questions/61158926/median-in-r-needs-numeric-data

Personal point of view: I haven't moved to those nice functions, because every time I tried I ran into errors like this one, probably because I am so used to the base functions.

Please close the issue unless you feel it's related to the use of an Hmisc function. Have a nice day.