harrelfe / Hmisc

Harrell Miscellaneous

Add label retention for stats::relevel #37

Closed gforge closed 8 years ago

gforge commented 8 years ago

A minor suggestion is to add label retention for stats::relevel using S3. I often use relevel before my regressions (e.g. to set the most frequent level as the reference), while I set my original levels before the descriptive tables. Here's an example:

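library(Hmisc)  # provides label() and `label<-`

set.seed(1)
test_factor <- factor(sample(LETTERS[1:3], replace = TRUE, size = 20))
label(test_factor) <- "My test"

# Without a relevel method for labelled objects, relevel() returns a plain
# factor, so this comparison is expected to be FALSE
test_factor <- relevel(test_factor, ref = "B")
label(test_factor) == "My test"

# Proposed S3 method: save the label, dispatch to the factor method, restore the label
relevel.labelled <- function(x, ...) {
  lbl <- label(x)
  x <- NextMethod()
  label(x) <- lbl
  x
}
label(test_factor) <- "My test"
test_factor_v2 <- relevel(test_factor, ref = "B")
label(test_factor_v2) == "My test"      # the label is now retained
identical(test_factor_v2, test_factor)  # "B" was already the reference, so the two should match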

In addition, I sometimes also use factor() to drop unused levels after subsetting. A similar function could perhaps be factor.labelled. I guess this wish list can get rather extensive, but this would at least cover my label() use 90% of the time.
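
A minimal sketch of the factor() case, under stated assumptions: base::factor() is an ordinary function rather than an S3 generic, so a method named factor.labelled would not be dispatched automatically. The helper below, factor_keep_label, is a hypothetical name used only for illustration and is not part of Hmisc.

```r
library(Hmisc)

# Hypothetical helper, not part of Hmisc: rebuild a factor with factor()
# (which drops unused levels) and put the Hmisc label back afterwards.
factor_keep_label <- function(x, ...) {
  lbl <- label(x)
  x <- factor(x, ...)
  label(x) <- lbl
  x
}

f <- factor(c("A", "B", "C", "A", "B"))
label(f) <- "My test"

# Subsetting a labelled factor keeps the label, but also the unused level "C"
sub <- f[f != "C"]
sub2 <- factor_keep_label(sub)

levels(sub2)  # "C" has been dropped
label(sub2)   # the label has been restored
```

Because the helper is called explicitly, it only preserves the label where the user asks for it.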

harrelfe commented 8 years ago

Not a bad idea. But I wonder how many other non-label-preserving functions are out there. Is it good to spend time on only one of many?

Frank

(Sent by email in reply to Max Gordon's message of 12/22/2015 06:52 AM.)


Frank E Harrell Jr, Professor and Chairman, School of Medicine
Department of Biostatistics, Vanderbilt University

gforge commented 8 years ago

I don't think finding every possible function is worth the time. I would rather add functions as requested and limit them to the most popular set of packages. It does make sense, though, to look through the common functions in base & stats for any where this would be useful. A quick look through the base & stats index suggests perhaps adding cut, gsub, iconv, sub & reorder.

harrelfe commented 8 years ago

I'm not sure about that strategy because those functions are not label-preserving (and even more obviously are not units-preserving).

harrelfe commented 8 years ago

P.S. The rms package automatically uses the most frequent level as the reference cell when fitting regression models.
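
Outside rms, the same choice of reference level can be made by hand. The following is a small base R sketch with made-up toy data, not a description of how rms does it internally:

```r
# Toy data: "B" is the most frequent level
f <- factor(c("A", "B", "B", "C", "B", "A"))

# Name of the most frequent level (on ties, which.max() picks the first one)
most_frequent <- names(which.max(table(f)))

# Make it the reference (first) level, e.g. before fitting with lm() or glm()
f_ref <- relevel(f, ref = most_frequent)
levels(f_ref)  # "B" "A" "C"
```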

gforge commented 8 years ago

I agree that they are borderline in terms of necessity and may introduce misinterpretations. In my mind, relevel, reorder and factor are the functions that I would expect to be label-preserving. These are also the ones that one would use after generating Table 1, and thus they require active relabeling by the user.
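
For reorder, which is an S3 generic in stats, a method could follow the same pattern as the relevel.labelled proposed at the top of the thread. The reorder.labelled below is only a sketch of how that might look, not code taken from Hmisc, and the data are made up:

```r
library(Hmisc)

# Sketch only: save the label, dispatch to the next reorder method,
# then restore the label on the result.
reorder.labelled <- function(x, ...) {
  lbl <- label(x)
  x <- NextMethod()
  label(x) <- lbl
  x
}

grp <- factor(c("A", "B", "C", "A", "B", "C"))
label(grp) <- "Treatment group"
score <- c(5, 1, 3, 7, 2, 4)

# Order the levels by mean score: B (1.5), C (3.5), A (6)
grp2 <- reorder(grp, score)
levels(grp2)
label(grp2)
```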

The iconv function shouldn't alter the meaning of the variable, but I wouldn't expect it to be label-preserving. One would also most likely use it during the munging step, i.e. before labeling. I guess this is something that those of us outside the English-speaking countries struggle with, and you have probably not found it that useful ;-)

I didn't know that rms did that automatically. It makes sense, and I guess if I want a different comparison I could always rely on the contrast function. Since the label function is part of Hmisc and I (and probably many others) use it in non-rms contexts, it seems reasonable to provide this functionality.

harrelfe commented 8 years ago

I added relevel.labelled which will be in the next release to CRAN. Thanks Max.