r-quantities / units

Measurement units for R
https://r-quantities.github.io/units

Suggestion for new function: keep_units() #252

Closed vorpalvorpal closed 3 years ago

vorpalvorpal commented 3 years ago

There are quite a few functions out there that don't cope well with units objects. I often find myself writing:

x <- set_units(2, mg)
set_units(some_function(drop_units(x)), mg)

or, even worse, when I don't know what the units will be:

x_units <- units(x)
set_units(some_function(drop_units(x)), x_units, mode = "standard")

I wonder if a convenience function along the lines of the following would be useful:

keep_units <- function(FUN, args, arg_unit = NULL){
  fix_units <- function(arg){
    arg_unit_i <- units(arg)
    if(is.null(arg_unit)){
      arg_unit <<- arg_unit_i
    }
    if(!identical(arg_unit, arg_unit_i)){
      arg <- set_units(arg, arg_unit, mode = "standard")
    }
    arg <- drop_units(arg)
    return(arg)
  }
  args <- rapply(args, fix_units, classes = "units", how = "replace")
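  # call FUN on the unit-free arguments, then restore the common unit
  # (plain nested call, so no magrittr pipe is required)
  set_units(do.call(FUN, args), arg_unit, mode = "standard")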
}

which would allow keep_units(some_function, list(x)) which is quite a bit shorter and, I think, easier to read.
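
For reference, the manual drop-and-restore pattern described at the top of this comment can be written out as a runnable sketch. The function `some_function` below is a hypothetical stand-in (not part of any package) for a function that refuses units objects; the values are illustrative only.

```r
library(units)

# Hypothetical stand-in for a function that does not understand units
# objects: it rejects them and only works on bare numeric vectors.
some_function <- function(v) {
  stopifnot(!inherits(v, "units"))
  cumsum(v)
}

x <- set_units(c(2, 4, 6), mg)

# Case 1: the unit is known in advance and can be written literally
res_known <- set_units(some_function(drop_units(x)), mg)

# Case 2: the unit is not known in advance, so capture it first,
# drop it for the call, and restore it on the result
x_units <- units(x)
res_unknown <- set_units(some_function(drop_units(x)), x_units,
                         mode = "standard")
```

Both results carry the unit of `x` again; the second form is the one a helper would need to generalize.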

Enchufa2 commented 3 years ago

Could you provide a couple of examples of such functions, please?

Enchufa2 commented 3 years ago

Your approach seems overly complicated. Something like the following would be sufficient:

keep_units <- function(FUN, x, ..., unit=units(x)) {
  set_units(do.call(FUN, list(x, ...)), unit, mode="standard")
}

unless I'm missing something, which is why I asked for some examples of functions that drop units. In any case, it is always better to provide methods for such functions whenever possible.
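
A minimal usage sketch of the one-liner above, assuming it is defined locally exactly as written. The helper `strip_and_scale` is hypothetical and only serves to show how extra arguments travel through `...`.

```r
library(units)

# The one-liner from the comment above, defined locally for this sketch
keep_units <- function(FUN, x, ..., unit = units(x)) {
  set_units(do.call(FUN, list(x, ...)), unit, mode = "standard")
}

x <- set_units(c(1, 2, 3), mg)

# drop_units() returns a bare numeric vector ...
plain <- drop_units(x)

# ... while wrapping the same call restores the original unit
restored <- keep_units(drop_units, x)

# Hypothetical function that strips attributes and rescales;
# `factor` is forwarded through `...`
strip_and_scale <- function(v, factor) as.numeric(v) * factor
scaled <- keep_units(strip_and_scale, x, factor = 10)
```

Note that this version passes `x` to `FUN` with its units still attached, so it suits functions that silently lose the attribute rather than functions that reject units objects outright.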

vorpalvorpal commented 3 years ago

The most recent one I came across was smwrBase::fillMissing, but I have had the same problem with other functions.

The idea of my function was that some functions might take multiple different unit objects as arguments. Thus:

a <- set_units(1, g)
b <- set_units(1, mg)
# the following works correctly
a + b
# so this should work as well
keep_units(FUN, list(a, b))

Enchufa2 commented 3 years ago

Addition works correctly because there are rules encoded for that in the corresponding method. But how could an arbitrary function possibly work with multiple units? What should keep_units do with them?

vorpalvorpal commented 3 years ago

What I find really useful about the units package is that I can take data from different sources and work with them without having to convert them into common units. So I might receive some data in mg/l and other data in g/l. I can keep these in their original unit rather than decide on a common storage unit. My thought process for keep_units() was that it would be for functions that don't work with units objects for whatever reason AND expect all implicit unit arguments to be in the same units. The keep_units function would do any necessary conversion of arguments to ensure the arguments the function sees are equivalent. If you give it one arg in kg and one in km it will return an error whereas if you give it one arg in g and another in mg it will convert mg to g before dropping the units and running the function.
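
The behaviour described here (convert to a common unit, drop units, call, restore) can be sketched step by step. This is only an illustration under stated assumptions: `plain_total` is a hypothetical unit-unaware function, and the values are made up.

```r
library(units)

a <- set_units(1, g)
b <- set_units(500, mg)

# Hypothetical unit-unaware function that assumes both inputs share a unit
plain_total <- function(u, v) u + v

# Convert b to the unit of a, strip the units, call the function,
# then put a's unit back on the result
b_as_a <- set_units(b, units(a), mode = "standard")
total <- set_units(plain_total(drop_units(a), drop_units(b_as_a)),
                   units(a), mode = "standard")

# Incompatible dimensions (mass vs. length) cannot be converted,
# so the conversion step itself raises an error
mass <- set_units(1, kg)
len <- set_units(1, km)
conversion <- tryCatch(
  set_units(len, units(mass), mode = "standard"),
  error = function(e) "not convertible"
)
```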

Now clearly this won't work with all functions, but I think it is the subset of functions where keep_units() makes any sense.

Enchufa2 commented 3 years ago

Sorry, but I'm still not convinced. Addition doesn't work for incompatible units (e.g. kg and km), but multiplication does. There is no a priori reason for a helper like that to assume that a given input function should give an error for different units.
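
To make the point about operators concrete, here is a small sketch of how the arithmetic methods in units treat convertible and incompatible operands. The variable names are illustrative.

```r
library(units)

a <- set_units(1, g)
b <- set_units(1, mg)

# Convertible units: addition converts internally and succeeds
sum_ok <- a + b

# Incompatible units: addition raises an error, captured here as a message
sum_bad <- tryCatch(
  set_units(1, kg) + set_units(1, km),
  error = function(e) conditionMessage(e)
)

# Multiplication is defined for any pair of units and yields a compound unit
prod_ok <- set_units(1, kg) * set_units(1, km)
```

Each operator carries its own rule, which is why a generic wrapper cannot know in advance how an arbitrary function should treat mixed units.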

To me, there are two distinct functionalities in what you are describing. One is to apply a function that strips down attributes and then restore the unit at the output, and that's pretty much solved by the one-liner function I suggested in my second comment. The other is to homogenize a set of objects that measure the same physical quantity but are expressed in different units, which can be done as follows:

x <- set_units(100, m)
y <- set_units(100, foot)
z <- set_units(100, mile)

for (i in c("y", "z"))
  assign(i, set_units(get(i), units(x), mode="standard"))

I don't see any reason to merge both functionalities, because 1) it's confusing, and 2) enables a specific use case, but disables others that are equally valid.

Enchufa2 commented 3 years ago

A possible implementation:

library(units)
#> udunits system database from /usr/share/udunits

homogenize_units <- function(x, ..., unit=units(x), envir=parent.frame()) {
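  # names of all supplied variables, including the one passed as x
  # (the original looked up the literal name "x" in the caller)
  nm <- vapply(as.list(substitute(list(x, ...)))[-1], deparse, character(1))
  for (i in nm) {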
    value <- get(i, envir=envir)
    assign(i, set_units(value, unit, mode="standard"), envir=envir)
  }
}

x <- set_units(100, m)
y <- set_units(100, foot)
z <- set_units(100, mile)
df <- data.frame(x, y, z)

# for variables in the environment
homogenize_units(x, y, z)
x; y; z
#> 100 [m]
#> 30.48 [m]
#> 160934.4 [m]

# for columns of a dataframe
new_df <- within(df, homogenize_units(x, y, z))
df; new_df
#>         x          y          z
#> 1 100 [m] 100 [foot] 100 [mile]
#>         x         y            z
#> 1 100 [m] 30.48 [m] 160934.4 [m]

@edzer What do you think?

Enchufa2 commented 3 years ago

@edzer Did you have time to take a look at this?

edzer commented 3 years ago

Thanks for reminding. Yes, you can do this but TBH I really don't like functions that write to .GlobalEnv rather than returning a value.

Enchufa2 commented 3 years ago

Me neither. We could keep it simpler:

library(units)
#> udunits system database from /usr/share/udunits

homogenize_units <- function(x, ref) {
  set_units(x, units(ref), mode="standard")
}

x <- set_units(100, m)
y <- set_units(100, foot)
z <- set_units(100, mile)

y <- homogenize_units(y, x)
z <- homogenize_units(z, x)
x; y; z
#> 100 [m]
#> 30.48 [m]
#> 160934.4 [m]

There's not much gain compared to current functionality, but maybe a helper like this is more user-friendly? Another thing is the verb. Maybe "homogenize" is too convoluted? Alternatives? Adapt, take?
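
Because this simpler version returns a value, it composes with `lapply()` when several objects need the same treatment. The sketch below assumes the two-argument `homogenize_units` exactly as defined above; the list name `lengths_list` is illustrative.

```r
library(units)

# Two-argument helper from the comment above, defined locally
homogenize_units <- function(x, ref) {
  set_units(x, units(ref), mode = "standard")
}

lengths_list <- list(
  x = set_units(100, m),
  y = set_units(100, foot),
  z = set_units(100, mile)
)

# Convert every element to the unit of the reference element,
# returning a new list instead of modifying variables in place
same_units <- lapply(lengths_list, homogenize_units, ref = lengths_list$x)
```

Nothing is written to the calling environment here; the caller decides whether to overwrite the originals.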