r-quantities / units

Measurement units for R
https://r-quantities.github.io/units

Automatic conversion of fundamental units to derived variables #132

Open jamarav opened 6 years ago

jamarav commented 6 years ago

Good afternoon. I'm starting to use the units package (I think it's a very useful tool) and I have a couple of questions about it. I'm writing a small script to evaluate the performance of refrigeration compressors, and I compute the compressor efficiency with the following expression:

compressor_efficiency<-(mref*Dhs)/Wcomp

Before this calculation I define the following variables:

mref<-set_units(vector_data_mref, 'kg/s')
Dhs<-set_units(vector_data_Dhs, 'J/(kg)')
Wcomp<-set_units(vector_data_Wcomp, 'W')

When I evaluate the compressor efficiency with compressor_efficiency<-(mref*Dhs)/Wcomp, the units are:

> compressor_efficiency
0.5833333 J/s/W

This unit is dimensionless. Is there any way to tell the package to interpret J/s internally as W? The result would then be:

> compressor_efficiency
0.5833333 1

My second question: when I define a new variable, entropy, with entropy<-set_units(vector_data_entropy, 'J/(kg*K)'), the printed result is:

> entropy
2175.70 J/K/kg

Is there an option to print it in the following form?

> entropy
2175.70 J/(kg*K)

I think this form is much clearer, because it separates numerator and denominator.

Finally, it is not very important but I would like to know if there is an option to print the units with parentheses:

> Power_input
500 (W)

Thanks in advance

Enchufa2 commented 6 years ago

There are several issues here (please, open separate threads the next time).

When I evaluate the compressor efficiency [...] This unit is dimensionless.

This is an open issue (see e.g. #123). The thing is that units maintains a unit representation at the R level to be able to customise formatting and so on. However, simplification is not properly implemented at this level. On the other hand, udunits has a binary representation with proper simplification, but then you have to rely on udunits to format units too, which is neither ideal nor flexible. For example, with the udunits branch:

units:::R_ut_format(units:::R_ut_parse("J/s/W"))
#> [1] "1"

but also (from #123):

units:::R_ut_format(units:::R_ut_parse("mile/gallon"))
#> [1] "425143.683171079 m⁻²"

There are ongoing efforts to move things (e.g., user-defined units) to the C part, because udunits already manages all the hard work, but there is a trade-off when we consider, as I said, flexibility to format and print units, for instance.

One solution for this would be to provide a function simplify_units that would parse the R representation into udunits, but still we have to sort out how to parse the result back into R.

For now, you could add the following after your computations:

units(compressor_efficiency) <- 1

This will convert the efficiency to unitless, or fail with an error if units were misused in previous steps.
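As a minimal sketch of this workaround: the input values below are made up, chosen only so that the result matches the 0.5833333 from the original post (the real vector_data_* contents are not shown in the thread).

```r
library(units)

# Hypothetical inputs; the real vector_data_* values are not shown in the thread.
mref  <- set_units(0.05, "kg/s")   # refrigerant mass flow
Dhs   <- set_units(35000, "J/kg")  # enthalpy difference
Wcomp <- set_units(3000, "W")      # compressor power input

compressor_efficiency <- (mref * Dhs) / Wcomp

# Ask for a unitless result. This raises an error if the units do not
# reduce to 1, so it doubles as a dimensional check on the earlier steps.
units(compressor_efficiency) <- 1
compressor_efficiency
```

The same assignment on a quantity that still carries a dimension (for example a length in m) should stop with a conversion error rather than silently dropping the unit.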

Is there an option to print it in the following form?

entropy 2175.70 J/(kg*K)

I don't think so. But:

units_options(negative_power=TRUE)
as_units("J/K/kg")
#> 1 J*K^-1*kg^-1

Finally, it is not very important but I would like to know if there is an option to print the units with parentheses:

Power_input 500 (W)

There is another option for this (group, see ?units_options), but it is currently applied to plots only. It may be extended to general formatting. @edzer thoughts?

edzer commented 6 years ago

Units appear now more consistently as e.g. 500 [W] where you can change the [ ] with units_options(group = c("(", ")")).
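A short sketch of that setting, assuming a units version in which the group option is applied to general formatting as described in this comment; power_input is just an illustrative variable name.

```r
library(units)

# Switch the delimiters around the unit from [ ] to ( ).
units_options(group = c("(", ")"))

power_input <- set_units(500, "W")
shown <- format(power_input)  # formatted while the option is active
print(power_input)

# Restore the default delimiters so later output is unaffected.
units_options(group = c("[", "]"))
```

Since this is a global option, it affects every units object printed afterwards, which is why the sketch resets it at the end.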

I'm in favour of making more aggressive simplification possible; I need to look into how we could do this.

edzer commented 6 years ago

This function

to_si <- function(x) { 
  u_str = as.character(units(x))
  u = units:::R_ut_parse(u_str)
  ft = units:::R_ut_format(u, ascii = TRUE)
  new = as_units(strsplit(ft, " ")[[1]][2])
  set_units(x, new, mode = "standard")
}

converts to SI units. Shall we use that in case the user actively sets option simplify to TRUE? @Enchufa2 @t-kalinowski

> to_si(set_units(1, gallon/mile))
2.352146e-06 [m^2]
> to_si(set_units(1, gallon*mile))
6.09203 [m^4]

t-kalinowski commented 6 years ago

This feels like it should be its own option. Perhaps called standardize_to_si. I can think of lots of cases where a user might want to simplify, but not convert to si.

Enchufa2 commented 6 years ago

I agree with @t-kalinowski. I also think that there may be cases in which someone may want to simplify some things and not others, convert to SI some things and not others. So it's nice to have these features as options, but having them as functions would be useful too.

edzer commented 6 years ago

The thing is that we have the opportunity to use simplify = TRUE for this. With simplify = NA as the current default, setting units_options(simplify = TRUE) only affects whether units that cancel out are reduced to unitless:

> units_options(simplify = TRUE)
> set_units(1, mg/kg)
1e-06 [1]
> units_options(simplify = NA)
> set_units(1, mg/kg)
1 [mg/kg]
> units_options(simplify = FALSE)
> set_units(1, mg/kg)
1 [mg/kg]

Further simplification is now done always, by the package, by symbols comparison. We could branch this further and

To me, this sounds the simplest and most elegant approach.

Enchufa2 commented 6 years ago

We have:

> units_options(simplify = TRUE)
> set_units(1, "gallon*in/dgallon")
10 in
> units_options(simplify = NA)
> set_units(1, "gallon*in/dgallon")
1 gallon*in/dgallon
> units_options(simplify = FALSE)
> set_units(1, "gallon*in/dgallon")
1 gallon*in/dgallon

So you mean that the first result should be m, the second one should be in, and the third one should be the same? I'm still not convinced, because converting to SI is more than a simplification, it's, well, a conversion. It could be misleading for the user.

I'm not convinced either about simplify=NA. What does it mean? Missing simplification? Then, it should be equivalent to simplify=FALSE, so why not simplify=FALSE by default?

Another thing you could do is to export to_si and simplify and document them together. Then you can explain there that

edzer commented 6 years ago

Thanks, valid point about in and conversion to m.

units_options(simplify = FALSE) now turns all simplification off:

> units_options(simplify = FALSE)
> u
2 [m/s]
> u * 1/u
1 [m*s/m/s]

we have NA for the combination of

Maybe then add an option, say, convert_to_SI, which when TRUE takes over all symbolic stuff by converting to base SI units?

Enchufa2 commented 6 years ago

Another option is what @t-kalinowski proposed, and I think it's fine.

Regarding the name, what about convert_to_base instead? The user could potentially uninstall all SI base units and install CGS units, for example. Then the conversion would be to CGS, not SI. The documentation may reflect that, by default, this "base" is SI units.

edzer commented 6 years ago

OK, we now have

> library(units)
udunits system database from /usr/share/xml/udunits
> units_options(convert_to_base=TRUE)
> set_units(1, gallon/km)
1 [gallon/km]
> set_units(1, gallon/km) * 1 # calls .simplify_units
3.785412e-06 [m^2]

where we convert to base when we simplify. Is that the right place, or should this not happen directly in set_units?

Enchufa2 commented 6 years ago

Mmmh, if I set the global option to TRUE, I would expect that the conversion happens always, i.e.:

> library(units)
udunits system database from /usr/share/xml/udunits
> units_options(convert_to_base=TRUE)
> set_units(1, gallon/km)
3.785412e-06 [m^2]
> set_units(1, gallon/km) * 1 # calls .simplify_units
3.785412e-06 [m^2]

That's why I was stressing the need to export simplification functions, including this new to_base, because the user may want to simplify a few results while keeping the global options to FALSE.

edzer commented 6 years ago

So, I guess this issue can be closed?

Enchufa2 commented 6 years ago

What do you think about my concern? Now, if convert_to_base=TRUE, set_units(1, gallon/km) * 1 converts to base but set_units(1, gallon/km) alone does not. I would expect an automatic conversion in both cases.

edzer commented 6 years ago

OK, I'll leave this open; needs a lot more love & patience to get this convert_to_base running.

jamarav commented 7 months ago

Well, apparently I've bumped into an issue that I opened a few years ago. What a coincidence!

I was actually looking for the convert_to_base functionality reported by @edzer.

I think it could be very interesting. Perhaps it is not necessary to consider it as a general option, but as a simple function that we can call in case of need. In the future, if necessary, it could be included as a general option.

I don't have experience with the use of udunits from C and I suppose that as you say there is the possibility to install different base systems.

I have simply taken the function reported a few years ago by @edzer and modified it a bit.

I have run several tests and, from what I understand, the string returned when capturing the units can take four forms. For example:

"W" #[1]
"0.001 m" #[2]
"K @ 273.15" #[3]
"0.001 K @ 273150" #[4]

With this in mind the function would be:

convert_to_base <- function(x) {
  u_str = base::as.character(base::units(x))
  u = units:::R_ut_parse(u_str)
  ft = units:::R_ut_format(u, ascii = TRUE)

  ft = base::strsplit(x = ft, split = " @ ")[[1]][1]
  ft = base::strsplit(x = ft, split = " ")[[1]]
  ft = ft[length(ft)]

  new = as_units(ft)

  set_units(x, new, mode = "standard")
}
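The string handling in the function above can be checked on its own against the four forms listed earlier. This is a base-R sketch with no units calls; extract_symbol is a hypothetical helper name, not something the package provides.

```r
# Base-R sketch of the string handling only.
# extract_symbol() is a hypothetical helper name, not part of the units package.
extract_symbol <- function(ft) {
  ft <- strsplit(ft, split = " @ ", fixed = TRUE)[[1]][1]  # drop the "@ offset" part
  parts <- strsplit(ft, split = " ", fixed = TRUE)[[1]]    # separate the scale factor
  parts[length(parts)]                                     # keep the last token
}

forms <- c("W", "0.001 m", "K @ 273.15", "0.001 K @ 273150")
vapply(forms, extract_symbol, character(1), USE.NAMES = FALSE)
#> [1] "W" "m" "K" "K"
```

The scale factor and offset are discarded on purpose here: only the target unit symbol is needed, since set_units then performs the numeric conversion.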

I think it could be very interesting to include it as a function available to the user. We would have a quick way to be able to convert to SI in case of need.

jamarav commented 7 months ago

Well, it seems that the function I reported above gives wrong results in some cases. For example:

convert_to_base <- function(x) {
  R_ut_parse = utils::getFromNamespace("R_ut_parse", "units")
  R_ut_format = utils::getFromNamespace("R_ut_format", "units")

  u_str = as.character(base::units(x))
  u = R_ut_parse(u_str)
  ft = R_ut_format(u, ascii = TRUE)

  ft = strsplit(x = ft, split = " @ ")[[1]][1]
  ft = strsplit(x = ft, split = " ")[[1]]
  ft = ft[length(ft)]

  new = units::as_units(ft)

  units::set_units(x, new, mode = "standard")
}

x<-set_units(25, "g/mol")

x %>% convert_to_base()

# 40 [1/kg.mol]

I suppose the solution will be simple, but I don't know the internal function that takes care of these problems. Any idea?

t-kalinowski commented 7 months ago

Perhaps something like this:

library(units)

convert_to_base <- function(x) {
  canonicalize <- function(s) {
    s |> 
      R_ut_parse() |> 
      R_ut_format(TRUE, TRUE, TRUE) |>
      gsub(" ", " * ", x = _)
  }

  u <- units(x)
  u <- sprintf(
    "( %s ) / ( %s )", 
    canonicalize(u$numerator), 
    canonicalize(u$denominator)
  )

  # message(u)
  u <- as_units(str2lang(u))
  u <- u / as.numeric(u)
  # message(class(u))
  # str(unclass(u))

  units(x) <- u
  x
}

environment(convert_to_base) <- asNamespace("units")

x <- set_units(25, "g/mol")
convert_to_base(x)
#> 0.025 [kg/mol]

set_units(25, ug/mol) |> convert_to_base()
#> 2.5e-08 [kg/mol]
set_units(25, mg/mol) |> convert_to_base()
#> 2.5e-05 [kg/mol]
set_units(25, g/mol) |> convert_to_base()
#> 0.025 [kg/mol]
set_units(25, kg/mol) |> convert_to_base()
#> 25 [kg/mol]

jamarav commented 7 months ago

Thank you very much @t-kalinowski for the suggestions. The code you reported has helped me a lot. Unfortunately, it is perhaps a bit more complicated because of the freedom the user has. Your code, for example, strictly requires that neither the numerator nor the denominator is character(0). I have been doing some tests these days and have implemented the following function, which I think covers all cases.

convert_to_base <- function(x, simplify = TRUE, merge_num_den = FALSE) {
  R_ut_parse <- utils::getFromNamespace("R_ut_parse", "units")
  R_ut_format <- utils::getFromNamespace("R_ut_format", "units")

  # Convert one unit string to base units, optionally re-simplifying it.
  u_strBase <- function(u_str, spfy = TRUE) {
    u_new <- u_str |>
      R_ut_parse() |>
      R_ut_format(names = FALSE, definition = TRUE, ascii = TRUE)

    # Keep only the unit symbols: drop any "@ offset" and leading scale factor.
    u_new <- strsplit(x = u_new, split = " @ ")[[1]][1]
    u_new <- strsplit(x = u_new, split = " ")[[1]]
    u_new <- u_new[length(u_new)]

    if (spfy) {
      u_new <- u_new |>
        R_ut_parse() |>
        R_ut_format(names = FALSE, definition = FALSE, ascii = TRUE)
    }

    # Products are written with "."; replace them with spaces.
    u_new <- u_new |>
      gsub(".", " ", fixed = TRUE, x = _)

    return(u_new)
  }

  u <- base::units(x)

  # Collapse numerator and denominator symbols; an empty side becomes "1".
  u <- sapply(u, function(i) paste0(i, collapse = "*", recycle0 = TRUE))
  u[u == ""] <- "1"

  u["numerator"] <- sprintf("(%s)", u["numerator"])
  u["denominator"] <- sprintf("(%s)", u["denominator"])

  if (merge_num_den) u <- paste(u, collapse = "/")

  u_base <- sapply(u, function(j) u_strBase(u_str = j, spfy = simplify))

  if (merge_num_den) {
    u_base <- sprintf("(%s)", u_base)
  } else {
    unitless <- (u_base == "1")

    u_base["numerator"] <- sprintf("(%s)", u_base["numerator"])
    u_base["denominator"] <- sprintf("(%s)-1", u_base["denominator"])

    u_base <- u_base[!unitless]
    u_base <- paste(u_base, collapse = " ")
  }

  units::set_units(x, u_base, mode = "standard", implicit_exponents = TRUE)
}

The basic operation is as follows. A units object is passed to the function, which captures its units and builds a vector u that separates numerator and denominator. The u_strBase function then converts each part to base units. In my tests, setting names = F in R_ut_format simplifies the output format, so it can easily be reformatted later and passed to set_units. The most important setting for converting to base units, however, is definition = T. Some splitting of the R_ut_format output is also required to isolate the part of the string that refers only to the units. Once that part is captured, the reformatting is quite simple: with names = F, you only have to replace the multiplication sign "." with a space. Finally, I concatenate the numerator and denominator units and call set_units with implicit_exponents = T.
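The empty numerator or denominator case mentioned earlier comes down to how the symbol vectors are joined. A base-R sketch of just that step follows; join_symbols is a hypothetical name for what the function does inline.

```r
# join_symbols() is a hypothetical name for the step done inline in the function.
join_symbols <- function(symbols) {
  joined <- paste0(symbols, collapse = "*")  # character(0) collapses to ""
  if (joined == "") "1" else joined          # treat an empty side as unitless
}

join_symbols(c("kg", "m", "m"))
#> [1] "kg*m*m"
join_symbols(character(0))
#> [1] "1"
```

Mapping an empty side to "1" gives the parser a valid unit string for both numerator and denominator, whichever one the user left empty.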

Furthermore, I implemented some other functionality: convert_to_base takes two optional arguments, simplify and merge_num_den.

The option simplify enables a second call to R_ut_format. I found that by concatenating two calls to R_ut_format, we can:

The second option, merge_num_den, merges numerator and denominator before calling u_strBase. This is only useful in certain cases, such as converting kJ/s to W. This is how I first wrote the function, but after some testing I found that it is much more consistent to let R_ut_format simplify the numerator and the denominator separately. That is why I have left merge_num_den = F as the default: it is useful in some situations, but in others it gives worse results. A clear example is enthalpy (kJ/kg): simplifying without distinguishing numerator from denominator gives "Gy" (gray, J/kg), which makes little sense when talking about enthalpies.
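For comparison, when the desired target unit is known in advance, an explicit conversion with set_units avoids the choice between merged and separate simplification altogether. This is a sketch using only exported units functions, with the same test values as in this comment:

```r
library(units)

# Power: name the target unit explicitly instead of relying on simplification.
power <- set_units(32, "kJ/s")
power_w <- set_units(power, "W")
power_w
#> 32000 [W]

# Enthalpy: stay in J/kg, so the result never collapses to Gy.
enthalpy <- set_units(32, "kJ/kg")
enthalpy_si <- set_units(enthalpy, "J/kg")
enthalpy_si
#> 32000 [J/kg]
```

This does not replace a general convert_to_base function, since the caller has to supply the target unit; it only shows that the conversion itself is unambiguous once the unit is named.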

@edzer, @Enchufa2 and @t-kalinowski, I hope this will help you implement the convert_to_base function and include it in future package versions. I think the function I report is working consistently, but I am open to any suggestions for improvement.

Finally, here are some tests I have carried out on this function:

u <- "kJ/kg"
set_units(32, u, mode = "standard") |> convert_to_base()
#> 32000 [J/kg]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = T)
#> 32000 [Gy]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = F, simplify = F)
#> 32000 [kg*m^2/kg/s^2]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = T, simplify = F)
#> 32000 [m^2/s^2]

u <- "fahrenheit"
set_units(32, u, mode = "standard") |> convert_to_base()
#> 273.15 [K]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = T)
#> 273.15 [K]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = F, simplify = F)
#> 273.15 [K]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = T, simplify = F)
#> 273.15 [K]

u <- "celsius"
set_units(32, u, mode = "standard") |> convert_to_base()
#> 305.15 [K]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = T)
#> 305.15 [K]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = F, simplify = F)
#> 305.15 [K]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = T, simplify = F)
#> 305.15 [K]

u <- "degree_C"
set_units(32, u, mode = "standard") |> convert_to_base()
#> 305.15 [K]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = T)
#> 305.15 [K]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = F, simplify = F)
#> 305.15 [K]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = T, simplify = F)
#> 305.15 [K]

u <- "kJ/(kg*fahrenheit)"
set_units(32, u, mode = "standard") |> convert_to_base()
#> 57600 [J/K/kg]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = T)
#> 57600 [m^2/K/s^2]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = F, simplify = F)
#> 57600 [kg*m^2/K/kg/s^2]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = T, simplify = F)
#> 57600 [m^2/K/s^2]

u <- "J/s"
set_units(32, u, mode = "standard") |> convert_to_base()
#> 32 [J/s]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = T)
#> 32 [W]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = F, simplify = F)
#> 32 [kg*m^2/s^3]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = T, simplify = F)
#> 32 [kg*m^2/s^3]

Enchufa2 commented 7 months ago

This definitely helps. Thanks all for the discussion and prototypes. I'll try to find some time to put things together. But this will be during the next half-term, because I'm a bit overloaded now.