r-quantities / units

Measurement units for R
https://r-quantities.github.io/units

Conversions and user-defined units #123

Closed mailund closed 5 years ago

mailund commented 6 years ago

I don't know if this is a duplicate, but I didn't find anything exactly like it, and it is something we have discussed before. In any case, it is related to #89, #85, #84, and #67 I think.

While udunits2 is terrific in that it gives us a lot of units and lets us convert between them, it doesn't always play nice with the unit arithmetic we have. In many ways, I think we could be better off by just extracting the information it has in its conversion tables and going it alone in units. (Especially because I am still running into installation difficulties from time to time when testing packages on Travis, but it has been a little while, so that might be resolved by now.)

The conversion rules I implemented two years ago could almost get us there. They assumed we had a conversion constant between convertible types, and that was that. We got those constants from udunits2, true, but nothing would really prevent us from implementing this simple logic from a database, and units would have one dependency less.

What didn't work in my implementation were cases where we have affine functions for mapping one unit to another, e.g. C to F, which works as v C = 9/5 v + 32 F.

I cannot think of any unit conversion that is not affine between some base units, but I stand ready to be corrected.

It isn't much of a problem to implement affine conversion in a very similar way to the linear one, but I got stumped by 0 C + 0 C = 0 C (obviously) but 32 F + 32 F = 64 F (obviously), even though 0 C and 32 F denote the same temperature. We agreed that it would be necessary to distinguish between "values" and "quantities", in the sense that both 0 C and 32 F refer to the freezing point of water, but 0 C, as a quantity, is zero change in temperature, while 32 F is not.

At that point, I had gotten sidetracked by another project and didn't get any further with it, and it has been hiding in my TODO list ever since.

Until now, that is. I have thought a little about it today. At first, I thought the problem was harder than it is, but now I think we should just stick with "linear" conversions in arithmetic and affine conversions otherwise.

If you add two temperatures, you must be considering them as quantities. It is not a meaningful operation to add them otherwise, as the C and F example clearly shows. And for quantities, we can always use the linear scaling.

As a mockup implementation of units, consider

## proof-of-concept units
pocu <- function(value, unit) {
  structure(value, unit = unit, class = "pocu")
}
toString.pocu <- function(x, ...) paste(x, attr(x, "unit"))
print.pocu <- function(x, ...) cat(toString(x), "\n")

and conversion functions

convert_linear <- function(value, to) {
  from <- attr(value, "unit")
  if (is.null(from)) return(pocu(value, to)) # scalar

  value <- unclass(value)
  if (from == "F" && to == "C") {
    pocu(5/9 * value, "C")
  } else if (from == "C" && to == "F") {
    pocu(9/5 * value, "F")
  } else {
    # mockup, of course
    pocu(value, to)
  }
}
convert_affine <- function(value, to) {
  from <- attr(value, "unit")
  if (is.null(from)) return(pocu(value, to)) # scalar

  value <- unclass(value)
  if (from == "F" && to == "C") {
    pocu(5/9 * (value - 32), "C")
  } else if (from == "C" && to == "F") {
    pocu(9/5 * value + 32, "F")
  } else {
    # mockup, of course, but a default
    # affine conversion would be the linear one
    pocu(value, to)
  }
}
convert <- convert_affine

I just have a mockup of C and F conversion, so it is both simpler and more complex than what we actually have implemented right now...

Explicit conversion, using convert, will be the affine version. If you explicitly convert from C to F, I will assume this is what you want.

This will work as we would expect:

> freezing_C <- pocu(0, "C")
> freezing_F <- pocu(32, "F")
> boiling_C <- pocu(100, "C")
> boiling_F <- pocu(212, "F")
> 
> convert(freezing_C, "F")
32 F 
> convert(freezing_F, "C")
0 C 
> convert(boiling_F, "C")
100 C 
> convert(boiling_C, "F")
212 F 

For arithmetic, though, we scale linearly:

Ops.pocu <- function(e1, e2) {
  e2 <- convert_linear(e2, attr(e1, "unit"))
  e1 <- unclass(e1) ; e2 <- unclass(e2)
  pocu(NextMethod(), attr(e1, "unit"))
}

With this, we get arithmetic that matches what we would expect, I think.

> half_boiling <- function(freezing, boiling)
+   (boiling - freezing) / 2
> 
> (hb_C <- half_boiling(freezing_C, boiling_C))
50 C 
> (hb_F <- half_boiling(freezing_F, boiling_F))
90 F 

> freezing_C + hb_C + hb_F
100 C 
> freezing_F + hb_C + hb_F
212 F 

These conversion semantics could be implemented in the current framework with little overhead. When I simplify expressions or test for compatibility, I already use a linear scaling. I think udunits2 is only involved in explicit conversion and in getting scaling constants (but I might misremember here).

If linear scaling is what we already have, then we are most of the way to handling all conversions in units. We just need to get the affine transformations included as well. If, for all compatible units, we know the scale and the offset, then we have all that we need. In arithmetic we only use the scale; in explicit conversions we use the offset as well. For units that are already linearly scalable, the offset will just be zero.
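
To make that concrete, here is a rough sketch (not the real units code; the table below is just a hardcoded C/F example) of how a scale/offset table could drive both kinds of conversion:

## sketch only: a hypothetical table of conversion parameters, keyed by "from:to"
conversion_table <- list(
  "C:F" = list(scale = 9/5, offset = 32),
  "F:C" = list(scale = 5/9, offset = -32 * 5/9)
)
lookup <- function(from, to) conversion_table[[paste(from, to, sep = ":")]]

## explicit conversion: affine, uses both scale and offset
convert_value <- function(x, from, to) {
  p <- lookup(from, to)
  p$scale * x + p$offset
}
## conversion inside arithmetic: linear, uses the scale only
convert_quantity <- function(x, from, to) lookup(from, to)$scale * x

convert_value(100, "C", "F")    # 212: the boiling point
convert_quantity(50, "C", "F")  # 90: a 50 C difference is a 90 F difference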

We don't need to build this table right away. As a first step, we can still use udunits2: we get the offset by converting zero, and we get the scale by converting one and subtracting the offset.
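
For example, something along these lines (a sketch, assuming the udunits2 package and its ud.convert()):

## sketch: derive the affine parameters for a pair of compatible units
## from udunits2 by converting 0 and 1
conversion_parameters <- function(from, to) {
  offset <- udunits2::ud.convert(0, from, to)
  scale  <- udunits2::ud.convert(1, from, to) - offset
  list(scale = scale, offset = offset)
}
conversion_parameters("celsius", "fahrenheit")  # scale 1.8, offset 32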

Does this sound reasonable, and is it something you would like me to have a look at?

Cheers

edzer commented 6 years ago

I don't think that units will drift away from udunits2 (the C library), but rather seek stronger ties. udunits2 has been developed by experts over 20 years and is used by a much larger community than the R community (just look at the issues). We (@mailund, @t-kalinowski, @Enchufa2 and I) are all doing this as a side project, without having the resources to create and maintain a bulletproof product from scratch, and without the knowledge required for this (which went into udunits2).

udunits2 lets you define a conversion constant and an offset. With the udunits branch of this repo (which integrates the R package udunits2):

library(units) # branch udunits
# udunits system database from /usr/share/xml/udunits

# Attaching package: ‘units’

# The following object is masked from ‘package:base’:

#     %*%

freezing_C <- set_units(0, degree_C)
freezing_F <- set_units(32, degree_F)
boiling_C <-  set_units(100, degree_C)
boiling_F <-  set_units(212, degree_F)

set_units(freezing_C, degree_F)
# 32 degree_F
set_units(freezing_F, degree_C)
# 3.552714e-14 °C
set_units(boiling_F,  degree_C)
# 100 °C
set_units(boiling_C,  degree_F)
# 212 degree_F

# now self-constructed my_C, using udunits C API

# v C = 9/5 v + 32 F:
foo <- install_conversion_offset("degree_F", "__C", 32)
foo <- install_conversion_constant("__C", "my_C", 5/9) # interprets constant 1/c

set_units(freezing_F, my_C)
# -2.131628e-14 my_C
set_units(boiling_F,  my_C)
# 100 my_C
Enchufa2 commented 6 years ago

I completely agree. Summing up, units is currently limited by the udunits2 package. The latter exposes very limited functionality from UDUNITS, but UDUNITS itself provides (AFAIK) everything units needs (it is questionable, though, whether its API design is performant enough for some applications, but that's another battle). So the point is whether udunits2 can grow as fast as units needs. If not, including it in units is the logical step forward.

mailund commented 6 years ago

If we can do everything we want with udunits, then I think it makes the most sense to do so. It would still have to deal with the difference between adding quantities and translating between absolute values, though, wouldn't it? Is there some way of specifying such operations in udunits?

If udunits can do unit simplification, and you can define your own units, then we could get rid of all the unit handling code altogether and just call into udunits. That would be a cleaner approach than the one we have now, with some unit handling in R and some in C.

edzer commented 6 years ago

That is an attractive idea; however, udunits2 represents quantities (AFAICT!) as numbers and the powers of each base unit, so to represent 1 mile/gallon it would create a new number and retain -2 for m. I think that would scare users off! We still have the issue that 1 mg/kg is not retained as such: it now errors, whereas before it would become 1e-6 1. It's the trade-off between usability and being principled.

With the udunits branch:

> units:::R_ut_format(units:::R_ut_parse("mile/gallon"))
[1] "425143.683171079 m⁻²"
billdenney commented 6 years ago

Could a new function like show_unit_as provide the link between principle and usability? Something like:

show_unit_as("m^-2", "mile/gallon")

Then whenever units was going to print m^-2 it would find that, do a conversion to "mile/gallon", and print the result. Ideally, and similar to #134, show_unit_as would apply to a specific unit system. That way, the user could mix-and-match assignment of measurements:

show_unit_as("m^-2", "mile/gallon", system="fuel efficiency")
show_unit_as("1", "mg/kg", system="drug dosing")

It would add potentially significant overhead to printing, but it seems to be a straightforward implementation of the goal of showing the user what they expect.
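
A rough, untested sketch of the registry I have in mind (show_unit_as does not exist yet; the actual print method would still need to call set_units to do the conversion before formatting):

## hypothetical: map a canonical unit string to the unit to display it as
.display_units <- new.env(parent = emptyenv())

show_unit_as <- function(canonical, display, system = "default") {
  # the system argument is ignored in this sketch
  assign(canonical, display, envir = .display_units)
  invisible(display)
}

display_unit_for <- function(canonical) {
  if (exists(canonical, envir = .display_units))
    get(canonical, envir = .display_units)
  else canonical
}

show_unit_as("m^-2", "mile/gallon", system = "fuel efficiency")
display_unit_for("m^-2")  # "mile/gallon": print() would convert to this first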

Enchufa2 commented 6 years ago

udunits branch merged into master and submitted to CRAN as v0.6-0. I think we can close this too unless @edzer wants to keep in mind any of @mailund's comments above.