Closed ateucher closed 3 years ago
Hmmm, this is possibly a solution:
library(dplyr)
# Convert `value` between units; `from` and `to` must each hold a single distinct unit.
convert_units2 <- function(value, from, to) {
  stopifnot(length(unique(from)) == 1)
  stopifnot(length(unique(to)) == 1)
  # Fall back to NA when the conversion fails (e.g. non-convertible units).
  ret <- tryCatch(
    units::set_units(units::set_units(value, from, mode = "standard"), to, mode = "standard"),
    error = function(e) NA_real_
  )
  as.numeric(ret)
}
system.time(
  test3 <- x %>%
    group_by(from_unit, to_unit) %>%
    mutate(new_val = convert_units2(val, from_unit[1], to_unit[1]))
)
#> user system elapsed
#> 0.058 0.002 0.061
all.equal(test2, test3$new_val)
#> [1] TRUE
Created on 2021-02-16 by the reprex package (v1.0.0)
set_units provides a method for mixed_units, so you don't need to loop over units:
from <- c("m/s", "km/h", "mg/L", "g")
to <- c("km/h", "m/s", "g/L", "kg")
x <- mixed_units(1:4, from)
set_units(x, to, mode="standard")
#> Mixed units: g/L (1), kg (1), km/h (1), m/s (1)
#> 3.6 [km/h], 0.5555556 [m/s], 0.003 [g/L], 0.004 [kg]
For reference, this would be the way to do it with mixed_units:
x <- read.csv(csv_file, stringsAsFactors = FALSE)
x <- x[mapply(ud_are_convertible, x$to_unit, x$from_unit), ]
x$val <- with(x, mixed_units(val, from_unit))
x$new_val <- with(x, set_units(val, to_unit, mode="standard"))
I just dropped non-convertible units. Yours is a good approach too, and faster in this case because many units are repeated. We could probably bring this idea of grouping into the internals of set_units.mixed_units to improve its performance.
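To make the grouping idea concrete, here is a minimal base-R sketch that converts each distinct (from, to) pair once instead of once per row. It is not how set_units.mixed_units is implemented; the toy data frame and the `key` and `idx` names are assumptions for illustration, and only the column names follow the thread.

```r
library(units)

# Hypothetical toy data; column names mirror those used in the thread.
x <- data.frame(
  val       = c(1, 2, 3, 4, 5),
  from_unit = c("m/s", "km/h", "mg/L", "g", "m/s"),
  to_unit   = c("km/h", "m/s", "g/L", "kg", "km/h"),
  stringsAsFactors = FALSE
)

# One key per distinct (from, to) pair.
key <- paste(x$from_unit, x$to_unit, sep = " -> ")

x$new_val <- NA_real_
# Loop over groups of row indices that share the same pair of units.
for (idx in split(seq_len(nrow(x)), key)) {
  from <- x$from_unit[idx[1]]
  to   <- x$to_unit[idx[1]]
  # A single vectorised conversion handles every row in the group.
  converted <- set_units(set_units(x$val[idx], from, mode = "standard"),
                         to, mode = "standard")
  x$new_val[idx] <- as.numeric(converted)
}
```

The number of calls to set_units then scales with the number of distinct unit pairs rather than the number of rows, which is why the approach pays off when many units are repeated.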
Anyway, in answer to the original request, I don't think that exporting ud_convert is a good idea, because it entirely defeats the purpose of the package.
This is brilliant, thanks @Enchufa2. I had missed the mixed_units method for set_units - and I do recognize that this request was outside the intent of the package. I think I'll go with the grouped approach.
ud_are_convertible is also not exported... would you consider exporting that, or a variant that fits with the package philosophy?
Thanks for the quick response
I took a look at the thread referenced above and I do think your grouped approach is the best for your use case.
ud_are_convertible is exported in the devel version.
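As a small sketch of how the exported function could be used to flag rows before converting, assuming an installed version of units in which ud_are_convertible is exported (the `from`, `to` and `ok` vectors are made up for illustration):

```r
library(units)

# Hypothetical unit pairs; the second pair (concentration to mass) cannot be converted.
from <- c("m/s", "mg/L", "g")
to   <- c("km/h", "kg", "kg")

# Check each pair separately, as in the mapply() call earlier in the thread.
ok <- unname(mapply(ud_are_convertible, from, to))
```

Rows where `ok` is FALSE can then be dropped or set to NA before calling set_units.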
I think so too. Thanks so much for the thorough responses.
@Enchufa2 sorry to pester you again on this. Do you have an idea of when the next version (with ud_are_convertible exported) will be heading to CRAN?
(ready when you are, @Enchufa2 )
I would like to give another pass to the list of issues and address some of them before going for another release.
I'm wondering if it's possible to export the currently unexported function ud_convert?
Use case: I have a package for working with water quality data - about 15 million rows. It contains columns of data of mixed units (unit stored in a column), many of which need to be converted to another unit (stored in a different column). Because they're mixed, I have to vapply or loop over the rows, thus calling set_units(set_units(x, unitA), unitB) many times. Calling ud_convert() directly is much faster:
A real-world example with a 2000-row sample of data:
Created on 2021-02-16 by the reprex package (v1.0.0)
I was also experimenting with mixed_units(), but while I could set the mixed units, I didn't see an obvious way to convert them. But it is entirely possible I'm missing something.
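For readers without the original reprex, the row-wise pattern described in this request might look roughly like the sketch below. It is an illustration only, not the code from the original post; the `convert_one` helper and the toy vectors are hypothetical.

```r
library(units)

# Hypothetical helper: convert a single value from one unit to another.
convert_one <- function(value, from, to) {
  as.numeric(set_units(set_units(value, from, mode = "standard"),
                       to, mode = "standard"))
}

# Made-up data standing in for the value and unit columns.
val       <- c(1, 2, 4)
from_unit <- c("m/s", "km/h", "g")
to_unit   <- c("km/h", "m/s", "kg")

# One pair of set_units() calls per row, which is the slow part on large data.
new_val <- vapply(
  seq_along(val),
  function(i) convert_one(val[i], from_unit[i], to_unit[i]),
  numeric(1)
)
```

Each row triggers its own unit parsing and conversion, which is the overhead that the grouped and mixed_units approaches discussed in the replies avoid.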