r-quantities / units

Measurement units for R
https://r-quantities.github.io/units

aggregate on units results in "unitless" object #216

Closed. MartinStjernman closed this issue 4 years ago.

MartinStjernman commented 4 years ago

Hi all! I have stumbled on a problem. When I use aggregate() on a units variable in a data.frame (or a similar object, such as an sf (simple feature) object with class c("sf", "data.frame")), the result is still a units variable (which is fine), but it has an "empty" unit ([]). Trying to set the unit on that variable then results in an error (e.g. Error: cannot convert into m^2). Here is a reproducible example:

temp <- data.frame(group = factor(rep(letters[1:2], each = 5)),
                   measure = set_units(rnorm(10), m^2))
temp

   group         measure
1      a  1.7850549 [m^2]
2      a -1.2235032 [m^2]
3      a -1.1449203 [m^2]
4      a -0.5173411 [m^2]
5      a -0.2857274 [m^2]
6      b  1.4154283 [m^2]
7      b -1.4539021 [m^2]
8      b -0.6898131 [m^2]
9      b -1.2837045 [m^2]
10     b -0.8362810 [m^2]

temp2 <- aggregate(measure ~ group, data = temp, sum)
temp2

  group      measure
1     a -1.386437 []
2     b -2.848272 []

As can be seen, temp2$measure has no unit but is still a units object. Trying to set the unit does not work either:

units(temp2$measure) <- with(ud_units, m^2)

Error: cannot convert into m^2

Is this expected behaviour, or am I doing something wrong? It is a problem in my work: when working with simple features in the sf package, st_area(sf-object) returns polygon areas as units objects, and when I later aggregate by an attribute variable, the summed area has no unit and I cannot set it either. As my reproducible example shows, this is not a problem specific to the sf package; it has to do with how units objects are handled by aggregate().

Any help is highly appreciated!

Thanks!

My sessionInfo():

sessionInfo()

R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=Swedish_Sweden.1252 LC_CTYPE=Swedish_Sweden.1252 LC_MONETARY=Swedish_Sweden.1252 LC_NUMERIC=C
[5] LC_TIME=Swedish_Sweden.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] readxl_1.3.1 pool_0.1.4.2 DBI_1.0.0 dplyr_0.8.1 units_0.6-4 lwgeom_0.1-7 sf_0.8-0

loaded via a namespace (and not attached):
 [1] rstan_2.19.2 tidyselect_0.2.5 xfun_0.7 purrr_0.3.2 colorspace_1.4-1 vctrs_0.2.0
 [7] stats4_3.6.1 loo_2.1.0 utf8_1.1.4 blob_1.2.0 rlang_0.4.0 pkgbuild_1.0.3
[13] e1071_1.7-2 later_0.8.0 pillar_1.4.2 glue_1.3.1 bit64_0.9-7 dbplyr_1.4.2
[19] matrixStats_0.54.0 stringr_1.4.0 munsell_0.5.0 gtable_0.3.0 cellranger_1.1.0 inline_0.3.15
[25] knitr_1.23 callr_3.2.0 ps_1.3.0 parallel_3.6.1 class_7.3-15 fansi_0.4.0
[31] Rcpp_1.0.1 KernSmooth_2.23-15 backports_1.1.4 scales_1.0.0 classInt_0.4-1 StanHeaders_2.18.1-10
[37] bit_1.1-14 gridExtra_2.3 ggplot2_3.2.0 hms_0.5.1 packrat_0.5.0 stringi_1.4.3
[43] processx_3.3.1 grid_3.6.1 cli_1.1.0 odbc_1.1.6 tools_3.6.1 magrittr_1.5
[49] lazyeval_0.2.2 tibble_2.1.3 crayon_1.3.4 pkgconfig_2.0.2 zeallot_0.1.0 prettyunits_1.0.2
[55] assertthat_0.2.1 rstudioapi_0.10 R6_2.4.0 compiler_3.6.1

Enchufa2 commented 4 years ago

This is expected. Unfortunately, many base R functions are not very welcoming to custom classes and attributes, and such behaviour is hard to overcome. There's a discussion about this in this vignette.

TL;DR: if you are lucky (and you are in this case), there is a simplify argument that you can set to FALSE to preserve units. The downside is that the result is returned as a list, which you may want to unlist. But unlist() strips units too (in fact, unlist() is used internally when simplify = TRUE), so you need to unlist carefully. In the linked vignette we provide a function that does this in a general way (I hope), but it is not very well tested.
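A minimal sketch of the simplify = FALSE route, assuming an R version whose aggregate() data-frame method accepts simplify (the formula method passes it through) and a current units package. The careful unlisting step here is simply do.call(c, ...), which dispatches to the c() method for units objects instead of unlist(); it is an illustration, not the general-purpose helper from the vignette. The seed and data are made up for the example.

```r
library(units)

set.seed(42)  # illustrative data, same shape as the example in the issue
temp <- data.frame(group = factor(rep(letters[1:2], each = 5)),
                   measure = set_units(rnorm(10), m^2))

# simplify = FALSE keeps each group's sum() result as its own list element,
# so aggregate() does not flatten the per-group units values
temp2 <- aggregate(measure ~ group, data = temp, FUN = sum, simplify = FALSE)

# temp2$measure is now a list column; recombine it with c() rather than
# unlist(), because c() dispatches to the units method and keeps the unit
temp2$measure <- do.call(c, temp2$measure)
temp2
```

The design choice is to let S3 dispatch do the work: c() on units objects knows how to merge their unit attributes, whereas unlist() only sees the underlying doubles.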

If you find a better way, please let us know!
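Another option, sketched here under the assumption that a single grouping factor is enough, is to skip aggregate() and do the split-apply-combine steps by hand. split() on a units vector subsets it with `[`, which keeps the unit; sum() on each piece returns a units value; and do.call(c, ...) recombines them. The object names pieces, sums and result are illustrative only.

```r
library(units)

set.seed(1)  # illustrative data
temp <- data.frame(group = factor(rep(letters[1:2], each = 5)),
                   measure = set_units(rnorm(10), m^2))

# split: one units vector per group (subsetting with [ keeps the unit)
pieces <- split(temp$measure, temp$group)

# apply: sum() on a units vector returns a units value
sums <- lapply(pieces, sum)

# combine: c() dispatches to the units method, so the unit survives
result <- data.frame(group = factor(names(sums), levels = levels(temp$group)))
result$measure <- do.call(c, unname(sums))
result
```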

MartinStjernman commented 4 years ago

Thanks @Enchufa2 for this explanation.

I could use the workaround in more general cases, but in specific situations where I control the calculations and therefore know the resulting units after the aggregation, I could of course "reset" the unit afterwards.

My example above, however, shows that I cannot use:

units(temp2$measure) <- with(ud_units, m^2)

Error: cannot convert into m^2

Nor can I use:

temp2$measure <- set_units(temp2$measure, m^2)

Error: cannot convert into m^2

Apparently (and in line with the vignette you linked to), although it is still of class units:

class(temp2$measure)

[1] "units"

temp2$measure does not have any units attribute that can be set using the above:

units(temp2$measure)

NULL

However, this seems to work:

temp2$measure <- temp2$measure * with(ud_units, m^2)
temp2

  group         measure
1     a -1.386437 [m^2]
2     b -2.848272 [m^2]

units(temp2$measure)

$numerator
[1] "m" "m"

$denominator
character(0)

attr(,"class")
[1] "symbolic_units"
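When the resulting unit is known in advance, a more direct reset than multiplying is to remove the leftover class first and then attach the unit. The helper below, reset_units(), is hypothetical (it is not part of the units package); it relies only on unclass(), as.numeric() and set_units() with the unit given as a string and mode = "standard". The "broken" object is constructed by hand to mimic what aggregate() returned in this thread.

```r
library(units)

# Hypothetical helper: rebuild a units vector whose "units" attribute was lost.
# unclass() removes the leftover "units" class, as.numeric() drops any other
# attributes, and set_units() attaches the unit supplied as a string.
reset_units <- function(x, unit) {
  set_units(as.numeric(unclass(x)), unit, mode = "standard")
}

# Mimic the aggregate() result: class "units" but no unit attribute
broken <- structure(c(-1.386437, -2.848272), class = "units")
is.null(attr(broken, "units"))  # TRUE: nothing for units<- to convert from

fixed <- reset_units(broken, "m^2")
fixed
```

Note that reset_units() relabels rather than converts: applied to a healthy units object, it would discard the old unit and attach the new one without rescaling the numbers, so it only makes sense when the true unit is known.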

I conclude that I have to settle for this for the time being. Hopefully future developments will allow base R manipulations of quantity/units objects with their attributes and metadata preserved. I personally do not have an obvious solution in mind, but I will let you know if I come up with one.

Thanks again for the help; I am now closing this issue.