bevingtona opened this issue (closed 3 years ago):

Really love this package, but I did not realize the bottleneck it creates in some of my sf workflows. Converting an area is >4000 times slower than base R. Any suggestions to improve the performance? See the reprex of a conversion from 5000 m^2 to km^2 below. Thanks in advance!

Created on 2020-09-15 by the reprex package (v0.3.0)
Do you have a toy example of a real workflow with sf?
Part of the problem is the try() call for character parsing:
library(units)
#> udunits system database from /usr/share/udunits
microbenchmark::microbenchmark(
  5000 / (1000 * 1000),
  set_units(1, "m"),
  set_units(1, m)
)
#> Unit: nanoseconds
#>                expr    min       lq      mean   median       uq    max neval cld
#>  5000/(1000 * 1000)    156    252.5    556.56    564.5    617.5   3697   100  a
#>   set_units(1, "m") 338754 369774.0 400744.32 386060.5 422012.0 836869   100    c
#>     set_units(1, m) 150271 164168.5 190829.26 178997.5 193420.5 783045   100   b
microbenchmark::microbenchmark(
  5000 / (1000 * 1000),
  set_units(as_units("m"), "km"),
  set_units(set_units(1, m), km)
)
#> Unit: nanoseconds
#>                            expr    min        lq       mean    median        uq     max neval cld
#>              5000/(1000 * 1000)    139     576.0    1021.26     698.5     813.5   15689   100  a
#>  set_units(as_units("m"), "km") 975142 1036543.0 1170724.34 1134428.0 1184959.0 2537073   100    c
#>  set_units(set_units(1, m), km) 642961  742960.5  800879.32  767404.0  806938.5 1832808   100   b
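If that parsing cost matters in a tight loop, one workaround (a sketch, assuming set_units()'s documented mode = "standard" argument, which evaluates its value rather than parsing a symbol) is to parse the unit string once with as_units() and reuse the resulting object:

library(units)

km <- as_units("km")                        # parse the unit string once, up front
x  <- set_units(5000, "m")
y  <- set_units(x, km, mode = "standard")   # reuse the parsed unit; no string parsing per call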
Please come up with a use case where the time difference is meaningful: what will you do with those nanoseconds? Typically you operate on vectors of units, like
library(units)
# udunits system database from /usr/share/xml/udunits
microbenchmark::microbenchmark(
  (1:1000000) / (1000 * 1000),
  set_units(as_units(1:1000000, "m2"), "km2")
)
# Unit: milliseconds
#                                       expr      min       lq     mean    median        uq      max neval
#                    (1:1e+06)/(1000 * 1000) 1.680127 2.810617  4.73138  5.241931  5.584564  9.74689   100
#  set_units(as_units(1:1e+06, "m2"), "km2") 7.370606 8.817785 11.76610 12.162606 12.914351 31.42322   100
There is still a difference, but when will this be the bottleneck in your analysis? And then there's always drop_units().
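For example (a minimal sketch): after drop_units() you pay only for base R arithmetic, at the cost of tracking the unit yourself:

library(units)

areas_m2  <- set_units(c(5000, 12000, 250000), "m^2")
areas_km2 <- drop_units(areas_m2) / (1000 * 1000)   # plain numeric, now implicitly in km^2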
Thanks to both of you :)
The use case for me is that I have 1.5 million polygons. For each polygon I would like to calculate its area, plus its area with a 30 metre buffer and with a -30 metre buffer, and then convert each of the three areas from m^2 to km^2. I am just looking for ways to speed up the script.
Thanks for your insights.
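A vectorized version of that workflow might look like the sketch below; polys is a placeholder for the real 1.5-million-polygon sf object, and the negative buffer assumes a projected CRS in metres:

library(sf)
library(units)

# polys: sf object with ~1.5 million polygons in a metric projected CRS (placeholder)
a0 <- st_area(polys)                  # area of each polygon, in m^2
ap <- st_area(st_buffer(polys,  30))  # area with a +30 m buffer
am <- st_area(st_buffer(polys, -30))  # area with a -30 m buffer

# three vectorized conversions in total, instead of one per feature
a0_km2 <- set_units(a0, "km^2")
ap_km2 <- set_units(ap, "km^2")
am_km2 <- set_units(am, "km^2")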
Here is an example using sf, in milliseconds ;)
library(units)
#> udunits system database from E:/Dropbox/R/R-3.6.3/library/units/share/udunits
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
file_name <- system.file("shape/nc.shp", package="sf")
(nc <- st_read(file_name) %>%
   select(geometry) %>%
   mutate(area_m2_with_units = st_area(.),
          area_m2_no_units   = st_area(.) %>% drop_units()))
#> Reading layer `nc' from data source `E:\Dropbox\R\R-3.6.3\library\sf\shape\nc.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 100 features and 14 fields
#> geometry type: MULTIPOLYGON
#> dimension: XY
#> bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> geographic CRS: NAD27
#> Simple feature collection with 100 features and 2 fields
#> geometry type: MULTIPOLYGON
#> dimension: XY
#> bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> geographic CRS: NAD27
#> First 10 features:
#> geometry area_m2_with_units area_m2_no_units
#> 1 MULTIPOLYGON (((-81.47276 3... 1137388604 [m^2] 1137388604
#> 2 MULTIPOLYGON (((-81.23989 3... 611077263 [m^2] 611077263
#> 3 MULTIPOLYGON (((-80.45634 3... 1423489919 [m^2] 1423489919
#> 4 MULTIPOLYGON (((-76.00897 3... 694546292 [m^2] 694546292
#> 5 MULTIPOLYGON (((-77.21767 3... 1520740530 [m^2] 1520740530
#> 6 MULTIPOLYGON (((-76.74506 3... 967727952 [m^2] 967727952
#> 7 MULTIPOLYGON (((-76.00897 3... 615942210 [m^2] 615942210
#> 8 MULTIPOLYGON (((-76.56251 3... 903650119 [m^2] 903650119
#> 9 MULTIPOLYGON (((-78.30876 3... 1179347051 [m^2] 1179347051
#> 10 MULTIPOLYGON (((-80.02567 3... 1232769242 [m^2] 1232769242
microbenchmark::microbenchmark(
  nc %>% mutate(area_km2_with_units = set_units(area_m2_with_units, "km2")),
  nc %>% mutate(area_km2_no_units = area_m2_no_units / (1000 * 1000))
)
#> Unit: milliseconds
#>                                                                        expr    min      lq     mean median      uq    max neval cld
#>  nc %>% mutate(area_km2_with_units = set_units(area_m2_with_units, "km2")) 3.0232 3.10985 3.351746 3.2059 3.34610 6.2752   100   b
#>          nc %>% mutate(area_km2_no_units = area_m2_no_units/(1000 * 1000)) 1.4575 1.51945 1.633360 1.5740 1.64825 4.0995   100  a
Created on 2020-09-15 by the reprex package (v0.3.0)
I would be surprised if the unit conversion is not orders of magnitude cheaper than the buffer and area calculations. Let me know if it really is that significant. And if you have many unit values, put them in vectors before you convert them.
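As a sketch of that last point: one set_units() call over a long units vector parses the target unit once, instead of once per element:

library(units)

x <- set_units(runif(1e6), "m^2")   # one million area values
y <- set_units(x, "km^2")           # single vectorized conversion: "km^2" is parsed once
# avoid element-wise patterns such as sapply(x, set_units, "km^2"),
# which repeat the parsing (and method dispatch) a million times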
Thanks! I'll close the issue.