njtierney / geotargets

Targets extensions for geospatial data
https://njtierney.github.io/geotargets/

implementing parquet filetype? #36

Open njtierney opened 3 months ago

njtierney commented 3 months ago

As mentioned in #4, e.g.

tar_sf_vector(filetype="parquet")

brownag commented 3 months ago

So far, the following works for terra SpatVector objects via the GDAL (Geo)Parquet driver:

library(targets)

tar_script({
    list(
        geotargets::tar_terra_vect(test_terra_parquet,
                                   terra::vect(system.file("ex", "lux.shp", package = "terra")),
                                   filetype = "Parquet")
    )
})

tar_make()
#> Loading required namespace: terra
#> ▶ dispatched target test_terra_parquet
#> ● completed target test_terra_parquet [0.012 seconds]
#> ▶ ended pipeline [0.095 seconds]
x <- tar_read(test_terra_parquet)
x
#>  class       : SpatVector 
#>  geometry    : polygons 
#>  dimensions  : 12, 6  (geometries, attributes)
#>  extent      : 5.74414, 6.528252, 49.44781, 50.18162  (xmin, xmax, ymin, ymax)
#>  source      : test_terra_parquet
#>  coord. ref. : lon/lat WGS 84 (EPSG:4326) 
#>  names       :  ID_1   NAME_1  ID_2   NAME_2  AREA   POP
#>  type        : <num>    <chr> <num>    <chr> <num> <int>
#>  values      :     1 Diekirch     1 Clervaux   312 18081
#>                    1 Diekirch     2 Diekirch   218 32543
#>                    1 Diekirch     3  Redange   259 18664

terra::describe(tar_path_target(test_terra_parquet))
#> [1] "Driver: Parquet/(Geo)Parquet"              
#> [2] "Files: _targets/objects/test_terra_parquet"
#> [3] "Size is 512, 512"                          
#> [4] "Corner Coordinates:"                       
#> [5] "Upper Left  (    0.0,    0.0)"             
#> [6] "Lower Left  (    0.0,  512.0)"             
#> [7] "Upper Right (  512.0,    0.0)"             
#> [8] "Lower Right (  512.0,  512.0)"             
#> [9] "Center      (  256.0,  256.0)"
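
Since the (Geo)Parquet driver is an optional part of GDAL, the example above only works when the local GDAL build includes it. A minimal sketch of a pre-flight check follows. `has_gdal_driver()` is a hypothetical helper, not part of {geotargets} or {terra}, and it assumes that `terra::gdal(drivers = TRUE)` returns a data frame with a `name` column:

```r
# Hypothetical helper (not part of geotargets or terra): report whether a GDAL
# driver is listed in a driver table.
# Assumption: terra::gdal(drivers = TRUE) returns a data frame with a `name`
# column. The default argument is evaluated lazily, so terra is only needed
# when no table is supplied.
has_gdal_driver <- function(driver, drivers = terra::gdal(drivers = TRUE)) {
  # Compare case-insensitively so "parquet" and "Parquet" are treated alike.
  tolower(driver) %in% tolower(drivers$name)
}

# Usage with a hand-made driver table (no GDAL needed):
demo_drivers <- data.frame(name = c("GPKG", "ESRI Shapefile", "Parquet"))
has_gdal_driver("parquet", demo_drivers)
has_gdal_driver("FlatGeobuf", demo_drivers)
```

Calling `has_gdal_driver("Parquet")` without a second argument would query the installed GDAL through {terra}, which may be worth doing before setting `filetype = "Parquet"` on a target.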

Still need to implement analogous methods for {sf} objects via #13.

Also, we may want to implement a variant that writes via {arrow} (re: #2), as this may be more efficient for larger targets. Would be interesting to benchmark GDAL vs. Arrow.

Aariq commented 3 months ago

> Would be interesting to benchmark GDAL vs. Arrow

I think benchmarking is definitely part of the plan once things are somewhat stable. Would be good to give users an idea of the tradeoffs in speed, size, and dependency requirements.
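
As a rough starting point for the comparison discussed above, here is a sketch of a timing harness. `bench_write()` and `run_parquet_benchmark()` are hypothetical names, not existing {geotargets} functions. The Arrow branch is only a stand-in: it writes a plain data frame with a WKT geometry column through `arrow::write_parquet()`, which is an assumption made for illustration and does not produce a GeoParquet file with geometry metadata.

```r
# Hypothetical helper: time a single write and report the resulting file size.
# `write_fun` is any function that takes one argument, the output path.
bench_write <- function(label, write_fun) {
  path <- tempfile(fileext = ".parquet")
  on.exit(unlink(path), add = TRUE)  # clean up the temporary file afterwards
  elapsed <- system.time(write_fun(path))[["elapsed"]]
  data.frame(method = label, seconds = elapsed, bytes = file.size(path))
}

# Hypothetical comparison; requires terra (with a GDAL build that includes the
# Parquet driver) and arrow. Defined here but not run.
run_parquet_benchmark <- function() {
  v <- terra::vect(system.file("ex", "lux.shp", package = "terra"))
  # Assumption: attributes plus a WKT geometry column stand in for an
  # Arrow-based writer; this is not GeoParquet.
  df <- terra::as.data.frame(v, geom = "WKT")
  rbind(
    bench_write("GDAL via terra", function(p) {
      terra::writeVector(v, p, filetype = "Parquet")
    }),
    bench_write("arrow (WKT column)", function(p) {
      arrow::write_parquet(df, p)
    })
  )
}

# The harness itself only needs base R:
bench_write("base writeLines", function(p) writeLines("x", p))
```

The `lux.shp` example has only 12 polygons, so timings from it would say little; a larger dataset and repeated runs would be needed before drawing conclusions about speed or size.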