r-spatial / sf

Simple Features for R
https://r-spatial.github.io/sf/

st_write and blob data #1721

Closed obrl-soil closed 3 years ago

obrl-soil commented 3 years ago

The documentation for st_write() together with dbDataType() suggests that it should be possible to write "lists of raw vectors, and blob::blob objects" to GeoPackage (or to any db supporting a blob column type). However, clean_columns() currently prevents this from happening. A quick reprex (Win10, R 4.1.0):

library(sf) # 1.0-1
library(dplyr)
library(tibble)

test <- tribble(~h3_address, ~h3_res, ~geom,
                '8abe8d12acaffff', 10, st_point(c(1,2)),
                '8bbe8d12acadfff', 11, st_point(c(3,4)),
                '8cbe8d12acad3ff', 12, st_point(c(5,6))) |>
  st_as_sf() |> 
  # could also use blob::as_blob here:
  mutate(h3_bin = lapply(h3_address, function(i) { as.vector(sapply(i, charToRaw)) }))

dest <- tempfile(fileext = '.gpkg')
st_write(test, dest)
> Warning message:
> In clean_columns(as.data.frame(obj), factorsAsCharacter) :
>   Dropping column(s) h3_bin of class(es) list

A field.types specification doesn't appear to be respected either (unless I'm invoking it wrong?):

library(RSQLite)
test2 <- tribble(~h3_address, ~h3_res, ~geom,
                 '8abe8d12acaffff', 10, st_point(c(1,2)),
                 '8bbe8d12acadfff', 11, st_point(c(3,4)),
                 '8cbe8d12acad3ff', 12, st_point(c(5,6))) |>
  st_as_sf()

dest2 <- tempfile(fileext = '.gpkg')
st_write(test2, dest2, field.types = c('h3_address' = 'blob'))

# check output field types
db <- dbConnect(RSQLite::SQLite(), dest2)
tname <- dbListTables(db)[1]
rs <- dbSendQuery(db, paste0('select * from ', tname))
dbColumnInfo(rs) # still character data, despite the field.types spec
dbClearResult(rs)
dbDisconnect(db)

My current intended use case is storing photographs in a point GeoPackage. I've found some workarounds for that in R, but they require a fairly complicated workflow involving RSQLite and the spatialite extension. Being able to use st_write() would be much cleaner. Is it possible to implement this functionality?

edzer commented 3 years ago

Calling st_write() with a character file name as destination takes the GDAL route, and I don't think that route allows for writing out binary data, or list columns with raw vectors, blobs and so on. @mdsumner, can you confirm this? What might work is to use the write methods that operate directly on database connections; @etiennebr, do you think that might work? That path also calls clean_columns() at the moment, but we could lift restrictions there if this would work.
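
For illustration, a minimal sketch of the two routes (the connection details are placeholders; the DBI route is currently best supported for backends such as PostGIS):

library(sf)
library(DBI)

st_write(test, 'out.gpkg') # character dsn: the GDAL route

# the DBI route writes through a live database connection instead:
# con <- dbConnect(RPostgres::Postgres(), dbname = 'mydb') # placeholder
# st_write(test, dsn = con, layer = 'test_blob')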

etiennebr commented 3 years ago

Interesting! I would have to look a bit more in-depth. Meanwhile, if you feel comfortable, you could try it with dbWriteTable(), which is slightly lower-level.
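
For example, a minimal sketch of that lower-level route, writing a blob column to a plain SQLite table (table and column names are illustrative, and geometry handling is omitted):

library(DBI)
library(RSQLite)

con <- dbConnect(RSQLite::SQLite(), tempfile(fileext = '.sqlite'))

# a blob column is a list of raw vectors under the hood
df <- data.frame(h3_address = c('8abe8d12acaffff', '8bbe8d12acadfff'))
df$h3_bin <- blob::blob(charToRaw(df$h3_address[1]),
                        charToRaw(df$h3_address[2]))

dbWriteTable(con, 'h3_test', df)
str(dbReadTable(con, 'h3_test')$h3_bin) # round-trips as a list of raws
dbDisconnect(con)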

mdsumner commented 3 years ago

It should be possible, because OGR has an OFTBinary field type and SQLite apparently supports it generally. I'm happy to look at this; as a first step, hack a GeoPackage table in place to include a binary column, as @etiennebr suggested ... see below, for anyone who wants to try:

library(sf) # 1.0-1
library(dplyr)
library(tibble)

test <- tribble(~h3_address, ~h3_res, ~geom,
                '8abe8d12acaffff', 10, st_point(c(1,2)),
                '8bbe8d12acadfff', 11, st_point(c(3,4)),
                '8cbe8d12acad3ff', 12, st_point(c(5,6))) |>
  st_as_sf() |> 
  # could also use blob::as_blob here:
  mutate(h3_bin = lapply(h3_address, function(i) { as.vector(sapply(i, charToRaw)) }))

dest <- tempfile(fileext = '.gpkg')
st_write(test, dest)

## hack Geopackage in place
library(dbplyr)
library(DBI)
con <- dbConnect(RSQLite::SQLite(), dest)
tabname <- dbListTables(con)[1] ## basically the file name sans ext
d <- tbl(con, tabname) %>% collect()
d$h3_bin <- test$h3_bin
dbWriteTable(con, tabname, d, overwrite = TRUE)
dbDisconnect(con)

## reads as character
sf::read_sf(dest)

## doesn't get anything ...  oops
vapour::vapour_read_attributes(dest)

system("gdalinfo --version")
GDAL 3.2.1, released 2020/12/29

system(sprintf("ogrinfo %s %s", dest, tabname))
INFO: Open of `...file2e6ea87646a91c.gpkg'
      using driver `GPKG' successful.

Layer name: file2e6ea87646a91c
Geometry: Point
Feature Count: 3
Extent: (1.000000, 2.000000) - (5.000000, 6.000000)
Layer SRS WKT:
  (unknown)
Geometry Column = geom
fid: Integer64 (0.0)
h3_address: String (0.0)
h3_res: Real (0.0)
h3_bin: Binary (0.0)
OGRFeature(file2e6ea87646a91c):1
fid (Integer64) = 1
h3_address (String) = 8abe8d12acaffff
h3_res (Real) = 10
h3_bin (Binary) = 386162653864313261636166666666
POINT (1 2)

OGRFeature(file2e6ea87646a91c):2
fid (Integer64) = 2
h3_address (String) = 8bbe8d12acadfff
h3_res (Real) = 11
h3_bin (Binary) = 386262653864313261636164666666
POINT (3 4)

OGRFeature(file2e6ea87646a91c):3
fid (Integer64) = 3
h3_address (String) = 8cbe8d12acad3ff
h3_res (Real) = 12
h3_bin (Binary) = 386362653864313261636164336666
POINT (5 6)

obrl-soil commented 3 years ago

Hmm, ok, I think you only got away with that because d isn't an sf object: the geom column was read in as a list of raw vectors, just like h3_bin. For context, my current workflow loops over the photos and pushes each one into the table with an UPDATE statement via RSQLite and the spatialite extension. The loop is safer for memory management, but I'd like to replace those UPDATE statements with `st_write()` and `append = TRUE`, so I don't have to depend on spatialite.
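
A minimal sketch of that kind of UPDATE-based approach (illustrative table, column and key names; the spatialite geometry handling is omitted):

library(DBI)
library(RSQLite)

con <- dbConnect(RSQLite::SQLite(), 'photos.gpkg')

# assumes a 'photos' layer already written by st_write(), keyed by fid,
# with a BLOB column added once up front:
dbExecute(con, 'ALTER TABLE photos ADD COLUMN jpg_image BLOB')

img_raw <- readBin('photo_001.jpg', what = 'raw',
                   n = file.size('photo_001.jpg'))
dbExecute(con, 'UPDATE photos SET jpg_image = :img WHERE fid = :id',
          params = list(img = blob::blob(img_raw), id = 1L))

dbDisconnect(con)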
mdsumner commented 3 years ago

I'm just spitballing that it's possible; it's a matter of field type support in sf's C++ code, but there are implications for GDAL version, driver capability, etc.

edzer commented 3 years ago

Thanks @mdsumner - this is currently in a branch OFTBinary, for you to test; st_write should now accept list columns with raw vectors.
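
For anyone wanting to try it, the branch can presumably be installed with something like:

# building sf from source needs the GDAL/GEOS/PROJ development libraries
remotes::install_github('r-spatial/sf', ref = 'OFTBinary')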

edzer commented 3 years ago

This should also work with blob objects, as blobs are lists with raw vectors.
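
For example:

b <- blob::blob(charToRaw('8abe8d12acaffff'))
class(b)       # "blob" "vctrs_list_of" "vctrs_vctr" "list"
is.raw(b[[1]]) # TRUE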

obrl-soil commented 3 years ago

ok, this works perfectly on write:

library(sf)
library(blob)
library(magick)

outfile <- file.path(getwd(), 'photos.gpkg')

# `photo_split` is the sf object containing photo metadata, split by row
purrr::map(photo_split, function(i) {

  img <- magick::image_read(file.path(getwd(), 'photos', i[['txt_photo_filename']][1]))
  info <- magick::image_info(img)

  # resize so the longest edge is 1024 px
  img_r <- if (info$width[1] > info$height[1]) {
    magick::image_resize(img, geometry = geometry_size_pixels(width = 1024))
  } else {
    magick::image_resize(img, geometry = geometry_size_pixels(height = 1024))
  }

  temp_dest <- tempfile(i[['txt_photo_filename']][1], fileext = '.jpg')
  image_write(img_r, temp_dest)
  # can use a plain list of raws or blob() here, results are consistent:
  i[['jpg_image']] <- blob(readBin(temp_dest, what = 'raw', n = file.size(temp_dest)))

  if (!file.exists(outfile)) {
    st_write(i, outfile, layer = 'photos', quiet = TRUE, append = FALSE)
  } else {
    st_write(i, outfile, layer = 'photos', quiet = TRUE, append = TRUE)
  }

})

Very slightly slower than my initial approach but that's fine.

On reading the output back in with st_read(), though, the blobs are still interpreted as a character field, and I'm not sure what the best option is to coerce them back to the list-column of raws/blobs that was written.
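
In the meantime, one possible workaround (a sketch; it assumes the layer is named 'photos' and that st_read() returns rows in fid order) is to pull the blob column back out with RSQLite and attach it to the st_read() result:

library(DBI)

con <- dbConnect(RSQLite::SQLite(), outfile)
blobs <- dbGetQuery(con, 'SELECT fid, jpg_image FROM photos ORDER BY fid')
dbDisconnect(con)

pts <- sf::st_read(outfile, layer = 'photos', quiet = TRUE)
pts$jpg_image <- blobs$jpg_image # RSQLite returns BLOB columns as blobs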

It seems like it might be good to allow a column spec to be declared on read (like readr::read_csv() does); it could be safer to push this decision back to the user.

edzer commented 3 years ago

I now merged this to the master branch, messed up my branches once more...

edzer commented 3 years ago

Ready to close?

obrl-soil commented 3 years ago

Oh yes, sorry - all appears to work now. Thanks!