Closed obrl-soil closed 3 years ago
Calling st_write
with a character
file name destination takes the GDAL route, and I don't think that that route allows for writing out binary data, or list columns with raw, blobs and so on. @mdsumner can you confirm this? What might work is to use write methods that work directly on database connections, @etiennebr do you think that might work? It also calls clean_columns
ATM, but we could lift restrictions here if this would work.
Interesting! I would have to look a bit more in-depth. Meanwhile, If you feel comfortable, you could try it with dbWriteTable which is slightly lower-level.
it should be possible, because OFTBinary and apparently SQLite supports generally - I'm happy to look at this, first hack a geopackage table to include a binary column as @etiennebr suggested ... see below for this anyone who wants to try
library(sf) # 1.0-1
library(dplyr)
library(tibble)
test <- tribble(~h3_address, ~h3_res, ~geom,
'8abe8d12acaffff', 10, st_point(c(1,2)),
'8bbe8d12acadfff', 11, st_point(c(3,4)),
'8cbe8d12acad3ff', 12, st_point(c(5,6))) |>
st_as_sf() |>
# could also use blob::as_blob here:
mutate(h3_bin = lapply(h3_address, function(i) { as.vector(sapply(i, charToRaw)) }))
dest <- tempfile(fileext = '.gpkg')
st_write(test, dest)
## hack Geopackage in place
library(dbplyr)
library(DBI)
con <- dbConnect(RSQLite::SQLite(), dest)
tabname <- dbListTables(con)[1] ## basically the file name sans ext
d <- tbl(con, tabname) %>% collect()
d$h3_bin <- test$h3_bin
dbWriteTable(con, tabname, d, overwrite = T)
dbDisconnect(con)
## reads as character
sf::read_sf(dest)
## doesn't get anything ... oops
vapour::vapour_read_attributes(dest)
system("gdalinfo --version")
GDAL 3.2.1, released 2020/12/29
system(sprintf("ogrinfo %s %s", dest, tabname))
INFO: Open of `...file2e6ea87646a91c.gpkg'
using driver `GPKG' successful.
Layer name: file2e6ea87646a91c
Geometry: Point
Feature Count: 3
Extent: (1.000000, 2.000000) - (5.000000, 6.000000)
Layer SRS WKT:
(unknown)
Geometry Column = geom
fid: Integer64 (0.0)
h3_address: String (0.0)
h3_res: Real (0.0)
h3_bin: Binary (0.0)
OGRFeature(file2e6ea87646a91c):1
fid (Integer64) = 1
h3_address (String) = 8abe8d12acaffff
h3_res (Real) = 10
h3_bin (Binary) = 386162653864313261636166666666
POINT (1 2)
OGRFeature(file2e6ea87646a91c):2
fid (Integer64) = 2
h3_address (String) = 8bbe8d12acadfff
h3_res (Real) = 11
h3_bin (Binary) = 386262653864313261636164666666
POINT (3 4)
OGRFeature(file2e6ea87646a91c):3
fid (Integer64) = 3
h3_address (String) = 8cbe8d12acad3ff
h3_res (Real) = 12
h3_bin (Binary) = 386362653864313261636164336666
POINT (5 6)
Hmm, ok, I think you only got away with that because d
isn't an sf
object - the geom column was read in as a list of raws just like h3_bin
. For context my current workflow is:
list.files
on a folder of images named with the convention "PROJECT_SITE_XCOORD_YCOORD_SUBJECT"dbConnect()
and run dbExecute(con, "ALTER table ADD COLUMN jpg_image blob")
RSQLite
while keeping sf
happy require the spatialite extension, which is not bundled. I have it elsewhere so I use sys.setenv()
to add it to my PATH in session, and then activate it with dbSendQuery(db, "SELECT load_extension('mod_spatialite.dll');")
After that I loop through the folder of images, shrinking each to a portable size with magick
and adding to the GPKG. That's the messy part:
# photo_names is a vector of file names sans extension
purrr::map(photo_names, function(i) {
img <- magick::image_read(file.path(getwd(), 'photos', paste0(i, '.jpg')))
info <- magick::image_info(img)
# this makes sure the photos are always max 1024px whether they're portrait
# or landscape
img_r <- if(info$width[1] > info$height[1]) {
magick::image_resize(img, geometry = geometry_size_pixels(width = 1024))
} else {
magick::image_resize(img, geometry = geometry_size_pixels(height = 1024))
}
# img_r is an external pointer holding the jpeg in memory so there's this song
# and dance, but that's a separate problem
temp_dest <- tempfile(i, fileext = '.jpg')
image_write(img_r, temp_dest)
img_bin <- paste0(readBin(temp_dest, what="raw", n=1e6), collapse="")
dbExecute(db, paste0(
"UPDATE photos
SET jpg_image = x'", img_bin ,"'
WHERE txt_photo_filename = '", i, ".jpg'"))
})
The loop is safer for memory management, but I'd like to replace those UPDATE statements with `st_write()` with `append = TRUE` so I don't have to depend on spatialite.
I'm just spitballing that it's possible, it's a field type support in sf cpp, but implications for gdal version and driver capability etc
Thanks @mdsumner - this is currently in a branch OFTBinary
, for you to test; st_write
should now accept list columns with raw vectors.
This should also work with blob
objects, as blob
s are lists with raw vectors.
ok, this works perfectly on write:
outfile <- file.path(getwd(), 'photos.gpkg')
# `photo_split` is the sf containing metadata, split by row
purrr::map(photo_split, function(i) {
img <- magick::image_read(file.path(getwd(), 'photos', i[['txt_photo_filename']][1]))
info <- magick::image_info(img)
img_r <- if(info$width[1] > info$height[1]) {
magick::image_resize(img, geometry = geometry_size_pixels(width = 1024))
} else {
magick::image_resize(img, geometry = geometry_size_pixels(height = 1024))
}
temp_dest <- tempfile(i[['txt_photo_filename']][1], fileext = '.jpg')
image_write(img_r, temp_dest)
i[['jpg_image']] <- blob(readBin(temp_dest, what="raw", n=1e6)) # can use list or blob here, results are consistent
if(!file.exists(outfile)) {
st_write(i, outfile, layer = 'photos', quiet = TRUE, append = FALSE)
} else {
st_write(i, outfile, layer = 'photos', quiet = TRUE, append = TRUE)
}
})
Very slightly slower than my initial approach but that's fine.
On reading the output back in with st_read()
though, the blobs are still interpreted as a character field, and I'm not sure what the best option is to coerce them to the same list-column of raws/blobs that was output.
Seems like it might be good to allow a column spec to be declared on read (like readr::read_csv()
), could be safer to push this decision back on the user?
I now merged this to the master branch, messed up my branches once more...
Ready to close?
Oh yes, sorry - all appears to work now. Thanks!
The documentation for
st_write()
together withdbDataType()
suggests that it should be possible to write "lists of raw vectors, and blob::blob objects" to GeoPackage (or to any db supporting a blob column type). However,clean_columns()
currently prevents this from happening. A quick reprex (Win10, R 4.1.0):Using a
field.type
specification doesn't appear to be respected, either (unless I'm invoking it wrong?):My current intended use case is storing photographs in a point geopackage. I've found some other workarounds to do that with in R, but it requires a fairly complicated workflow involving RSQLite and the spatialite extension. Being able to use
st_write()
would be much cleaner. Is it possible to implement this functionality?