r-spatial / discuss

a discussion repository: raise issues, or contribute!
54 stars 12 forks source link

About the relevance and future of `rpostgis` #58

Open basille opened 1 year ago

basille commented 1 year ago

The main purpose of rpostgis is to provide an interface between R and PostGIS to transparently transfer spatial data (both vectors and rasters) — secondarily, rpostgis also provides convenience functions to execute common procedures in PostGIS.

rpostgis was however developed (by @dnbucklin and myself) at a time when both sp and raster were the de facto reference packages for spatial data (first stable release of rpostgis in August 2016). Since then, R as seen an incredibly active development of the spatial ecosystem, most notably the packages sf, terra and stars. To stay relevant, rpostgis would need to switch to these modern classes of spatial objects, and thus support sf, terra and stars. In addition, packages rgdal, rgeos and maptools will retire by the end of 2023, which also means that rpostgis not only need to support the modern packages mentioned above, but also remove dependencies to rgeos (see this issue on rpostgis repository).

Altogether, this would require a major overhaul of rpostgis. Unfortunately, as our positions have evolved, neither @dnbucklin or myself have the time and resources to take care of this. If nothing happens, rpostgis will thus simply retire by the end of 2023 as well.

Relevance

But before this happens, we would like to first evaluate the relevance of the purpose of rpostgis, i.e. providing a connection to a PostGIS DB, and most importantly allowing bi-directional transfer between PostGIS to R of both vector and raster data. The first question is thus: Is this purpose still relevant?

Need

If the answers are negative above, then there are still gaps in the bidirectional transfer of spatial data between R and PostGIS. This leads me to the second question: Is there actually a need for such a tool?

One way to answer this is to look at the dependency network or rpostgis: There is currently a single package that relies on rpostgis: lucas (on CRAN) (package to download and create the DB of LUCAS Data Harmonized), in the form of an import.

Second, looking a little bit at available download statistics (e.g. at https://cranlogs.r-pkg.org/badges/rpostgis or through Hadley's CRAN download app) shows a relatively stable number of downloads over the years (around 500–1000/month).

Development and maintenance

If both the relevance and need for a tool like rpostgis are high, then comes the human problem. As said above, neither @dnbucklin or myself can commit to rpostgis anymore. We remain available (as much as possible) to support a smooth transition though. The call is thus open as to whether anybody is interested in taking over rpostgis and commit to a sustained future of the package.

Cidree commented 1 year ago

Dear Mathieu,

I am writing to express my keen interest in the purpose of the rpostgis package. In today's spatial environment packages of R, I believe it still holds its significance as the only one offering a clear bidirectional transfer of shapefiles and rasters.

Furthermore, I am impressed by the thorough documentation and the tremendous effort put into developing the rpostgis package. As a scientist who works extensively with spatial data and spatial databases, I am eager to support and contribute to the transition of the package to the new ecosystem of spatial packages.

Please let me know if you are still interested in maintaining the package, as I am more than willing to offer my assistance.

Sincerely, Adrián

basille commented 1 year ago

Thanks @Cidree for expressing your interest in taking over the package, this is great news!

Before you start any work on the package code itself, I think it is important to address the questions above about relevance. Let's first see where it leads us.

edzer commented 1 year ago

sf can read and write vector data through

basille commented 1 year ago

Thanks @edzer, that's super relevant! Indeed, looking at the GDAL page for the PostGIS Raster:

Currently, the driver provides read-only support to PostGIS Raster data sources.

So that means, in short, that no spatial package in R that relies on GDAL can actually write PostGIS rasters.

Could you please point us in the right direction about the PostGIS driver in DBI? I couldn't find a good place to start (this issue for instance goes nowhere). That might indeed be the place to start for raster exchange between R and PostGIS…

edzer commented 1 year ago

So that means, in short, that no spatial package in R that relies on GDAL can actually write PostGIS rasters.

No, I said that the driver supports CreateCopy, and see e.g. here: https://github.com/rspatial/terra/blob/master/src/write_gdal.cpp#L1009 for a case where terra uses that.

rhijmans commented 1 year ago

terra uses CreateCopy for formats that cannot be created directly; but that path does not seem to work for the PostGISRaster driver.

library(terra)
r <- rast(nrows=5, ncols=5, vals=1:25)
f <- file.path(tempdir(), "test.pg")
writeRaster(r, f, filetype="PostGISRaster")
#Error: [writeRaster] mem copy create failed for PostGISRaster
#In addition: Warning message:
#PostGISRasterDataset::CreateCopy() only works on source datasets that are PostGISRaster (GDAL error 6) 
basille commented 1 year ago

Sorry @edzer, I actually misread GDAL documentation and your answer! So CreateCopy would be the way to go. However, following @rhijmans' answer, it's a no go with terra.

My follow-up question here: Is this an issue linked to terra, or to GDAL directly? And more generally, in your opinion, where should such a function belong (i.e. a function that copies a raster from R to a PostGIS database): specific packages (terra/stars), DBI directly, or separately in rpostgis?

(my apologies if I seem hesitant on those considerations, I'm really not that familar with GDAL… @dnbucklin used to deal with the raster stuff)

edzer commented 1 year ago

However, following @rhijmans' answer, it's a no go with terra.

I don't think we can conclude that now. Robert's example tries to write to a file, but he should have tried to write to a string describing a database connection, and have a database running accepting the connection, in order to fully test.

If that doesn't work, it would be easy to make it work. Similarly for stars, one could implement the path that terra takes.

A more broader question is whether today anyone uses PostGISRaster for (serious) raster data storage and analysis work.

rhijmans commented 1 year ago

Edzer is right, of course, about needing a database connection rather than a filename. So please try that with terra. It may not work because terra tries to interpret the string as a filename. But that can be fixed. Feel free to raise an issue for this.

goergen95 commented 1 year ago

I was following this discussion and was intrigued to prepare a small reproducible example you can find here. The core script is this:

library(sf)

# specify connection inputs (see docker-compose.yaml)
host <- "postgres" 
port <- '5432' 
db <- 'postgis' 
db_user <- "postgis"  
db_pass <- "postgis" 

# read data
vector <- system.file("gpkg/nc.gpkg", package = "sf")
tif <- system.file("tif/L7_ETMs.tif", package = "stars")

vector_sf <- sf::read_sf(vector)
vector_terra <- terra::vect(vector)
raster_terra <- terra::rast(tif)[[1]]
raster_stars <- stars::read_stars(tif)[,,,1]

# write/read vector data with sf - works 
con_str <- glue::glue("postgresql://{db_user}:{db_pass}@{host}:{port}/{db}")
sf::st_write(obj = vector_sf[1:50,], dsn = con_str, layer = "vector_sf", driver = "PostgreSQL", delete_dsn = TRUE)
(sf::st_read(con_str, layer = "vector_sf"))
(terra::vect(con_str, layer = "vector_sf"))

# write/read vector data with terra - works
terra::writeVector(vector_terra[51:100,], filename = con_str, filetype = "PostgreSQL", layer = "vector_terra", options = "OVERWRITE=YES")
(terra::vect(con_str, layer = "vector_terra"))
(sf::st_read(con_str, layer = "vector_terra"))

# write/read raster data with stars - does not work
base_str <- glue::glue("PG:host={host} port={port} dbname='{db}' user='{db_user}' password='{db_pass}'")
try(stars::write_stars(raster_stars, dsn = glue::glue(paste0(base_str, " table='{table}'"), table = "raster_stars"), driver = "PostGISRaster"))

# write/read raster data with terra - does not work
try(terra::writeRaster(raster_terra, filename = glue::glue(paste0(base_str, " table='{table}'"), table = "raster_terra"), filetype = "PostGISRaster"))

# read existing raster from postgis - works
(org_stars <- stars::read_stars(glue::glue(paste0(base_str, " table='{table}'"), table = "postgis_raster")))
(org_terra <- terra::rast(glue::glue(paste0(base_str, " table='{table}'"), table = "postgis_raster")))

# copy a PostGISRaster with stars - does not work
try(stars::write_stars(org_stars, dsn = glue::glue(paste0(base_str, " table='{table}'"), table = "postgis_raster_stars"), driver = "PostGISRaster"))

# copy a PostGISRaster with terra - does not work
try(terra::writeRaster(org_terra, filename = glue::glue(paste0(base_str, " table='{table}'"), table = "postgis_raster_terra"), filetype = "PostGISRaster"))

Note, that I am using docker to set up a PostgreSQL server and a container with R and spatial packages installed.

Currently, it seems that the GDAL PostgreSQL vector driver allows both sf and terra to read/write vector data to/from a PostgreSQL server. Writing raster using GDAL's PostGISRaster driver fails, but it is indicated that it provides read-only support. Consequentially, reading from a PostgreSQL server works for both packages. A naive approach for copying a source PostGISRaster does not seem to work for both packages.

edzer commented 1 year ago

@goergen95 : fantastic reprex!

basille commented 1 year ago

Sorry I wasn't able to respond earlier. Thank you all for the feedback, and especially @goergen95 for this great work on bidirectional tests! This is an excellent piece of work to set up the issues at work. So far, we thus have:

First question is then: Would it be an option to consider proper implementation of writing rasters in a PostGIS DB (for both terra and stars)? Would it be a lot of work? The repository of @goergen95 provides pointers about what's going wrong for each package.

A more broader question is whether today anyone uses PostGISRaster for (serious) raster data storage and analysis work.

@edzer, you ask a very relevant question, but I don't know the answer to it. To start with, there does not seem to be very dynamic development of the PostGIS Raster source code, but that might simply reflect that it is a mature product. There was not excessive feedback on the retirement of rpostgis, but two contributors (@Cidree here and @Thorsten-Behrens on the rpostgis repository) expressed their interest and previous/ongoing work on the matter. The fact that @goergen95 also picked it up to implement his test repository is also indicative of some interest.

Generally speaking, I tend to think that even a minimal audience can justify the implementation of a feature, especially if it's not too much work, and if the responsibility is shared (in this case, users in need of such functionality could potentially jump in and help).

carathem commented 9 months ago

I appreciate all of your hard work and thoughts about this issue so far!

I'm a newcomer to spatial data processing in general and PostgreSQL in specific, but I want to add my voice to the I'd really appreciate it if someone who is much more able than I made this work in the sf/terra/stars environment crowd.

If people aren't using PostgreSQL for raster storage and querying, is there another system that people are using that works better? As I mentioned, I am very much a newcomer to this field, and my team in the midst of putting together a PostgreSQL database to use with international landcover, etc. raster data that I need to be able to access through R. Because these rasters are very large, the need to query specific portions of them is, obviously, important.

Thank you all for your hard work! You keep people like me going when we'd be completely floundering otherwise!

rhijmans commented 9 months ago

You can query parts of a raster with, for example, terra::crop and terra::extract. You can also use terra::window to set an area of interest. Similar approaches can also be used for vector data, by the way. Performance can vary a lot with file format and organization. And what is "best" may, of course, depend on the specific needs for your application.