r-dbi / RPostgres

A DBI-compliant interface to PostgreSQL
https://rpostgres.r-dbi.org
Other
328 stars 78 forks source link

Polygon/multipolygon columns: incorrect write, correct read #439

Closed pachadotdev closed 3 months ago

pachadotdev commented 11 months ago

Incorrect write

If we open a regular DBI connection, as in the next example, we won't correctly write to Postgres with Postgis correctly configured.

library(RPostgres)
library(sf)
library(archive)

con <- dbConnect(
  drv = Postgres(),
  dbname = "shiny"
)

download.file("https://www2.census.gov/geo/tiger/TIGER2014/TRACT/tl_2014_01_tract.zip", "tl_2014_01_tract.zip")

archive_extract("tl_2014_01_tract.zip")

d <- read_sf("tl_2014_01_tract.shp")

# this saves geometry as text
# sf::dbWriteTable(con, "carto", d, append = TRUE)
# RPostgres::dbWriteTable(con, "carto", d, append = TRUE)

# this saves geometry as geometry (not working)
# st_write(d, con, "carto", append = TRUE, layer = "carto", layer_options = c("GEOMETRY_NAME=geometry"))

# this saves geometry as geometry
st_write(
  d2,
  "PG:host=localhost port=5432 dbname=shiny user=xxx password=yyy",
  layer = "carto",
  layer_options = c("GEOMETRY_NAME=geometry"),
  append = TRUE
)

Correct read

library(ggplot2)
library(dplyr)

# Ggplot2 won't like the <pq_gmtry> but it's correct
d2 <- tbl(con, "carto") %>%
    filter(countyfp == "001") %>%
    collect()

glimpse(d2)

... TRUNCATED OUTPUT ...
$ countyfp <chr> "001", "001", "001", "001", "001", "001", "001", "001", "001"…
$ tractce  <chr> "021100", "021000", "020900", "020100", "020300", "020400", "…
$ aland    <dbl> 478168607, 386854992, 292754413, 9810181, 5349273, 6382705, 1…
$ geometry <pq_gmtry> 0106000020AD100000010000000103000000010000003B0A0000A437…
...

# convert pq_gmtry to multipolygon
d2 <- d2 %>%
  mutate(geometry = st_as_sfc(geometry))

# plot ok
ggplot() +
    geom_sf(data = d2, aes(fill = aland, geometry = geometry))

dbDisconnect(con)

Here's my session info:

─ Session info ────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.1 (2023-06-16)
 os       Linux Mint 21.2
 system   x86_64, linux-gnu
 ui       X11
 language en_CA:en
 collate  en_CA.UTF-8
 ctype    en_CA.UTF-8
 tz       America/Toronto
 date     2023-07-31
 pandoc   2.9.2.1 @ /usr/bin/pandoc

─ Packages ────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 package     * version    date (UTC) lib source
 bit           4.0.5      2022-11-15 [1] CRAN (R 4.3.1)
 bit64         4.0.5      2020-08-30 [1] CRAN (R 4.3.1)
 blob          1.2.4      2023-03-17 [1] CRAN (R 4.3.1)
 cachem        1.0.8      2023-05-01 [1] CRAN (R 4.3.1)
 callr         3.7.3      2022-11-02 [1] CRAN (R 4.3.1)
 class         7.3-22     2023-05-03 [4] CRAN (R 4.3.1)
 classInt      0.4-9      2023-02-28 [1] CRAN (R 4.3.1)
 cli           3.6.1      2023-03-23 [1] CRAN (R 4.3.1)
 colorspace    2.1-0      2023-01-23 [1] CRAN (R 4.3.1)
 crayon        1.5.2      2022-09-29 [1] CRAN (R 4.3.1)
 DBI           1.1.3      2022-06-18 [1] CRAN (R 4.3.1)
 dbplyr        2.3.3      2023-07-07 [1] CRAN (R 4.3.1)
 devtools    * 2.4.5      2022-10-11 [1] CRAN (R 4.3.1)
 digest        0.6.33     2023-07-07 [1] CRAN (R 4.3.1)
 dplyr       * 1.1.2      2023-04-20 [1] CRAN (R 4.3.1)
 e1071         1.7-13     2023-02-01 [1] CRAN (R 4.3.1)
 ellipsis      0.3.2      2021-04-29 [1] CRAN (R 4.3.1)
 fansi         1.0.4      2023-01-22 [1] CRAN (R 4.3.1)
 farver        2.1.1      2022-07-06 [1] CRAN (R 4.3.1)
 fastmap       1.1.1      2023-02-24 [1] CRAN (R 4.3.1)
 fs            1.6.2      2023-04-25 [1] CRAN (R 4.3.1)
 generics      0.1.3      2022-07-05 [1] CRAN (R 4.3.1)
 ggplot2     * 3.4.2      2023-04-03 [1] CRAN (R 4.3.1)
 glue          1.6.2      2022-02-24 [1] CRAN (R 4.3.1)
 gtable        0.3.3      2023-03-21 [1] CRAN (R 4.3.1)
 hms           1.1.3      2023-03-21 [1] CRAN (R 4.3.1)
 htmltools     0.5.5      2023-03-23 [1] CRAN (R 4.3.1)
 htmlwidgets   1.6.2      2023-03-17 [1] CRAN (R 4.3.1)
 httpuv        1.6.11     2023-05-11 [1] CRAN (R 4.3.1)
 jsonlite      1.8.7      2023-06-29 [1] CRAN (R 4.3.1)
 KernSmooth    2.23-22    2023-07-10 [4] CRAN (R 4.3.1)
 labeling      0.4.2      2020-10-20 [1] CRAN (R 4.3.1)
 later         1.3.1      2023-05-02 [1] CRAN (R 4.3.1)
 lifecycle     1.0.3      2022-10-07 [1] CRAN (R 4.3.1)
 lubridate     1.9.2      2023-02-10 [1] CRAN (R 4.3.1)
 magrittr      2.0.3      2022-03-30 [1] CRAN (R 4.3.1)
 memoise       2.0.1      2021-11-26 [1] CRAN (R 4.3.1)
 mime          0.12       2021-09-28 [1] CRAN (R 4.3.1)
 miniUI        0.1.1.1    2018-05-18 [1] CRAN (R 4.3.1)
 munsell       0.5.0      2018-06-12 [1] CRAN (R 4.3.1)
 pillar        1.9.0      2023-03-22 [1] CRAN (R 4.3.1)
 pkgbuild      1.4.2      2023-06-26 [1] CRAN (R 4.3.1)
 pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.3.1)
 pkgload       1.3.2.1    2023-07-08 [1] CRAN (R 4.3.1)
 prettyunits   1.1.1      2020-01-24 [1] CRAN (R 4.3.1)
 processx      3.8.2      2023-06-30 [1] CRAN (R 4.3.1)
 profvis       0.3.8      2023-05-02 [1] CRAN (R 4.3.1)
 promises      1.2.0.1    2021-02-11 [1] CRAN (R 4.3.1)
 proxy         0.4-27     2022-06-09 [1] CRAN (R 4.3.1)
 ps            1.7.5      2023-04-18 [1] CRAN (R 4.3.1)
 purrr         1.0.1      2023-01-10 [1] CRAN (R 4.3.1)
 R6            2.5.1      2021-08-19 [1] CRAN (R 4.3.1)
 Rcpp          1.0.11     2023-07-06 [1] CRAN (R 4.3.1)
 remotes       2.4.2.1    2023-07-18 [1] CRAN (R 4.3.1)
 rlang         1.1.1      2023-04-28 [1] CRAN (R 4.3.1)
 RPostgres   * 1.4.5.9012 2023-07-22 [1] Github (r-dbi/RPostgres@58a052b)
 scales        1.2.1      2022-08-20 [1] CRAN (R 4.3.1)
 sessioninfo   1.2.2      2021-12-06 [1] CRAN (R 4.3.1)
 sf          * 1.0-14     2023-07-11 [1] CRAN (R 4.3.1)
 shiny         1.7.4.1    2023-07-06 [1] CRAN (R 4.3.1)
 stringi       1.7.12     2023-01-11 [1] CRAN (R 4.3.1)
 stringr       1.5.0      2022-12-02 [1] CRAN (R 4.3.1)
 tibble        3.2.1      2023-03-20 [1] CRAN (R 4.3.1)
 tidyselect    1.2.0      2022-10-10 [1] CRAN (R 4.3.1)
 timechange    0.2.0      2023-01-11 [1] CRAN (R 4.3.1)
 units         0.8-2      2023-04-27 [1] CRAN (R 4.3.1)
 urlchecker    1.0.1      2021-11-30 [1] CRAN (R 4.3.1)
 usethis     * 2.2.2      2023-07-06 [1] CRAN (R 4.3.1)
 utf8          1.2.3      2023-01-31 [1] CRAN (R 4.3.1)
 vctrs         0.6.3      2023-06-14 [1] CRAN (R 4.3.1)
 withr         2.5.0      2022-03-03 [1] CRAN (R 4.3.1)
 xtable        1.8-4      2019-04-21 [1] CRAN (R 4.3.1)

 [1] /home/pacha/R/x86_64-pc-linux-gnu-library/4.3
 [2] /usr/local/lib/R/site-library
 [3] /usr/lib/R/site-library
 [4] /usr/lib/R/library
dpprdan commented 11 months ago

This only concerns the sf::dbWriteTable() method, so I think this is better handled there than here.

However, this is partly caused by RPostgres::dbWriteTable() stop()ing if append && !is.null(field.types)

https://github.com/r-dbi/RPostgres/blob/58a052b20f046c95723c332a0bb06fdb9ed362c4/R/dbWriteTable_PqConnection_character_data.frame.R#L24-L26

This also happens when the table does not (yet) exist, i.e. when there is nothing to append to.

Do you think it would be possible to move this check a few lines down and only stop if the table is found, @krlmlr?

https://github.com/r-dbi/RPostgres/blob/58a052b20f046c95723c332a0bb06fdb9ed362c4/R/dbWriteTable_PqConnection_character_data.frame.R#L47

Right now, dbWriteTable will create the table anyway if no field.types are provided.

https://github.com/r-dbi/RPostgres/blob/58a052b20f046c95723c332a0bb06fdb9ed362c4/R/dbWriteTable_PqConnection_character_data.frame.R#L60-L63

The downside is that dbWriteTable(append = TRUE, field.types = "something") will work when the table does not exist but stop when it does. This might be confusing. However, we could ameliorate this by issuing a message "You wanted to append, but the table does not exist yet so we created it".

Of course, third-party methods have to provide custom field.types for their datatypes, like sfc, so field.types cannot be NULL for them. So they cannot create a table currently if append = TRUE but the table does not exist.

krlmlr commented 11 months ago

Thanks, missed that. IIUC, we can fix this by fixing sf::dbWriteTable()?

dpprdan commented 11 months ago

Thanks, missed that. IIUC, we can fix this by fixing sf::dbWriteTable()?

Yes. But the fix is a bit ugly (IMO, i.e. overriding function arguments).

IMHO it would be best, i.e. cleanest, if append were only possible if the table already exists, but that's probably too much of a breaking change.

krlmlr commented 11 months ago

There are new generics dbCreateTable() and dbAppendTable(), does sf support these?

dpprdan commented 11 months ago

Ha, I just had a similar thought, i.e. refactor sf::dbWriteTable() using dbCreateTable() and dbAppendTable().

So no, sf does not support/use them yet.

Anyway, I am not convinced by my "create table if append = TRUE" proposal above myself anymore, so this can be closed IMO.

dpprdan commented 7 months ago

Fixed in {sf}: https://github.com/r-spatial/sf/pull/2223

krlmlr commented 3 months ago

Closing, fixed downstream. Thanks @dpprdan!