r-spatial / sf

Simple Features for R
https://r-spatial.github.io/sf/

Using `rename_with` on an sf object breaks `relation_to_geometry` attributes #1472

Closed ajjitn closed 1 year ago

ajjitn commented 4 years ago

Example Data

library(sf)
library(tidyverse)

# geojson of polling places in philly, some columns have `_` in column title
df = st_read("https://phl.carto.com/api/v2/sql?q=SELECT+*+FROM+polling_places&filename=polling_places&format=geojson&skipfields=cartodb_id")

# for the sake of example, convert all `_` to `-` in column names
df2 = df %>% rename_with(~str_replace_all(.x, "_", "-"))

Problem

Using rename on df works perfectly fine,

df %>% rename(id = objectid)

But using rename on df2 fails with internal error: can't find `agr` columns

df2 %>% rename(id = objectid)
#> Error in rename.sf(., id = objectid): internal error: can't find `agr` columns

Doing the same renaming operation with select works fine

df2 %>% select(id = objectid)

Cause?

I think the rename_with function messes up the relation_to_geometry attributes of the sf object. Below is the output of st_agr(df) :

st_agr(df)
#>           objectid               ward           division           precinct 
#>               <NA>               <NA>               <NA>               <NA> 
#>          placename     street_address           zip_code accessibility_code 
#>               <NA>               <NA>               <NA>               <NA> 
#>       parking_code 
#>               <NA> 
#> Levels: constant aggregate identity

And below is the output of st_agr(df2):

st_agr(df2)
#>  objectid      ward  division  precinct placename      <NA>      <NA>      <NA> 
#>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA> 
#>      <NA> 
#>      <NA> 
#> Levels: constant aggregate identity

The names of the last 4 items of the factor (i.e. the 4 columns that were renamed in the rename_with call) are now all NA. This causes a check within rename.sf to fail and print the error.
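A quick way to confirm this kind of mismatch is to compare the names of the `agr` attribute with the non-geometry column names. The sketch below is a minimal illustration in base R; `agr_name_mismatch()` is a hypothetical helper, not an sf function, and the demo object is a plain data frame with hand-set `sf_column` and `agr` attributes (an assumption made so that the example runs without sf or the Philadelphia download).

```r
# Hypothetical helper (not part of sf): compare the names stored in the "agr"
# attribute with the non-geometry column names of an sf-like data frame.
agr_name_mismatch <- function(x) {
  geom_col  <- attr(x, "sf_column")
  attr_cols <- setdiff(names(x), geom_col)
  agr_names <- names(attr(x, "agr"))
  list(
    columns_without_agr = setdiff(attr_cols, agr_names),
    agr_without_column  = setdiff(agr_names, attr_cols)
  )
}

# Mock object (assumption): a plain data frame carrying the two attributes by
# hand. The second agr name is NA, mimicking the st_agr(df2) output above.
agr_levels <- c("constant", "aggregate", "identity")
mock <- data.frame(objectid = 1:2, `zip-code` = c("19106", "19107"),
                   check.names = FALSE)
attr(mock, "sf_column") <- "geometry"  # no geometry column in this mock
attr(mock, "agr") <- setNames(factor(c(NA, NA), levels = agr_levels),
                              c("objectid", NA))

mismatch <- agr_name_mismatch(mock)
print(mismatch)
```

On a real sf object the same call should return two empty vectors when the attribute is consistent; anything else points at the state that trips the check in rename.sf.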

Session Info

```
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
 [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
 [1] reprex_0.3.0 forcats_0.5.0 stringr_1.4.0 dplyr_1.0.1 purrr_0.3.4 readr_1.3.1 tidyr_1.1.1 tibble_3.0.3
 [9] ggplot2_3.3.2 tidyverse_1.3.0 sf_0.9-5

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5 lubridate_1.7.9 ps_1.3.4 class_7.3-17 rprojroot_1.3-2 assertthat_0.2.1
 [7] digest_0.6.25 utf8_1.1.4 mime_0.9 R6_2.4.1 cellranger_1.1.0 backports_1.1.8
[13] evaluate_0.14 e1071_1.7-3 httr_1.4.2 pillar_1.4.6 rlang_0.4.7 readxl_1.3.1
[19] rstudioapi_0.11 callr_3.4.3 whisker_0.4 blob_1.2.1 rmarkdown_2.3 desc_1.2.0
[25] munsell_0.5.0 shiny_1.5.0 broom_0.7.0 compiler_4.0.2 httpuv_1.5.4 modelr_0.1.8
[31] xfun_0.16 pkgconfig_2.0.3 clipr_0.7.0 htmltools_0.5.0 tidyselect_1.1.0 fansi_0.4.1
[37] crayon_1.3.4 dbplyr_1.4.4 withr_2.2.0 later_1.1.0.1 grid_4.0.2 jsonlite_1.7.0
[43] xtable_1.8-4 gtable_0.3.0 lifecycle_0.2.0 DBI_1.1.0 magrittr_1.5 units_0.6-7
[49] scales_1.1.1 KernSmooth_2.23-17 cli_2.0.2 stringi_1.4.6 fs_1.5.0 promises_1.1.1
[55] testthat_2.3.2 xml2_1.3.2 ellipsis_0.3.1 generics_0.0.2 vctrs_0.3.2 tools_4.0.2
[61] glue_1.4.1 hms_0.5.3 pkgload_1.1.0 processx_3.4.3 fastmap_1.0.1 colorspace_1.4-1
[67] classInt_0.4-3 rvest_0.3.6 knitr_1.29 haven_2.3.1
```

Full Reprex

``` r
library(tidyverse)
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1

# dataset of polling places in philly, some columns have `_` in column title
df = st_read("https://phl.carto.com/api/v2/sql?q=SELECT+*+FROM+polling_places&filename=polling_places&format=geojson&skipfields=cartodb_id")
#> Reading layer `OGRGeoJSON' from data source `https://phl.carto.com/api/v2/sql?q=SELECT+*+FROM+polling_places&filename=polling_places&format=geojson&skipfields=cartodb_id' using driver `GeoJSON'
#> Simple feature collection with 1703 features and 9 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: -75.2678 ymin: 39.90876 xmax: -74.96944 ymax: 40.12095
#> geographic CRS: WGS 84

df2 = df %>% rename_with(~str_replace_all(.x, "_", "-"))

# Works as expected
df %>% rename(id = objectid)
#> Simple feature collection with 1703 features and 9 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: -75.2678 ymin: 39.90876 xmax: -74.96944 ymax: 40.12095
#> geographic CRS: WGS 84
#> First 10 features:
#>       id ward division precinct                         placename
#> 1  25072    5       30     0530                 THE PAINTED BRIDE
#> 2  25073    5       31     0531                 CONVENTION CENTER
#> 3  25074    5       32     0532                 THE PAINTED BRIDE
#> 4  25075    5       33     0533                 THE PAINTED BRIDE
#> 5  25076    5       34     0534                 THE PAINTED BRIDE
#> 6  25077    5       35     0535                 CONVENTION CENTER
#> 7  25078    5       36     0536                     McCALL SCHOOL
#> 8  25079    5       37     0537                 CONVENTION CENTER
#> 9  25080    6        1     0601 LUCIEN BLACKWELL COMMUNITY CENTER
#> 10 25081    6        2     0602 LUCIEN BLACKWELL COMMUNITY CENTER
#>        street_address zip_code accessibility_code parking_code
#> 1         230 VINE ST    19106                  M            N
#> 2  N BROAD & RACE STS    19107                  F            N
#> 3         230 VINE ST    19106                  M            N
#> 4         230 VINE ST    19106                  M            N
#> 5         230 VINE ST    19106                  M            N
#> 6  N BROAD & RACE STS    19107                  F            N
#> 7        325 S 7TH ST    19106                  M            N
#> 8  N BROAD & RACE STS    19107                  F            N
#> 9       761 N 47TH ST    19139                  F            H
#> 10      761 N 47TH ST    19139                  F            H
#>                      geometry
#> 1   POINT (-75.14387 39.9555)
#> 2  POINT (-75.16269 39.95622)
#> 3   POINT (-75.14387 39.9555)
#> 4   POINT (-75.14387 39.9555)
#> 5   POINT (-75.14387 39.9555)
#> 6  POINT (-75.16269 39.95622)
#> 7  POINT (-75.15318 39.94464)
#> 8  POINT (-75.16269 39.95622)
#> 9  POINT (-75.21484 39.96586)
#> 10 POINT (-75.21484 39.96586)

# Works as expected
df2 %>% select(id = objectid)
#> Simple feature collection with 1703 features and 1 field
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: -75.2678 ymin: 39.90876 xmax: -74.96944 ymax: 40.12095
#> geographic CRS: WGS 84
#> First 10 features:
#>       id                   geometry
#> 1  25072  POINT (-75.14387 39.9555)
#> 2  25073 POINT (-75.16269 39.95622)
#> 3  25074  POINT (-75.14387 39.9555)
#> 4  25075  POINT (-75.14387 39.9555)
#> 5  25076  POINT (-75.14387 39.9555)
#> 6  25077 POINT (-75.16269 39.95622)
#> 7  25078 POINT (-75.15318 39.94464)
#> 8  25079 POINT (-75.16269 39.95622)
#> 9  25080 POINT (-75.21484 39.96586)
#> 10 25081 POINT (-75.21484 39.96586)

df2 %>% rename(id = objectid)
#> Error in rename.sf(., id = objectid): internal error: can't find `agr` columns

st_agr(df)
#>           objectid               ward           division           precinct 
#>               <NA>               <NA>               <NA>               <NA> 
#>          placename     street_address           zip_code accessibility_code 
#>               <NA>               <NA>               <NA>               <NA> 
#>       parking_code 
#>               <NA> 
#> Levels: constant aggregate identity

st_agr(df2)
#>  objectid      ward  division  precinct placename      <NA>      <NA>      <NA> 
#>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA> 
#>      <NA> 
#>      <NA> 
#> Levels: constant aggregate identity
```

Created on 2020-08-21 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)

mtennekes commented 4 years ago

Probably related: rename-ing the last column doesn't work when this column is not the sf_column:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1

# create sf object with sf_column not being the last
data(World, package = "tmap")
W2 = World %>% 
  as.data.frame() %>% 
  select(iso_a3, geometry, HPI)
class(W2) = c("sf", "data.frame")
attr(W2, "sf_column") = "geometry"

W2
#> Simple feature collection with 177 features and 2 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -16656120 ymin: -8460601 xmax: 16656120 ymax: 8375779
#> CRS:            +proj=eck4 +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs
#> First 10 features:
#>    iso_a3                       geometry      HPI
#> 1     AFG MULTIPOLYGON (((5310471 451... 20.22535
#> 2     AGO MULTIPOLYGON (((1531585 -77...       NA
#> 3     ALB MULTIPOLYGON (((1729835 521... 36.76687
#> 4     ARE MULTIPOLYGON (((4675864 313...       NA
#> 5     ARG MULTIPOLYGON (((-5017766 -6... 35.19024
#> 6     ARM MULTIPOLYGON (((3677241 513... 25.66642
#> 7     ATA MULTIPOLYGON (((-3403793 -8...       NA
#> 8     ATF MULTIPOLYGON (((5550200 -59...       NA
#> 9     AUS MULTIPOLYGON (((12288137 -5... 21.22897
#> 10    AUT MULTIPOLYGON (((1371802 588... 30.47822

# renaming the last column fails
W2 %>% rename(x = HPI)
#> Error in names(agr) <- `*vtmp*`: 'names' attribute [3] must be the same length as the vector [2]

# renaming another column works, but the sf_column is placed last
W2 %>% rename(x = iso_a3)
#> Simple feature collection with 177 features and 2 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -16656120 ymin: -8460601 xmax: 16656120 ymax: 8375779
#> CRS:            +proj=eck4 +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs
#> First 10 features:
#>      x      HPI                       geometry
#> 1  AFG 20.22535 MULTIPOLYGON (((5310471 451...
#> 2  AGO       NA MULTIPOLYGON (((1531585 -77...
#> 3  ALB 36.76687 MULTIPOLYGON (((1729835 521...
#> 4  ARE       NA MULTIPOLYGON (((4675864 313...
#> 5  ARG 35.19024 MULTIPOLYGON (((-5017766 -6...
#> 6  ARM 25.66642 MULTIPOLYGON (((3677241 513...
#> 7  ATA       NA MULTIPOLYGON (((-3403793 -8...
#> 8  ATF       NA MULTIPOLYGON (((5550200 -59...
#> 9  AUS 21.22897 MULTIPOLYGON (((12288137 -5...
#> 10 AUT 30.47822 MULTIPOLYGON (((1371802 588...

Created on 2020-08-23 by the reprex package (v0.3.0)

edzer commented 4 years ago

@mtennekes I think this has been fixed in the GitHub version.

edzer commented 4 years ago

@ajjitn yes, rename_with needs a method for sf objects that takes care of the agr attribute, in order to create correct sf objects.
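To illustrate what "taking care of the agr attribute" could involve, here is a minimal sketch in base R. `rename_with_attrs()` is a hypothetical helper and is not the method that sf ships; it only shows the idea of applying one old-to-new name mapping to the columns, to the names of `agr`, and to the `sf_column` pointer. The demo object is a plain data frame with hand-set attributes and a character stand-in for the geometry column (assumptions, so that the example runs without sf); behaviour on real sf objects may differ between sf versions.

```r
# Hypothetical helper (not sf's rename_with method): apply a renaming function
# to the column names and carry the same old -> new mapping over to the
# names of "agr" and to the "sf_column" pointer.
rename_with_attrs <- function(x, .fn) {
  old <- names(x)
  new <- .fn(old)
  stopifnot(is.character(new), length(new) == length(old), !anyDuplicated(new))

  # Read both attributes before touching the names
  agr      <- attr(x, "agr")
  geom_col <- attr(x, "sf_column")

  names(x) <- new
  if (!is.null(agr)) {
    names(agr) <- new[match(names(agr), old)]
    attr(x, "agr") <- agr
  }
  if (!is.null(geom_col)) {
    attr(x, "sf_column") <- new[match(geom_col, old)]
  }
  x
}

# Mock object (assumption): plain data frame with hand-set attributes; the
# "geometry" column is a character stand-in, not a real sfc column.
agr_levels <- c("constant", "aggregate", "identity")
mock <- data.frame(objectid = 1:2, zip_code = c("19106", "19107"),
                   geometry = c("POINT (0 0)", "POINT (1 1)"))
attr(mock, "sf_column") <- "geometry"
attr(mock, "agr") <- setNames(factor(c(NA, NA), levels = agr_levels),
                              c("objectid", "zip_code"))

renamed <- rename_with_attrs(mock, function(nm) gsub("_", "-", nm, fixed = TRUE))
print(names(renamed))
print(names(attr(renamed, "agr")))
print(attr(renamed, "sf_column"))
```

Keeping the mapping in one place means the geometry column can be renamed too without leaving a stale pointer behind.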

mtennekes commented 4 years ago

> @mtennekes I think this has been fixed in the GH version.

Thanks, this works now. However, I also got the internal error: can't find `agr` columns error message when applying rename. I didn't use rename_with.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
#> WARNING: different compile-time and runtime versions for GEOS found:
#> Linked against: 3.8.0-CAPI-1.13.1  compiled against: 3.7.3-CAPI-1.11.3
#> It is probably a good idea to reinstall sf, and maybe rgeos and rgdal too
a = structure(list(
  name = c("Appingedam", "Bedum"),
  id = c("GM0003", "GM0005"),
  geometry = structure(list(
    structure(c(762481.389554466, 7041870.90443253), class = c("XY", "POINT", "sfg")),
    structure(c(734389.40998089, 7037725.85472037), class = c("XY", "POINT", "sfg"))),
    class = c("sfc_POINT", "sfc"), precision = 0,
    bbox = structure(c(xmin = 734389.40998089, ymin = 7037725.85472037,
                       xmax = 762481.389554466, ymax = 7041870.90443253),
                     class = "bbox"),
    crs = structure(list(input = "EPSG:3857",
                         wkt = "PROJCRS[\"WGS 84 / Pseudo-Mercator\",\n    BASEGEOGCRS[\"WGS 84\",\n        DATUM[\"World Geodetic System 1984\",\n            ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n                LENGTHUNIT[\"metre\",1]]],\n        PRIMEM[\"Greenwich\",0,\n            ANGLEUNIT[\"degree\",0.0174532925199433]],\n        ID[\"EPSG\",4326]],\n    CONVERSION[\"Popular Visualisation Pseudo-Mercator\",\n        METHOD[\"Popular Visualisation Pseudo Mercator\",\n            ID[\"EPSG\",1024]],\n        PARAMETER[\"Latitude of natural origin\",0,\n            ANGLEUNIT[\"degree\",0.0174532925199433],\n            ID[\"EPSG\",8801]],\n        PARAMETER[\"Longitude of natural origin\",0,\n            ANGLEUNIT[\"degree\",0.0174532925199433],\n            ID[\"EPSG\",8802]],\n        PARAMETER[\"False easting\",0,\n            LENGTHUNIT[\"metre\",1],\n            ID[\"EPSG\",8806]],\n        PARAMETER[\"False northing\",0,\n            LENGTHUNIT[\"metre\",1],\n            ID[\"EPSG\",8807]]],\n    CS[Cartesian,2],\n        AXIS[\"easting (X)\",east,\n            ORDER[1],\n            LENGTHUNIT[\"metre\",1]],\n        AXIS[\"northing (Y)\",north,\n            ORDER[2],\n            LENGTHUNIT[\"metre\",1]],\n    USAGE[\n        SCOPE[\"unknown\"],\n        AREA[\"World - 85°S to 85°N\"],\n        BBOX[-85.06,-180,85.06,180]],\n    ID[\"EPSG\",3857]]"),
                    class = "crs"),
    n_empty = 0L),
  show = c(TRUE, TRUE), value_out = c(2321, 2590), value_in = c(1429, 1353),
  size = c(5000, 5000), Amsterdam = c(0, 0), Rotterdam = c(0, 0),
  `Den Haag` = c(0, 0), Utrecht = c(0, 0), other = c(100, 100)),
  row.names = c(NA, -2L), sf_column = "geometry",
  agr = structure(c(name = NA_integer_, id = NA_integer_, show = NA_integer_,
                    value_out = NA_integer_, value_in = NA_integer_,
                    size = NA_integer_, Amsterdam = NA_integer_,
                    Rotterdam = NA_integer_, `Den Haag` = NA_integer_,
                    Utrecht = NA_integer_, other = NA_integer_),
                  .Label = c("constant", "aggregate", "identity"),
                  class = "factor"),
  class = c("sf", "tbl_df", "tbl", "data.frame"))

# works:
a %>%
  rename(Utrecht2 = Utrecht,
         outflow = value_out)
#> Simple feature collection with 2 features and 11 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 734389.4 ymin: 7037726 xmax: 762481.4 ymax: 7041871
#> projected CRS:  WGS 84 / Pseudo-Mercator
#> # A tibble: 2 x 12
#>   name  id                     geometry show  outflow value_in  size Amsterdam
#>   <chr> <chr>               <POINT [m]> <lgl>   <dbl>    <dbl> <dbl>     <dbl>
#> 1 Appi… GM00…        (762481.4 7041871) TRUE     2321     1429  5000         0
#> 2 Bedum GM00…        (734389.4 7037726) TRUE     2590     1353  5000         0
#> # … with 4 more variables: Rotterdam <dbl>, `Den Haag` <dbl>, Utrecht2 <dbl>,
#> #   other <dbl>

# fails:
a %>%
  rename(Utrecht2 = Utrecht) %>%
  rename(outflow = value_out)
#> Error in rename.sf(., outflow = value_out): internal error: can't find `agr` columns

# something strange with the last column after renaming:
a %>%
  rename(Utrecht2 = Utrecht) %>% 
  st_agr()
#>      name        id      show value_out  value_in      size Amsterdam Rotterdam 
#>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA> 
#>  Den Haag  Utrecht2      <NA> 
#>      <NA>      <NA>      <NA> 
#> Levels: constant aggregate identity

st_agr(a)
#>      name        id      show value_out  value_in      size Amsterdam Rotterdam 
#>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA> 
#>  Den Haag   Utrecht     other 
#>      <NA>      <NA>      <NA> 
#> Levels: constant aggregate identity

Created on 2020-08-24 by the reprex package (v0.3.0)

Using the GH version of sf.

allenbaron commented 3 years ago

@mtennekes Pass the sf object through an empty mutate() call any time you see that error, e.g.

a %>%
    rename(Utrecht2 = Utrecht) %>%
    mutate() %>%
    rename(outflow = value_out)

mutate() somehow corrects "agr" attribute names.
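For an object that is already in this state, the `agr` names can also be realigned by hand. The sketch below is a hypothetical base-R helper, `repair_agr_names()`; it is not what mutate() does internally, which this thread does not explain. It assumes the `agr` entries are stored in column order, and when the lengths disagree it falls back to resetting every entry to NA, which discards any relations that had been declared. The demo uses a mock data frame with hand-set attributes (an assumption, so that it runs without sf).

```r
# Hypothetical repair helper (not part of sf, and not a description of what
# mutate() does): realign the names of "agr" with the non-geometry columns.
repair_agr_names <- function(x) {
  attr_cols <- setdiff(names(x), attr(x, "sf_column"))
  agr <- attr(x, "agr")
  if (is.null(agr) || length(agr) != length(attr_cols)) {
    # Positions cannot be trusted: reset every relation to NA (unknown)
    agr <- factor(rep(NA_character_, length(attr_cols)),
                  levels = c("constant", "aggregate", "identity"))
  }
  # Assumption: agr entries follow the order of the attribute columns
  names(agr) <- attr_cols
  attr(x, "agr") <- agr
  x
}

# Mock object (assumption): agr names in upper case, columns in lower case
agr_levels <- c("constant", "aggregate", "identity")
mock <- data.frame(statefp = c("39", "46"),
                   state_name = c("Ohio", "South Dakota"),
                   geometry = c("POINT (0 0)", "POINT (1 1)"))
attr(mock, "sf_column") <- "geometry"
attr(mock, "agr") <- setNames(factor(c("identity", NA), levels = agr_levels),
                              c("STATEFP", "state_name"))

fixed <- repair_agr_names(mock)
print(attr(fixed, "agr"))
```

The positional assumption is the weak point: if columns were reordered as well as renamed, matching names explicitly is the safer route.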

ANOTHER EXAMPLE: Based on `us_counties()` data mentioned in issue #1509

``` r
library(tidyverse)
library(sf)
#> Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
library(USAboundaries)

# NOTE: us_counties() requires installation of USAboundariesData, if it doesn't
# install automatically, use devtools.
# devtools::install_github("ropensci/USAboundariesData")

counties <- us_counties()

##### ERROR ######
counties %>%
    rename(State = state_name)
#> Error in rename.sf(., State = state_name): internal error: can't find `agr` columns

# The problem - $agr names don't match column names
colnames(counties)
#>  [1] "statefp"           "countyfp"          "countyns"
#>  [4] "affgeoid"          "geoid"             "name"
#>  [7] "lsad"              "aland"             "awater"
#> [10] "state_name"        "state_abbr"        "jurisdiction_type"
#> [13] "geometry"

attr(counties, "agr") %>%
    names()
#>  [1] "STATEFP"           "COUNTYFP"          "COUNTYNS"
#>  [4] "AFFGEOID"          "GEOID"             "NAME"
#>  [7] "LSAD"              "ALAND"             "AWATER"
#> [10] "state_name"        "state_abbr"        "jurisdiction_type"

##### Fix $agr attribute names DIRECTLY #####
agr_names_fixed <- attr(counties, "agr") %>%
    purrr::set_names(., tolower(names(.)))

counties_agr_names_fixed <- counties %>%
    magrittr::set_attr("agr", agr_names_fixed)

# It works
counties_agr_names_fixed %>%
    rename(State = state_name)
#> Simple feature collection with 3220 features and 12 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -179.1743 ymin: 17.91377 xmax: 179.7739 ymax: 71.35256
#> Geodetic CRS:  WGS 84
#> First 10 features:
#>    statefp countyfp countyns       affgeoid geoid       name lsad      aland
#> 1       39      131 01074078 0500000US39131 39131       Pike   06 1140324458
#> 2       46      003 01266983 0500000US46003 46003     Aurora   06 1834813753
#> 3       55      035 01581077 0500000US55035 55035 Eau Claire   06 1652211310
#> 4       72      145 01804553 0500000US72145 72145  Vega Baja   13  118766803
#> 5       48      259 01383915 0500000US48259 48259    Kendall   06 1715747531
#> 6       40      015 01101795 0500000US40015 40015      Caddo   06 3310745124
#> 7       19      093 00465235 0500000US19093 19093        Ida   06 1117599859
#> 8       28      071 00695759 0500000US28071 28071  Lafayette   06 1636141755
#> 9       12      027 00294452 0500000US12027 12027     DeSoto   06 1649978040
#> 10      31      137 00835890 0500000US31137 31137     Phelps   06 1398048574
#>       awater        State state_abbr jurisdiction_type
#> 1    9567612         Ohio         OH             state
#> 2   11201379 South Dakota         SD             state
#> 3   18848512    Wisconsin         WI             state
#> 4   57805868  Puerto Rico         PR         territory
#> 5    1496797        Texas         TX             state
#> 6   30820525     Oklahoma         OK             state
#> 7    1406461         Iowa         IA             state
#> 8  123052156  Mississippi         MS             state
#> 9    6247257      Florida         FL             state
#> 10   1646534     Nebraska         NE             state
#>                          geometry
#> 1  MULTIPOLYGON (((-83.35353 3...
#> 2  MULTIPOLYGON (((-98.80777 4...
#> 3  MULTIPOLYGON (((-91.65045 4...
#> 4  MULTIPOLYGON (((-66.44899 1...
#> 5  MULTIPOLYGON (((-98.92015 3...
#> 6  MULTIPOLYGON (((-98.62315 3...
#> 7  MULTIPOLYGON (((-95.74161 4...
#> 8  MULTIPOLYGON (((-89.72134 3...
#> 9  MULTIPOLYGON (((-82.0565 27...
#> 10 MULTIPOLYGON (((-99.64346 4...

##### Fix $agr attribute names using mutate() #####
counties %>%
    mutate() %>%
    rename(State = state_name)
#> Simple feature collection with 3220 features and 12 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -179.1743 ymin: 17.91377 xmax: 179.7739 ymax: 71.35256
#> Geodetic CRS:  WGS 84
#> First 10 features:
#>    statefp countyfp countyns       affgeoid geoid       name lsad      aland
#> 1       39      131 01074078 0500000US39131 39131       Pike   06 1140324458
#> 2       46      003 01266983 0500000US46003 46003     Aurora   06 1834813753
#> 3       55      035 01581077 0500000US55035 55035 Eau Claire   06 1652211310
#> 4       72      145 01804553 0500000US72145 72145  Vega Baja   13  118766803
#> 5       48      259 01383915 0500000US48259 48259    Kendall   06 1715747531
#> 6       40      015 01101795 0500000US40015 40015      Caddo   06 3310745124
#> 7       19      093 00465235 0500000US19093 19093        Ida   06 1117599859
#> 8       28      071 00695759 0500000US28071 28071  Lafayette   06 1636141755
#> 9       12      027 00294452 0500000US12027 12027     DeSoto   06 1649978040
#> 10      31      137 00835890 0500000US31137 31137     Phelps   06 1398048574
#>       awater        State state_abbr jurisdiction_type
#> 1    9567612         Ohio         OH             state
#> 2   11201379 South Dakota         SD             state
#> 3   18848512    Wisconsin         WI             state
#> 4   57805868  Puerto Rico         PR         territory
#> 5    1496797        Texas         TX             state
#> 6   30820525     Oklahoma         OK             state
#> 7    1406461         Iowa         IA             state
#> 8  123052156  Mississippi         MS             state
#> 9    6247257      Florida         FL             state
#> 10   1646534     Nebraska         NE             state
#>                          geometry
#> 1  MULTIPOLYGON (((-83.35353 3...
#> 2  MULTIPOLYGON (((-98.80777 4...
#> 3  MULTIPOLYGON (((-91.65045 4...
#> 4  MULTIPOLYGON (((-66.44899 1...
#> 5  MULTIPOLYGON (((-98.92015 3...
#> 6  MULTIPOLYGON (((-98.62315 3...
#> 7  MULTIPOLYGON (((-95.74161 4...
#> 8  MULTIPOLYGON (((-89.72134 3...
#> 9  MULTIPOLYGON (((-82.0565 27...
#> 10 MULTIPOLYGON (((-99.64346 4...
```

Created on 2021-05-27 by the [reprex package](https://reprex.tidyverse.org) (v2.0.0)
Session Info

``` r
R version 4.0.5 (2021-03-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
 [1] USAboundaries_0.3.1 sf_0.9-8 forcats_0.5.1
 [4] stringr_1.4.0 dplyr_1.0.5 purrr_0.3.4
 [7] readr_1.4.0 tidyr_1.1.3 tibble_3.1.1
[10] ggplot2_3.3.3 tidyverse_1.3.1 devtools_2.4.0
[13] usethis_2.0.1

loaded via a namespace (and not attached):
 [1] httr_1.4.2 pkgload_1.2.1 jsonlite_1.7.2
 [4] modelr_0.1.8 assertthat_0.2.1 highr_0.9
 [7] cellranger_1.1.0 yaml_2.2.1 remotes_2.3.0
[10] sessioninfo_1.1.1 pillar_1.6.0 backports_1.2.1
[13] glue_1.4.2 digest_0.6.27 rvest_1.0.0
[16] colorspace_2.0-0 htmltools_0.5.1.1 clipr_0.7.1
[19] pkgconfig_2.0.3 broom_0.7.6 haven_2.4.1
[22] USAboundariesData_0.3.1 scales_1.1.1 processx_3.5.2
[25] proxy_0.4-25 styler_1.4.1 generics_0.1.0
[28] ellipsis_0.3.2 cachem_1.0.4 withr_2.4.2
[31] cli_2.5.0 magrittr_2.0.1 crayon_1.4.1
[34] readxl_1.3.1 evaluate_0.14 memoise_2.0.0
[37] ps_1.6.0 fs_1.5.0 fansi_0.4.2
[40] xml2_1.3.2 class_7.3-18 pkgbuild_1.2.0
[43] tools_4.0.5 prettyunits_1.1.1 hms_1.0.0
[46] lifecycle_1.0.0 munsell_0.5.0 reprex_2.0.0
[49] callr_3.7.0 compiler_4.0.5 e1071_1.7-6
[52] rlang_0.4.11 classInt_0.4-3 units_0.7-1
[55] grid_4.0.5 rstudioapi_0.13 rmarkdown_2.7
[58] testthat_3.0.2 gtable_0.3.0 DBI_1.1.1
[61] R6_2.5.0 lubridate_1.7.10 knitr_1.33
[64] fastmap_1.1.0 utf8_1.2.1 rprojroot_2.0.2
[67] KernSmooth_2.23-18 desc_1.3.0 stringi_1.5.3
[70] Rcpp_1.0.6 vctrs_0.3.8 dbplyr_2.1.1
[73] tidyselect_1.1.1 xfun_0.22
```

edzer commented 3 years ago

Thanks for the reminder. Should now work without the mutate() hack.

lucyrandrews commented 2 years ago

Hi folks. I'm running into an error that seems to fall under this issue, and the mutate() hack isn't working for me. I've read through other issues related to renaming columns in sf objects (e.g. #1509, #1483) but I'm not finding a solution that works for me. Apologies if this is user error!

Reprex:

library(tidyverse)
#> Warning: package 'tibble' was built under R version 4.1.1
#> Warning: package 'tidyr' was built under R version 4.1.1
#> Warning: package 'readr' was built under R version 4.1.1
#> Warning: package 'stringr' was built under R version 4.1.1
library(sf)
#> Warning: package 'sf' was built under R version 4.1.1
#> Linking to GEOS 3.9.1, GDAL 3.2.3, PROJ 7.2.1

wd <- getwd()

# specify file download url
file_zip_url <- "https://gispublic.waterboards.ca.gov/portal/sharing/rest/content/items/746870b4aa1f47579ca1bf5d6c9924a8/data"

# download file as a zipped file
download.file(url = file_zip_url,
              destfile = paste0(wd, "/eWRIMS_Data.gdb.zip"))

# unzip the zipped file to geodatabase
unzip(zipfile = paste0(wd, "/eWRIMS_Data.gdb.zip"),
      exdir = paste0(wd, "/eWRIMS_Data.gdb"))

# read in geodatabase
ewrims_orig <- st_read(dsn = paste0(wd, "/eWRIMS_Data.gdb"),
                       layer = "Points_of_Diversion_20220201") %>%
  st_as_sf()
#> Reading layer `Points_of_Diversion_20220201' from data source 
#>   `/private/var/folders/k0/b257gr8s08b6mttj1kh5md8c0000gn/T/Rtmp71xgzb/reprex-94341173ddf-vivid-eider/eWRIMS_Data.gdb' 
#>   using driver `OpenFileGDB'
#> Simple feature collection with 61084 features and 38 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -373082.2 ymin: -595544.9 xmax: 538748.3 ymax: 449192.9
#> Projected CRS: NAD83 / California Albers

# rename columns
ewrims_rename <- ewrims_orig %>%
  rename_with(tolower)

# try a select call
ewrims_rename %>%
  mutate() %>%
  select(pod_id, appl_id) # ERROR
#> Error in select.sf(., pod_id, appl_id): internal error: can't find sf column

# look at sf objects class, geometry, and agr
class(ewrims_orig)
#> [1] "sf"         "data.frame"

class(ewrims_rename)
#> [1] "sf"         "data.frame"

st_geometry(ewrims_orig)
#> Geometry set for 61084 features 
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -373082.2 ymin: -595544.9 xmax: 538748.3 ymax: 449192.9
#> Projected CRS: NAD83 / California Albers
#> First 5 geometries:
#> POINT (331471.2 -564613.1)
#> POINT (335041.1 -564137.4)
#> POINT (289678.3 -563507)
#> POINT (289761.5 -563290.5)
#> POINT (295868.7 -563265.8)

st_geometry(ewrims_rename) # ERROR
#> Error in st_geometry.sf(ewrims_rename): attr(obj, "sf_column") does not point to a geometry column.
#> Did you rename it, without setting st_geometry(obj) <- "newname"?

st_agr(ewrims_orig)
#>              POD_ID             APPL_ID             POD_NUM            APPL_POD 
#>                <NA>                <NA>                <NA>                <NA> 
#>     TOWNSHIP_NUMBER  TOWNSHIP_DIRECTION        RANGE_NUMBER     RANGE_DIRECTION 
#>                <NA>                <NA>                <NA>                <NA> 
#>      SECTION_NUMBER  SECTION_CLASSIFIER             QUARTER     QUARTER_QUARTER 
#>                <NA>                <NA>                <NA>                <NA> 
#>            MERIDIAN         NORTH_COORD          EAST_COORD             SP_ZONE 
#>                <NA>                <NA>                <NA>                <NA> 
#>            LATITUDE           LONGITUDE           TRIB_DESC     LOCATION_METHOD 
#>                <NA>                <NA>                <NA>                <NA> 
#>         SOURCE_NAME            MOVEABLE            HAS_OPOD           WATERSHED 
#>                <NA>                <NA>                <NA>                <NA> 
#>              COUNTY         WELL_NUMBER       QUAD_MAP_NAME        QUAD_MAP_NUM 
#>                <NA>                <NA>                <NA>                <NA> 
#>    QUAD_MAP_MIN_SER       PARCEL_NUMBER DIVERSION_SITE_NAME    LAST_UPDATE_DATE 
#>                <NA>                <NA>                <NA>                <NA> 
#> LAST_UPDATE_USER_ID        SPECIAL_AREA              HUC_12               HUC_8 
#>                <NA>                <NA>                <NA>                <NA> 
#>           HU_8_NAME          HU_12_NAME 
#>                <NA>                <NA> 
#> Levels: constant aggregate identity

st_agr(ewrims_rename)
#>              pod_id             appl_id             pod_num            appl_pod 
#>                <NA>                <NA>                <NA>                <NA> 
#>     township_number  township_direction        range_number     range_direction 
#>                <NA>                <NA>                <NA>                <NA> 
#>      section_number  section_classifier             quarter     quarter_quarter 
#>                <NA>                <NA>                <NA>                <NA> 
#>            meridian         north_coord          east_coord             sp_zone 
#>                <NA>                <NA>                <NA>                <NA> 
#>            latitude           longitude           trib_desc     location_method 
#>                <NA>                <NA>                <NA>                <NA> 
#>         source_name            moveable            has_opod           watershed 
#>                <NA>                <NA>                <NA>                <NA> 
#>              county         well_number       quad_map_name        quad_map_num 
#>                <NA>                <NA>                <NA>                <NA> 
#>    quad_map_min_ser       parcel_number diversion_site_name    last_update_date 
#>                <NA>                <NA>                <NA>                <NA> 
#> last_update_user_id        special_area              huc_12               huc_8 
#>                <NA>                <NA>                <NA>                <NA> 
#>           hu_8_name          hu_12_name               shape 
#>                <NA>                <NA>                <NA> 
#> Levels: constant aggregate identity

Created on 2022-02-07 by the reprex package (v2.0.1)

The primary error message, raised by the select() call that follows rename_with(), is

Error in select.sf(., pod_id, appl_id) : internal error: can't find sf column
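The reprex output suggests that rename_with(tolower) also lowercased the geometry column: st_agr(ewrims_rename) lists a `shape` entry, while st_geometry() reports that attr(obj, "sf_column") no longer points to a geometry column. The error message itself recommends st_geometry(obj) <- "newname". As an illustration of the underlying check, here is a minimal base-R sketch; `sf_column_is_valid()` and `repoint_sf_column()` are hypothetical helpers, not sf functions, and the stale pointer value "Shape" in the mock is an assumption, since the original column name is not shown in the thread.

```r
# Hypothetical helpers (not part of sf), base R only.

# TRUE when the "sf_column" attribute names an existing column of class "sfc"
sf_column_is_valid <- function(x) {
  geom_col <- attr(x, "sf_column")
  !is.null(geom_col) && geom_col %in% names(x) && inherits(x[[geom_col]], "sfc")
}

# Point "sf_column" at the only sfc-classed column, if there is exactly one
repoint_sf_column <- function(x) {
  if (sf_column_is_valid(x)) return(x)
  is_sfc <- vapply(x, inherits, logical(1), what = "sfc")
  candidates <- names(x)[is_sfc]
  if (length(candidates) != 1L) {
    stop("cannot pick a geometry column unambiguously")
  }
  attr(x, "sf_column") <- candidates
  x
}

# Mock object (assumption): a list column that merely carries the "sfc" class,
# plus a pointer that still uses the pre-rename name.
pts <- structure(list(c(0, 0), c(1, 1)), class = c("sfc_POINT", "sfc"))
mock <- data.frame(pod_id = 1:2)
mock$shape <- pts
attr(mock, "sf_column") <- "Shape"

before <- sf_column_is_valid(mock)
fixed  <- repoint_sf_column(mock)
print(before)
print(attr(fixed, "sf_column"))
```

Excluding the geometry column from the renaming in the first place avoids the stale pointer altogether.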

Session info:

```r
R version 4.1.0 (2021-05-18)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 12.0.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
 [1] reprex_2.0.1 sf_1.0-4 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7 purrr_0.3.4 readr_2.1.1
 [8] tidyr_1.1.4 tibble_3.1.6 ggplot2_3.3.5 tidyverse_1.3.1 devtools_2.4.3 usethis_2.1.3

loaded via a namespace (and not attached):
 [1] httr_1.4.2 pkgload_1.2.4 jsonlite_1.7.2 modelr_0.1.8 assertthat_0.2.1 highr_0.9
 [7] cellranger_1.1.0 yaml_2.2.1 remotes_2.4.2 sessioninfo_1.2.2 pillar_1.6.4 backports_1.4.0
[13] glue_1.5.1 digest_0.6.29 rvest_1.0.2 colorspace_2.0-2 htmltools_0.5.2 clipr_0.7.1
[19] pkgconfig_2.0.3 broom_0.7.10 haven_2.4.3 scales_1.1.1 processx_3.5.2 tzdb_0.2.0
[25] proxy_0.4-26 generics_0.1.1 ellipsis_0.3.2 cachem_1.0.6 withr_2.4.3 cli_3.1.0
[31] magrittr_2.0.1 crayon_1.4.2 readxl_1.3.1 memoise_2.0.1 evaluate_0.14 ps_1.6.0
[37] fs_1.5.1 fansi_0.5.0 xml2_1.3.3 class_7.3-19 pkgbuild_1.2.1 tools_4.1.0
[43] prettyunits_1.1.1 hms_1.1.1 lifecycle_1.0.1 munsell_0.5.0 callr_3.7.0 compiler_4.1.0
[49] e1071_1.7-9 rlang_0.4.12 classInt_0.4-3 units_0.7-2 grid_4.1.0 rstudioapi_0.13
[55] rmarkdown_2.11 testthat_3.1.1 gtable_0.3.0 DBI_1.1.1 R6_2.5.1 lubridate_1.8.0
[61] knitr_1.36 fastmap_1.1.0 utf8_1.2.2 rprojroot_2.0.2 KernSmooth_2.23-20 desc_1.4.0
[67] stringi_1.7.6 Rcpp_1.0.7 vctrs_0.3.8 dbplyr_2.1.1 tidyselect_1.1.1 xfun_0.28
```

Thanks for any insights!

edzer commented 2 years ago

It seems I can't download the file; could you share it in a different way, or share it directly with me?

lucyrandrews commented 2 years ago

Sorry about that, not sure what’s up. Here’s a link to the download page.

https://gispublic.waterboards.ca.gov/portal/home/item.html?id=746870b4aa1f47579ca1bf5d6c9924a8

edzer commented 2 years ago

I can download the zip file, but can't read the unzipped file; also doesn't work with ogrinfo.

lucyrandrews commented 2 years ago

Frustrating. Thanks for bearing with me. How about a Google Drive link?

https://drive.google.com/uc?export=download&id=10XwRQEZ4zA3TeCRerrVddP5RvA5jOZW3

Here's what I have in the folder when unzipped:

#>  [1] "a00000001.gdbindexes"                       
#>  [2] "a00000001.gdbtable"                         
#>  [3] "a00000001.gdbtablx"                         
#>  [4] "a00000001.TablesByName.atx"                 
#>  [5] "a00000002.gdbtable"                         
#>  [6] "a00000002.gdbtablx"                         
#>  [7] "a00000003.gdbindexes"                       
#>  [8] "a00000003.gdbtable"                         
#>  [9] "a00000003.gdbtablx"                         
#> [10] "a00000004.CatItemsByPhysicalName.atx"       
#> [11] "a00000004.CatItemsByType.atx"               
#> [12] "a00000004.FDO_UUID.atx"                     
#> [13] "a00000004.gdbindexes"                       
#> [14] "a00000004.gdbtable"                         
#> [15] "a00000004.gdbtablx"                         
#> [16] "a00000004.spx"                              
#> [17] "a00000005.CatItemTypesByName.atx"           
#> [18] "a00000005.CatItemTypesByParentTypeID.atx"   
#> [19] "a00000005.CatItemTypesByUUID.atx"           
#> [20] "a00000005.gdbindexes"                       
#> [21] "a00000005.gdbtable"                         
#> [22] "a00000005.gdbtablx"                         
#> [23] "a00000006.CatRelsByDestinationID.atx"       
#> [24] "a00000006.CatRelsByOriginID.atx"            
#> [25] "a00000006.CatRelsByType.atx"                
#> [26] "a00000006.FDO_UUID.atx"                     
#> [27] "a00000006.gdbindexes"                       
#> [28] "a00000006.gdbtable"                         
#> [29] "a00000006.gdbtablx"                         
#> [30] "a00000007.CatRelTypesByBackwardLabel.atx"   
#> [31] "a00000007.CatRelTypesByDestItemTypeID.atx"  
#> [32] "a00000007.CatRelTypesByForwardLabel.atx"    
#> [33] "a00000007.CatRelTypesByName.atx"            
#> [34] "a00000007.CatRelTypesByOriginItemTypeID.atx"
#> [35] "a00000007.CatRelTypesByUUID.atx"            
#> [36] "a00000007.gdbindexes"                       
#> [37] "a00000007.gdbtable"                         
#> [38] "a00000007.gdbtablx"                         
#> [39] "a00000009.gdbindexes"                       
#> [40] "a00000009.gdbtable"                         
#> [41] "a00000009.gdbtablx"                         
#> [42] "a00000009.I7COUNTY.atx"                     
#> [43] "a00000009.I7HU_12_NAME.atx"                 
#> [44] "a00000009.I7HUC_12.atx"                     
#> [45] "a00000009.I7HUC_121.HUC_12.atx"             
#> [46] "a00000009.I7HUC_121.WATERSHED.atx"          
#> [47] "a00000009.I7PARCEL_NUMBER.atx"              
#> [48] "a00000009.I7POD_ID.atx"                     
#> [49] "a00000009.I7QUAD_MAP_NAME.atx"              
#> [50] "a00000009.I7SOURCE_NAME.atx"                
#> [51] "a00000009.I7SPECIAL_AREA.atx"               
#> [52] "a00000009.I7WATERSHED.atx"                  
#> [53] "a00000009.spx"                              
#> [54] "a0000000a.gdbindexes"                       
#> [55] "a0000000a.gdbtable"                         
#> [56] "a0000000a.gdbtablx"                         
#> [57] "a0000000a.I8CORE_POD_ID.atx"                
#> [58] "a0000000a.I8FIRST_NAME.atx"                 
#> [59] "a0000000a.I8LAST_NAME.atx"                  
#> [60] "a0000000a.I8LAST_NAME2.FIRST_NAME.atx"      
#> [61] "a0000000a.I8LAST_NAME2.LAST_NAME.atx"       
#> [62] "gdb"                                        
#> [63] "timestamps"
rsbivand commented 2 years ago

I downloaded the file from the website manually, unzipped it outside R too, then used st_layers() to check the layer name.

The attr(*, "sf_column") isn't updated when the column name is changed by tolower() in rename_with():

> str(ewrims_orig)
Classes 'sf' and 'data.frame':  61084 obs. of  39 variables:
...
 $ SHAPE              :sfc_POINT of length 61084; first list element:  'XY' num  331471 -564613
 - attr(*, "sf_column")= chr "SHAPE"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
  ..- attr(*, "names")= chr [1:38] "POD_ID" "APPL_ID" "POD_NUM" "APPL_POD" ...
> str(ewrims_rename)
Classes 'sf' and 'data.frame':  61084 obs. of  39 variables:
...
 $ shape              :sfc_POINT of length 61084; first list element:  'XY' num  331471 -564613
 - attr(*, "sf_column")= chr "SHAPE"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
  ..- attr(*, "names")= chr [1:38] "POD_ID" "APPL_ID" "POD_NUM" "APPL_POD" ...

So, replacing the attribute value manually:

> attr(ewrims_rename, "sf_column") <- "shape"
> ewrims_rename %>%
+   mutate() %>%
+   select(pod_id, appl_id) 
Simple feature collection with 61084 features and 2 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -373082.2 ymin: -595544.9 xmax: 538748.3 ymax: 449192.9
Projected CRS: NAD83 / California Albers
First 10 features:
   pod_id  appl_id                      shape
1   36676  C003612 POINT (331471.2 -564613.1)
2   10058 F005763S POINT (335041.1 -564137.4)
3   17489  A006445   POINT (289678.3 -563507)
4   17488  A006445 POINT (289761.5 -563290.5)
5    8766  A000487 POINT (295868.7 -563265.8)
6    7347 F010351S POINT (312301.6 -562928.3)
7       9  A011002 POINT (293252.9 -562758.4)
8   20813 F010350S POINT (326368.8 -561843.3)
9   39685  G332842 POINT (337189.5 -464985.8)
10  23953  A022539 POINT (350969.1 -561451.8)

Maybe a note for unwary users of the really unnecessary syntactic calories in rename_with(). I think that an "sf" method would be overkill (that is, never rename the geometry column automatically).
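
The manual repair above can be generalised into a small helper. What follows is only a sketch under stated assumptions: `repair_sf_names()` is a hypothetical function, not part of sf, and the toy data merely stands in for the eWRIMS layer. The stale state is simulated by setting the names attribute directly, so the example should behave the same whether or not the installed sf version already handles renaming itself. It assumes the rename kept the column order and that the `agr` attribute has one entry per non-geometry column.

```r
library(sf)

# Toy data standing in for the eWRIMS layer (values are made up).
pts <- st_sf(
  POD_ID  = c(1L, 2L),
  APPL_ID = c("A1", "B2"),
  SHAPE   = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)))
)

# Mimic the stale state from the thread: change the column names only,
# leaving the "sf_column" and "agr" attributes untouched.
stale <- pts
attr(stale, "names") <- tolower(names(pts))

# Hypothetical helper (not part of sf): re-synchronise "sf_column" and the
# names of "agr" with the current column names, matching by position.
repair_sf_names <- function(x, old_names) {
  new_names <- names(x)
  stopifnot(inherits(x, "sf"), length(old_names) == length(new_names))

  # Find the active geometry column under the old names; fall back to the
  # new names in case the attribute was already up to date.
  geom_pos <- match(attr(x, "sf_column"), old_names)
  if (is.na(geom_pos)) geom_pos <- match(attr(x, "sf_column"), new_names)
  stopifnot(!is.na(geom_pos))
  attr(x, "sf_column") <- new_names[geom_pos]

  # Rename the agr entries only when they line up one-to-one with the
  # non-geometry columns (an assumption of this sketch).
  agr <- attr(x, "agr")
  if (!is.null(agr) && length(agr) == length(new_names) - 1L) {
    names(agr) <- new_names[-geom_pos]
    attr(x, "agr") <- agr
  }
  x
}

fixed <- repair_sf_names(stale, old_names = names(pts))
attr(fixed, "sf_column")
names(st_agr(fixed))
```

Matching by position rather than by name is a deliberate choice here: it also covers the case from the original report, where the `agr` names of the renamed columns had become `NA` and could no longer be looked up.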

MatthieuStigler commented 1 year ago

Sorry I am confused about the outcome of this issue: is the issue supposed to be solved, or was it decided that it would be kept as is (or maybe it re-appeared)?

Using sf 1.0.9 (and 1.1.13 in a Docker container), I can still reproduce the original error.

Thanks!

library(sf)
#> Linking to GEOS 3.10.2, GDAL 3.4.3, PROJ 8.2.0; sf_use_s2() is TRUE
library(tidyverse)

packageVersion("sf")
#> [1] '1.0.9'

df = st_read("https://phl.carto.com/api/v2/sql?q=SELECT+*+FROM+polling_places&filename=polling_places&format=geojson&skipfields=cartodb_id", quiet=TRUE)

df2 = df %>% rename_with(~str_replace_all(.x, "_", "-"))
df2 %>%   rename(id = objectid)
#> Error in rename.sf(., id = objectid): internal error: can't find `agr` columns

Created on 2023-04-18 with reprex v2.0.2

MatthieuStigler commented 1 year ago

The problem does not seem to be specific to rename_with: it also arises whenever the column names are changed directly, either with colnames<- or with setNames():

library(sf)
#> Linking to GEOS 3.10.2, GDAL 3.4.3, PROJ 8.2.0; sf_use_s2() is TRUE
library(tidyverse)

packageVersion("sf")
#> [1] '1.0.9'

df = st_read("https://phl.carto.com/api/v2/sql?q=SELECT+*+FROM+polling_places&filename=polling_places&format=geojson&skipfields=cartodb_id", quiet=TRUE)

df2 <- df
colnames(df2) <- str_replace_all(colnames(df), "_", "-")
df2 %>%   rename(id = objectid)
#> Error in rename.sf(., id = objectid): internal error: can't find `agr` columns

df %>% setNames(str_replace_all(colnames(.), "_", "-")) %>%   rename(id = objectid)
#> Error in rename.sf(., id = objectid): internal error: can't find `agr` columns

Created on 2023-04-19 with reprex v2.0.2

edzer commented 1 year ago

Yes, that is well known - colnames<-, setNames() or names<- are not generic, so we cannot add sf methods for them.
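
A possible workaround is to route bulk renames through rename(), which does have an sf method (the `rename.sf` seen in the error messages above). The sketch below assumes a dplyr version in which rename() accepts `all_of()` with a named character vector of the form `c(new = "old")`. The toy data and the `lookup` vector are made up for illustration, and the geometry column is deliberately left out of the rename.

```r
library(sf)
library(dplyr)

# Toy data with underscores in the column names (values are made up).
pts <- st_sf(
  street_address = c("1 Main St", "2 Side St"),
  zip_code       = c("19104", "19103"),
  geometry       = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)))
)

# Build a named vector c(new = "old") for the attribute columns only.
attribute_cols <- setdiff(names(pts), attr(pts, "sf_column"))
lookup <- setNames(attribute_cols, gsub("_", "-", attribute_cols, fixed = TRUE))

# rename() dispatches to the sf method, which is expected to keep the
# "agr" attribute in step with the new column names.
renamed <- rename(pts, all_of(lookup))

# A follow-up rename, the step that failed in the original report.
renamed2 <- rename(renamed, zip = `zip-code`)
names(renamed2)
```

Because every rename goes through an sf-aware method, there is no need to touch `colnames<-`, `names<-` or setNames() on the sf object itself.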