Closed etiennebr closed 7 years ago
I see, this is taken care of in summarise_.sf
.
If you
nc %>% as_tibble %>% st_as_sf %>% class()
# [1] "sf" "tbl_df" "tbl" "data.frame"
you get an object of class sf
, tbl_df
etc. Is that a good idea? Does that help for anything?
Seems to be related to the geometries.
Works :
library(sf)
library(tibble)
library(dplyr)
library(magrittr)
nc <- st_read(system.file("shape/nc.shp", package="sf"), quiet = TRUE)
nc %>%
as_tibble %>%
filter(SID79 < 10) %>%
group_by(SID79) %>%
summarise(geog = st_union(geometry)) %>%
mutate(area = st_area(geog))
# # A tibble: 10 × 3
# SID79 geog area
# <dbl> <sf> <S3: units>
# 1 0 <MULTIPOLYGON...> 8473635999 m^2
# 2 1 <MULTIPOLYGON...> 7871653802 m^2
# 3 2 <MULTIPOLYGON...> 9608907418 m^2
# 4 3 <MULTIPOLYGON...> 8136026436 m^2
# 5 4 <MULTIPOLYGON...> 11879964776 m^2
# 6 5 <MULTIPOLYGON...> 13623569968 m^2
# 7 6 <MULTIPOLYGON...> 6992413837 m^2
# 8 7 <MULTIPOLYGON...> 8542520112 m^2
# 9 8 <MULTIPOLYGON...> 8000734201 m^2
# 10 9 <MULTIPOLYGON...> 4615845685 m^2
But,
nc %>%
as_tibble %>%
filter(SID79 < 11) %>%
group_by(SID79) %>%
summarise(geog = st_union(geography)) %>%
mutate(area = st_area(geog))
# Error in mutate_impl(.data, dots) :
# unable to find an inherited method for function ‘coordinates’ for signature ‘"numeric"’
I don't understand the traceback()
of that.
Ah! I think it has to do with POLYGON
vs MULTIPOLYGON
.
e.g. : SID79:1 is MULTIPOLYGON (9 features), but SID79:10 is POLYGON (1 feature)
nc %>%
as_tibble %>%
filter(SID79 %in% c(1, 10)) %>%
group_by(SID79) %>%
summarise(geog = st_union(geography)) %>%
mutate(area = st_area(geog))
# Error in CPL_get_bbox(obj, 3) : not a matrix
SID79:12 also has a single feature
nc %>%
as_tibble %>%
filter(SID79 %in% c(10, 12)) %>%
group_by(SID79) %>%
summarise(geog = st_union(geography)) %>%
mutate(area = st_area(geog))
# # A tibble: 2 × 3
# SID79 geog area
# <dbl> <sf> <S3: units>
# 1 10 <POLYGON((-78...> 1550981080 m^2
# 2 12 <POLYGON((-79...> 2019778925 m^2
x <- nc %>%
as_tibble %>%
filter(SID79 %in% c(10, 11)) %>%
group_by(SID79) %>%
summarise(geog = st_union(geography))
str(unclass(x$geog))
# List of 2
# $ :List of 1
# ..$ : num [1:22, 1:2] -78.6 -78.7 -78.8 -78.9 -78.9 ...
# ..- attr(*, "class")= chr [1:3] "XY" "POLYGON" "sfg"
# $ :List of 2
# ..$ :List of 1
# .. ..$ : num [1:45, 1:2] -77.5 -77.5 -77.5 -77.5 -77.5 ...
# ..$ :List of 1
# .. ..$ : num [1:10, 1:2] -79.2 -79.2 -79.5 -79.5 -79.5 ...
# ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
# - attr(*, "n_empty")= int 0
# - attr(*, "precision")= num 0
# - attr(*, "bbox")= Named num [1:4] -79.2 35.2 -78.5 35.6
# ..- attr(*, "names")= chr [1:4] "xmin" "ymin" "xmax" "ymax"
# - attr(*, "crs")=List of 2
# ..$ epsg : int 4267
# ..$ proj4string: chr "+proj=longlat +datum=NAD27 +no_defs"
# ..- attr(*, "class")= chr "crs"
Even more surprising
mutate(x, area=st_area(geog))
# Error in CPL_get_bbox(obj, 2) : not compatible with requested type
But
st_area(x$geog)
# Units: m^2
# [1] 1550981080 575180854
I'm getting deeper in sf
, leaving breadcrumbs in case @edzer you see I'm missing something obvious.
The error happens when printing the sfc
. Traceback:
17. structure(list(message = "not compatible with requested type",
call = CPL_get_bbox(obj, 2), cppstack = NULL), .Names = c("message",
"call", "cppstack"), class = c("Rcpp::not_compatible", "C++Error",
"error", "condition")))
16. get_bbox(obj, 2)
15. CPL_get_bbox(obj, 2), names = c("xmin", "ymin", "xmax",
"ymax")) at bbox.R#20
14. bbox.sfc_POLYGON(x) at bbox.R#39
13. bbox(x) at sfc.R#107
12. sfc`(X[[i]], ...)
11. FUN(X[[i]], ...)
10. lapply(x, `[`, i)
So this works fine (dropping the geog column)
transmute(x, area=st_area(geog))
# # A tibble: 2 × 1
# area
# <S3: units>
# 1 1550981080 m^2
# 2 575180854 m^2
Also using your suggestion @edzer : x %>% st_as_sf()
prints fine
There's something happening in summarise
, after it evaluates all the arguments. The correct class for POLYGON
and MULTIPOLYGON
is GEOMETRY
, but when the st_sfc conversion happens within summarise
it thinks it's MULTIPOLYGON
.
# ok
nc %>%
as_tibble %>%
group_by(SID79) %>%
summarise(geog = st_union(geography)) %>%
mutate(geog = st_sfc(geog)) %$%
class(geog)
# [1] "sfc_GEOMETRY" "sfc"
# not ok
nc %>%
as_tibble %>%
group_by(SID79) %>%
summarise(geog = st_sfc(st_union(geography))) %$%
class(geog)
# [1] "sfc_MULTIPOLYGON" "sfc"
Same thing if st_sfc is in a separate argument, or if it's not a tibble.
nc %>%
as_tibble %>%
group_by(SID79) %>%
summarise(geog = st_union(geography),
geog = st_sfc(geog)) %$%
class(geog)
# [1] "sfc_MULTIPOLYGON" "sfc"
nc %>%
group_by(SID79) %>%
summarise(geog = st_union(geometry)) %$%
class(geog)
# [1] "sfc_MULTIPOLYGON" "sfc"
Great progress. st_sfc
tries to set the class of the collection of geometries right, but it seems that summarize
calls st_sfc
for each group, and then binds the groups together without calling st_sfc
on the total; mutate
does this, as it operates on the whole list-column. Can we find out how summarize
binds groups, and interfere there?
An alternative, more hacky approach would be to make summarise_.sf
always the "right" class, instead of the simplest class that st_union
apparently returns; here st_cast
might come in handy, @mdsumner ? Of course this would only work if summarise_.sf
is being called; after piping through as_tibble
this is no longer the case? Another reason to maintain the sf
class attribute?
I think there are two issues. First is, I didn't expect summarise to attempt to take care of the geometry merge with st_union, which created two geometries. Once this is clear, it would be possible to allow overriding st_union, but I believe it's a very good default.
nc %>%
group_by(SID79) %>%
summarise(geog = st_union(geometry))
# Simple feature collection with 28 features and 1 field
# geometry type: MULTIPOLYGON
# dimension: XY
# bbox: xmin: -83.98855 ymin: 34.98907 xmax: -75.7637 ymax: 36.58965
# epsg (SRID): 4267
# proj4string: +proj=longlat +datum=NAD27 +no_defs
# SID79 geog
# 1 0 MULTIPOLYGON(((-83.93799591...
# 2 1 MULTIPOLYGON(((-84.29103851...
# 3 2 MULTIPOLYGON(((-77.04900360...
# 4 3 MULTIPOLYGON(((-78.02592468...
# ...
# 11 10 POLYGON((-78.6127395629883 ...
# 12 11 MULTIPOLYGON(((-77.47388458...
# 13 12 POLYGON((-79.7649917602539 ...
# 14 13 POLYGON((-78.5387420654297 ...
# 15 14 POLYGON((-77.8365783691406 ...
# 16 15 POLYGON((-81.8162841796875 ...
# 17 16 POLYGON((-79.4559707641602 ...
# Warning message:
# In st_sf(x, ..., relation_to_geometry = relation_to_geometry) :
# more than one geometry column: ignoring all but first
The other issue seems to be that the geometry type ends up being MULTIPOLYGON
, but should be GEOMETRY
(in this specific case). I think that's the real problem that has multiple symptoms.
To answer your question on sf
. This is a larger topic, but it's a good question to ask, might even require to open another issue. I've been thinking about sf
class attributes, and before leaving for the holidays I wanted to write a few thoughts. I think it is very convenient and makes it more similar to sp
. Basically, it makes it straightforward to manipulate a data.frame
without having to care for the spatial attributes all the time.
I don't see it as a bad thing to have the user explicitly care for the spatial attributes. It might even make it easier in the long run to have the user understand what is going on and explicitly ask for it.
On the development side, I think it offers additional control (like it would in this case), but also put the burden of having good defaults that always end up frustrating half of the users :)
It could be easier to develop and interact with other packages to focus on sfc
than sf
functionnality. For instance I can envision geom_sf()
(#88) operating on a sfc
rather than a sf
.
nc %>%
ggplot(aes(sf=geom, fill=SID79)) +
geom_sf() # or some other name :)
However while I've dug deep in some parts of sf
, I don't feel like I have yet a broad understanding of the package (lots of stuff there !), so maybe I miss a piece of the picture.
I'm not sure about a new name for the geometry column after a groupby/summarize, I think that means summarise.sf must check at the end of evaluation and update sf_column to an existing sfc? I feel I don't have enough experience with S3, or with extensions to dataframes generally.
But, I think it's clear that class(as_tibble(sf)) should be c("tbl_df", "tbl", "sf", "data.frame")
, i.e. respecting what was passed in?
In that case, sf
needs to provide as_tibble.sf
, which it doesn't, right now.
Hi, I am having a problem that I think is related to this trend. I have a huge set of polygons (birds dataset) that I need to do an union for some of them in the dataset (birds polygons with same id_no). I have been able to do the union of some these polygons but other polygons seems to be crashing the process (I started to do data subsets with data that works). I think the crashes are happening because some polygons have distinct geometry type in the dataset. I have been getting this error and warning message when I have some of these polygons in the sf dataset:
Error: Problem with summarise()
input geometry
.
x Evaluation error: ParseException: Unknown WKB type 12.
i Input geometry
is sf::st_union(geometry)
.
i The error occurred in group 438: id_no = "22709319".
Run rlang::last_error()
to see where the error occurred.
In addition: Warning message:
In st_is_longlat(x) :
bounding box has potentially an invalid value range for longlat data
This is the code I have been using to do the union:
bird_union_sf <- bird_sf %>% group_by(id_no) %>% summarise(geometry = sf::st_union(geometry)) %>% ungroup()
I read on other blogs from you that st_cast might be a solution, but I dont know how to apply it in my code. Any hints would be totally appreciated. Pd. This data set comes from a gdbtable file
Thanks for reporting this issue @Randall-HYLA. I suggest opening a new issue and link to this (closed) issue. Also, please post a reproducible minimal example.
The error message points toward at an invalid geometry (WKB type 12
) in group 438 id_no = "22709319"
. Maybe this is the data you could share with your reprex. Since the error states that the bounding box has potentially an invalid value range for longlat data
, I suspect you might have data that wraps around the ±180° line, or maybe not long+lat data.
Thank you for your suggestions. @etiennebr, I was able to solve the problem at the end. The problem had to do with the dataset. I had Multisurface geometries and Multipolygon geometries on the same sf object. I ended up converting all the multisurface geometries to Multipolygon with st_cast() and them validating those. The validation part is TOTALLY necessary (topological correction), and depends on the version of sf and the linked GDAL and GEOS software. Here is the code I used to fix it before running the union:
bird_sf <- st_cast(bird_sf, "MULTIPOLYGON")
bird_sf <- st_make_valid(bird_sf)
Pd: the validation part takes some time.
st_union()
seem to work fine, but insummarise()
it creates a weird geometry that it can't read. I tried to dump the geometry to wkt (quite large, so I won't put it here), and it can display on wicket, but vertex order is slightly wrong. I think it might be how it handled the multi parts polygons.But it fails with summarise
To dump the geometry: