r-spatial / s2

Spherical Geometry Operators Using the S2 Geometry Library
https://r-spatial.github.io/s2/
71 stars 10 forks source link

Converting s2_geography using supported methods is not guaranteed to get back the original data #260

Open tzakharko opened 1 month ago

tzakharko commented 1 month ago

I am trying to work around the lack of serialization support (#259) by converting s2 geography data into other formats (such as using simple features or wkb), but the data can get corrupted under certain circumstances. That is, doing s2_as_geography(st_as_sfc(g)) or s2_as_geography(s2_as_binary(g)) is not guaranteed to recover the original data.

Consider the following example from real-world code:

g <- geodesic_grid$geography
all(s2_is_valid(g))
#> [1] TRUE

g1 <- as_s2_geography(s2_as_binary(g), check = FALSE)
all(s2_is_valid(g1))
#> [1] FALSE

g2 <- as_s2_geography(sf::st_as_sfc(g), check = FALSE)
all(s2_is_valid(g2))
#> [1] FALSE

Unfortunately, I cannot provide an MRE because I do not know how to accurately persist s2 geography data. In the cases I am dealing with, it seems to be a precision issue when converting the data between various formats. The affected geographies have some points that are very close by, and it appears that these are detected as duplicates when the data is converted back to s2. While I can fix this by running s2_simplify(), it would be good to have a way to persist the data as is.

paleolimbot commented 1 month ago

Thanks for opening this!

We store s2 geometries as pointers to C++ structures for performance reasons. This is usually OK since s2 is most commonly used as an intermediary format (usually the start and end is an sfc). This is usually OK but, as you've found out, it is not possible to serialize it using serialize().

The issue you're seeing here is I think because S2's internal representation is a unit vector (e.g., 3 doubles), whereas the WKB export and import by default take the more ubiquitous longitude/latitude. The conversion between those is slightly lossy and looses a tiny bit of precision (in your case enough to trigguer a difference!).

It is possible to override this projection but it's not obvious how to do this. I've pasted an example below which may help!

library(s2)
set.seed(1234)

lng <- c(-64, 15) + runif(2)
lat <- c(45, 52) + runif(2)

(geog <- s2::s2_make_line(lng, lat))
#> <geodesic s2_geography[1] with CRS=OGC:CRS84>
#> [1] LINESTRING (-63.8862966 45.6092747, 15.6222994 52.6233794)

(wkb_storage <- wk::wk_handle(geog, wk::wkb_writer(), s2_projection = NULL))
#> <wk_wkb[1]>
#> [1] <LINESTRING Z (0.3079087 -0.6281395 0.7145859, 0.5846258 0.1634758 0.7946624)>

serialized <- serialize(wkb_storage, NULL)
geog2 <- wk::wk_handle(
  unserialize(serialized),
  s2::s2_geography_writer(oriented = TRUE, projection = NULL)
)

geog2
#> <geodesic s2_geography[1] with CRS=OGC:CRS84>
#> [1] LINESTRING (-63.8862966 45.6092747, 15.6222994 52.6233794)

s2_equals(geog, geog2)
#> [1] TRUE

Created on 2024-07-12 with reprex v2.1.0