hypertidy / anglr

Mesh creation and topology for spatial data (and not just geographic)
https://hypertidy.github.io/anglr/
83 stars 10 forks source link

dupe vertices in shapefile #7

Closed mdsumner closed 4 years ago

mdsumner commented 8 years ago

These files choke at pslg:

SSRUs_East_Ant_Polys.shp

cst10_polygon.shp

Is the dedupe failing?

mdsumner commented 8 years ago

It's the use of dplyr::distinct, this doesn't pick up differences in numbers like

" 2188902.156298386" " 2188902.156298387"

dplyr::distinct() says that is duplicate, but factor(as.character(.)) sees it's different

Yet (presumably these are artificially different after formatting).

library(dplyr)

d <- tibble(x = c(2188902.156298386, 2188902.156298387))

d %>% distinct()

Those are from map_table(shapefile("cst10_polygon.shp"))$v[c(2302, 10986), ]

Need to use the gris brute force method (or something better) in spbabel::map_table:

normVerts <- function(v, nam) {
  v$.vx0 <- as.integer(factor(do.call("paste", c(v[,nam], sep = "\r"))))
  bXv <- v %>% dplyr::select(.vx0, .br0, .br_order)
  v <- v %>% distinct(.vx0) 
  list(v = v, bXv = bXv)
}
mdsumner commented 8 years ago

This is for spbabel

mpadge commented 6 years ago

Just dumping thoughts in old issues here for ya, but I'm churning through the silicate code (which is soooo awesome, btw!), and have encountered one issue that is the precise obverse of this.

> library (silicate)
> library (dodgr)
> weight_streetnet (hampi) %>% dodgr_vertices () %>% nrow () # hampi = dodgr test data
# [1] 2861
> SC(hampi)$vertex %>% nrow ()
# [1] 2860

That's because the coordinates are identical, so SC de-duplicates them, but the are not OSM identical. This often arises in vertical cases such as pedestrian ways crossing over or under ground-borne ways, but is an entirely general condition. Sometimes points where lines cross should not be considered as junctions, which silicate can't currently handle. My easy solution will be to implement an option to allow all silicate capital functions (SC, ARC, PATH, ...) to optionally specify a user-defined ID column to allow (otherwise apparent) duplication of vertices, edges, objects, whatever, as long as they have got separate IDs.

That should go some way to resolving your concerns expressed in unjoin

mdsumner commented 6 years ago

I had to do this in anglr to support copy_down properly, and denormalizing (from unique x/y) takes a different approach for SEQuential models (ARC, PATH) than PRIMitive models (SC, TRI, DEL)

https://github.com/hypertidy/anglr/blob/master/R/de-normalize.R#L11

The definition of geometric uniqueness should always be optional and available, though in practice I've only had to work with x/y or x/y/z or x/y/z/t so far. This all needs a lot more attention!

mdsumner commented 4 years ago

Still a problem, but I have not seen this for a long time

## data_local/vector_cache/cst10_polygon.Rdata
d <- anglr::DEL0(cst10_polygon)
Internal error in segmentintersection():
  Topological inconsistency after splitting a segment.
  Please report this bug to jrs@cs.berkeley.edu
  Include the message above, your input data set, and the exact
    command line you used to run Triangle.
Error in (function (p, a = NULL, q = NULL, Y = FALSE, j = FALSE, D = FALSE,  : 
  Triangle exit, code $i
mdsumner commented 4 years ago

Buffer = 0 doesn't fix it

mdsumner commented 4 years ago

Every polygon works fine in DEL0 when done in a loop separately. So this is a "mesh on entire set" problem.

And, as @mpadge says it's related to overlapping segments that aren't meant to be meshed together :) but here it's for different reasons (crappy digitization of a polygon). I think CGAL's robustness might allow this to work, but really it is crappy polygons.

This is enough to trigger

DEL0(cst10_polygon[c(131, 363), ])
Internal error in segmentintersection():
  Topological inconsistency after splitting a segment.
  Please report this bug to jrs@cs.berkeley.edu
  Include the message above, your input data set, and the exact
    command line you used to run Triangle.
Error in (function (p, a = NULL, q = NULL, Y = FALSE, j = FALSE, D = FALSE,  : 
  Triangle exit, code $i

TRI0 is fine with it

TRI0(cst10_polygon[c(131, 363), ])
class       : TRI0
type        : Primitive
vertices    : 8149 (2-space)
primitives  : 8187 (2-space)
crs         : +proj=stere +lat_0=-90 +lat_ts=-71 +lon_0=0 +k=1 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0
 plot(TRI0(cst10_polygon[c(131, 363), ]))
 plot(cst10_polygon[131, ], add = TRUE, col = "firebrick")

image

cst10_polygon.zip

mdsumner commented 4 years ago

Somewhere along this boundary

image

might just be this non-noded intersection

image

maybe

DEL0(rgeos::gNode(as(cst10_polygon[c(131, 363), ], "SpatialLines")))
class       : DEL0
type        : Primitive
vertices    : 8098 (2-space)
primitives  : 5883 (2-space)
crs         : +proj=stere +lat_0=-90 +lat_ts=-71 +lon_0=0 +k=1 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0

that's not enough to triangulate it though, it doesn't work well because gNode only gives one feature

and this doesn't work, though the resulting POLYGONs from sf do look ok

DEL0(sf::st_cast(sf::st_polygonize(sf::st_node(sf::st_cast(sf::st_as_sf(cst10_polygon[c(131, 363), ]), "MULTILINESTRING")))))
Internal error in segmentintersection():
  Topological inconsistency after splitting a segment.

something to compare with CGAL, but it might be worth catching this error, and churning through TRI0 instead?

mdsumner commented 4 years ago

this works

plot(DEL0(sf::st_as_sfc(sf::st_as_binary(sf::st_set_precision(sf::st_as_sf(cst10_polygon[c(131, 363), ]), 1)$geometry))))

image

and preserves vertices it seems

DEL0(sf::st_as_sfc(sf::st_as_binary(sf::st_set_precision(sf::st_as_sf(cst10_polygon[c(131, 363), ]),.1)$geometry)))
class       : DEL0
type        : Primitive
vertices    : 8048 (2-space)
primitives  : 8191 (2-space)
crs         : NA
> SC0(cst10_polygon[c(131, 363), ])
class       : SC0
type        : Structural
vertices    : 8149 (2-space)
primitives  : 8159 (2-space)
crs         : +proj=stere +lat_0=-90 +lat_ts=-71 +lon_0=0 +k=1 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0

up to 1e9

DEL0(sf::st_as_sfc(sf::st_as_binary(sf::st_set_precision(sf::st_as_sf(cst10_polygon[c(131, 363), ]),1e9)$geometry)))
class       : DEL0
type        : Primitive
vertices    : 8223 (2-space)
primitives  : 8563 (2-space)
crs         : NA
> DEL0(sf::st_as_sfc(sf::st_as_binary(sf::st_set_precision(sf::st_as_sf(cst10_polygon[c(131, 363), ]),1e10)$geometry)))
Internal error in segmentintersection():
mdsumner commented 4 years ago

It's time I finally learnt this precision stuff enough, and seems relevant for @dcooley ?👍

I'll put this data set in anglr while I explore it a bit more (so don't bother with the details of this issue above ).

signif is enough (sf's precision is pretty counter intuitive I think)

cst10 <- readRDS(system.file("extdata/cst10_polygon.rds", package = "anglr", mustWork = TRUE))
library(silicate)
p <- PATH0(cst10)
# DEL0(p)  ## fails
p$vertex$x_ <- signif(p$vertex$x_, 10)
p$vertex$y_ <- signif(p$vertex$y_, 10)
DEL0(p) ## works

10 is about 1mm in this metres projection - ultimately that's the issue I think, coordinates in metres with spurious precision , and this is something that should happen at the "find unique" step, at least be an option to control - don't know if there's any tools for this in R already, or if signif(x, 10) will be enough for normal use?

DEL0(p, max_area = 1e10)
class       : DEL0
type        : Primitive
vertices    : 15731 (2-space)
primitives  : 23893 (2-space)
crs         : +proj=stere +lat_0=-90 +lat_ts=-71 +lon_0=0 +k=1 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0
mesh_plot(DEL0(p, max_area = 1e10), col = viridis::viridis(23893))

image

mdsumner commented 4 years ago

Oh @mpadge the "not OSM identical" thing is in "rgl::as.mesh3d" with its notEqual arg, though motivated differently. Another similar case is keeping distinct in xyz, which quadmesh::dquadmesh does fwiw

mdsumner commented 4 years ago

oh k, I understand better what sf's precision is doing now and how it's much more than rounding - and also that some thoughts I had about using this for detail reduction were pretty off-rail, but at any rate here's a wrapper for sf's precision setter to round-trip through binary and rebuild:

osf <- function(x, precision = NULL) {
  if (is.null(precision)) {
    return(x)
  }
  x <- sf::st_set_precision(x, precision)
  sf::st_set_geometry(x, sf::st_as_sfc(sf::st_as_binary(sf::st_geometry(x))))
}

Here's a wrapper to re-map PATH0 indexes after rounding the vertex table coordinates (works in very limited ways to avoid the problem of uniqueness vs. spurious-precision):

renorm <- function(x, ...) {
  ## only PATH0 for now
  vv <- sc_vertex(x)
  vv$vertex_ <- 1:nrow(vv)
  gi <- dplyr::group_indices(
    dplyr::group_by(vv, x_, y_)
  )
  fun <- function(xx) {
    xx$vertex_ <- vv$vertex_u[xx$vertex_]
    xx
  }
  if (dim(vv)[1L] > length(unique(gi))) {
   # print("jiggy")
    vv$vertex_u <- vv$vertex_[gi]

    x$object$path_ <-  lapply(x$object$path_, fun)
    #vv <- dplyr::distinct(vv, vertex_u, .keep_all = TRUE)
    vv <- vv[!duplicated(vv$vertex_u), ]
    vv$vertex_u <- NULL
    vv$vertex_ <- NULL
    x$vertex <- vv
  }

  x
} 

I also explored snap-to-grid, which also works up to a point - what I think might be helpful for DEL/DEL0 is on this error to suggest using these tools - rebuild with sf precision - do a rounding or grid-snap of sc$vertex - and have a bit of a guide to doing this.

mpadge commented 4 years ago

All very interesting, and also stuff that I need right now as well (for a dodgr issue with coordinate identical-ness). One issue that i'm having is that signif is really slow:

library (inline)
library (rbenchmark)
x <- 100 * rnorm (1e7)
# function to round to integer value of (x * precision)
cround <- cfunction(c(x = "numeric", precision = "integer"), "
                 const int n = length (x);
                 double *rx;
                 rx = REAL (x);

                 int *rprecision;
                 rprecision = INTEGER (precision);

                 SEXP result = PROTECT (allocVector (INTSXP, n));
                 int *rout;
                 rout = INTEGER (result);

                 for (int i = 0; i < n; i++)
                     rout [i] = rx [i] * rprecision [0];
                 UNPROTECT (1);

                 return result;
                 ")
# r-requivalent:
rround <- function (x, precision)
    round (x * precision)
precision = as.integer (1e8)
benchmark (
    "signif" = s <- signif (x, digits = 10),
    "round" = s <- round (x, digits = 10),
    "cround as int" = s <- cround (x, precision = precision),
    "cround as num" = s <- cround (x, precision = precision) / precision,
    "rround" = s <- rround (x, precision),
    replications = 1,
    order = "relative",
    columns = c ("test", "elapsed", "relative"))
#>            test elapsed relative
#> 3 cround as int   0.022    1.000
#> 4 cround as num   0.070    3.182
#> 5        rround   0.146    6.636
#> 1        signif   0.474   21.545
#> 2         round   1.000   45.455

Created on 2020-05-15 by the reprex package (v0.3.0)

... but a C function is no solution, because it is bounded by .Machine$integer.max, which is generally O(1e9), so the precision value for this C function is limited to around 1e8, and definitely not 1e10 (and that's actually more like 8 digits in total, so precision much less for other CRSs with larger integer components). Any thoughts on that? I will likely implement a C-type routine for dodgr, because it's just too slow otherwise, but anglr will likely need better precision, right?

... on the other hand, this will likely change a lot with C++20, and extensions of to_chars. I've tried wrapping these, but the return value is essentially a std::string_view object, which does not play nicely with std::string, which is the simplest thing to Rcpp::wrap. Solving this would need a custom interface between C++17 and the C-level CHARSXP stuff.

@dcooley Have you tried playing with std::to_chars() perchance?

mdsumner commented 4 years ago

I just always thought that Rmpfr or CGAL would be required for the hard-triangulation-robustness stuff - I did not ever realize that rounding per se was as current.

FWIW the stuff I will cross-reference as I follow along:

https://github.com/danshapero/predicates

https://observablehq.com/@mourner/non-robust-arithmetic-as-art

dcooley commented 4 years ago

Have you tried playing with std::to_chars() perchance?

No, I've tried to stay within C++11 requirements (sometimes to C++14), so haven't used this.

I've been going through how data.table do their comparisons to see if I can adapt their code. But haven't got far with it.