r-spatial / sf

Simple Features for R
https://r-spatial.github.io/sf/
Other
1.34k stars 297 forks source link

How to go from MULTIPOINT to just point? #114

Closed Robinlovelace closed 7 years ago

Robinlovelace commented 7 years ago

Reproducible example:

x = st_line_sample(st_transform(ls, 3857), density = c(1/1000, 1/10000)) # one per km, one per 10 km
class(x)
## [1] "sfc_MULTIPOINT" "sfc"   

Is there a function, e.g. as.st_points that could convert this into just points? If not any ideas of the best way to go about creating such a function?

Robinlovelace commented 7 years ago

Update - works (with warning) for sp objects:

y = as(x, "Spatial")
plot(y)
SpatialPoints(y)
class       : SpatialPoints 
features    : 112 
extent      : 0, 5565.975, -7.081155e-10, 110823.7  (xmin, xmax, ymin, ymax)
coord. ref. : NA 
Warning message:
In validityMethod(object) :
  duplicate rownames are interpreted by rgeos as MultiPoints; use SpatialMultiPoints to define these; in future sp versions this warning will become an error
edzer commented 7 years ago

could sfc_MULTIPOINT -> sfc_POINT be part of st_cast? We now see

> st_cast(st_sfc(st_multipoint(matrix(1:6,,2))), "POINT")
Geometry set for 1 feature 
geometry type:  POINT
dimension:      XY
bbox:           xmin: 1 ymin: 4 xmax: 3 ymax: 6
epsg (SRID):    NA
proj4string:    NA
POINT(1 4) 
Warning message:
In chk(x, "MULTIPOINT") :
  object does not have length one: casting causes information loss

but maybe one could return all points? Or an sf object with an integer attribute denoting the original ID (nr) in the sfc_MULTIPOINT?

Robinlovelace commented 7 years ago

I think returning all points would be good - I imagine the saved ID could be useful in some instances. Not sure if that reply is useful...

I found a way to stop the warning in sp btw - just remove the repeated matrix row names:

https://github.com/ropensci/stplanr/pull/164/commits/52734f77395960c0e67a75f8919ee996b861e887#diff-c451334402a5623377b6bfc78472163dR73

Robinlovelace commented 7 years ago

Note also: I think I found a bug in sp: spTransform() seems to fail on multipoint objects...

mdsumner commented 7 years ago

I think this is not a job for st_cast but a new function, because of the implied 1-to-many aspect (and more generally perhaps the many-to-1 possibilties within aggregate operations).

I expect st_cast to work within a mutate() call for example, which would be precluded if st_cast was exploding multis to single rather than "dropping".

mdsumner commented 7 years ago

I would go about it something like this, though there's probably more concise ways. This is all standard R data.frame/list idioms, I'm exploring similar techniques for the general cast case (without changing row numbers at the top level).

The following assumes there's no mixing, the input is all MULTIPOINT.


## mk multipoint worker
mkmpt <- function(n, mean = 0) st_multipoint(matrix(rnorm(n * 2, mean), ncol = 2))
## mk sf worker
mksf <- function(d, gc) {
  stopifnot(nrow(d) == length(gc))
  d[["geometry"]] <- gc
 st_as_sf(d)
}
## drop geom
drop_geom <- function(x) {
  as.data.frame(x)[-match(attr(x, "sf_column"), names(x))]
}

## create a data set
mp <- mksf(data.frame(a = c("a", "b"), b = 1:2), st_sfc(list(mkmpt(10), mkmpt(20, 5))))

## convert multipoint to point
## simple case (it's all multipoint)
mp_to_pt <- function(mp) {
  ## build map of grouping in geometry
 idx <- unlist(lapply(st_geometry(mp), nrow))
 ## initialize replacement object
 ## expand out (by duplicating rows for every individual coordinate)
 d <- drop_geom(mp)[rep(seq_len(nrow(mp)), idx),  ]
 rownames(d) <- NULL
 ## split/explode the multipoints and rebuild 
 d[["geometry"]] <- st_sfc(unlist(lapply(st_geometry(mp), function(a) lapply(split(a, seq_len(nrow(a))), st_point)), recursive = FALSE, use.names = FALSE), 
                          crs = st_crs(mp))

 st_as_sf(d)
}

## do the conversion, MULTIPOINT sf to POINT sf with auto-one-to-many expansion
mp_to_pt(mp)

I think there's a nice case for a general decomposition to rows-per part, but it's got complications - like holes being elevated to islands, so I'm not sure how much of this will belong in sf directly.

edzer commented 7 years ago

How about an st_un_multi for all MULTI* objects? It could also unfold GEOMETRYCOLLECTIONS one level down. Or st_unwrap, to take a verb?

Robinlovelace commented 7 years ago

Both sound good, but I like the one with multi in it better as it makes the link to MULTI* features more clearly. I think st_unmulti is another contender but less clear than your st_un_multi suggestion.

edzer commented 7 years ago

Here's some draft code that un-multipoints:

st_un_multipoint = function(x) {
    g = st_geometry(x)
    i = rep(seq_len(nrow(x)), sapply(g, nrow))
    x = x[i,]
    st_geometry(x) = st_sfc(do.call(c,
        lapply(g, function(geom) lapply(1:nrow(geom), function(i) st_point(geom[i,])))))
    x$original_geom_id = i
    x
}

# example:
library(sf)
g = st_sfc(st_multipoint(matrix(1:6,,2)),
    st_multipoint(matrix(1:4,,2)),
    st_multipoint(matrix(11:18,,2)))
df = data.frame(a = 1:3, b = 5:7)
st_geometry(df) = g
df

st_un_multipoint(df)
etiennebr commented 7 years ago

I think this is equivalent to st_points in postgis

edzer commented 7 years ago

We now have:

> st_cast(x, "POINT")
Geometry set for 112 features 
geometry type:  POINT
dimension:      XY
bbox:           xmin: 0 ymin: -7.081155e-10 xmax: 5565.975 ymax: 110823.7
epsg (SRID):    3857
proj4string:    +proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +wktext +no_defs
First 5 geometries:
POINT(0 501.464607505533)
POINT(0 1504.39382251802)
POINT(0 2507.3230375305)
POINT(0 3510.25225254298)
POINT(0 4513.18146755547)
> ?st_cast
> st_cast(x, "POINT", group_or_split = FALSE)
Geometry set for 2 features 
geometry type:  POINT
dimension:      XY
bbox:           xmin: 0 ymin: -7.081155e-10 xmax: 5565.975 ymax: 501.4646
epsg (SRID):    3857
proj4string:    +proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +wktext +no_defs
POINT(0 501.464607505533)
POINT(5565.97453966368 -7.08115455161362e-10)
Warning messages:
1: In st_cast.MULTIPOINT(X[[i]], ...) : point from first coordinate only
2: In st_cast.MULTIPOINT(X[[i]], ...) : point from first coordinate only

As it is now, st_cast by default does the one-to-many, and even (if you specify ids) many-to-one with an aggregate on the attributes. I will move the latter to a proper aggregate method, but don't see how st_point catches all the one-to-many casts -- I suggest to leave that in.

Robinlovelace commented 7 years ago

Update: this is still not working:

# test multipoint to point conversion
x = st_multipoint(matrix(1:10, ncol = 2))
st_cast(x = x, to = "POINT")
POINT(1 6)
Warning message:
In st_cast.MULTIPOINT(x = x, to = "POINT") :
  point from first coordinate only
Robinlovelace commented 7 years ago

I was expecting 5 points, that can get ids associated. Equivalent sp code:

p = sp::SpatialPoints(matrix(1:10, ncol = 2))

How does one do that in sf?

mdsumner commented 7 years ago

From scratch:

d <- as.data.frame(matrix(1:10, ncol = 2))
d$id <- 1:5
st_as_sf(d, coords = c("V1", "V2"))

But the casting method also works if it's sfc:

st_cast(st_sfc(x), "POINT")

Compare when the group_or_split is FALSE (this argument is not available to all st_cast methods):

st_cast(st_sfc(x), "POINT", group_or_split = FALSE)
Robinlovelace commented 7 years ago

Thanks for the clarification - I missed out the st_sfc part - gradually making sense - seems like a logical system. I'd say this issue is closed then, although not sure about other feature types such as POLYGON. For my use-case this issue is closed.

Robinlovelace commented 7 years ago

Question on this @edzer: how to create a POINT sf with 5 features based on this:

library(sf)
set.seed(1985)
m3 = matrix(runif(10), ncol = 2)
mp = st_multipoint(x = m3)
p = st_cast(x = mp, to = "POINT")

At the moment I'm getting this useful warning but no indication of how to make it include all points:

#> Warning in st_cast.MULTIPOINT(x = mp, to = "POINT"): point from first
edzer commented 7 years ago

You can't create a POINT that includes more than one point. You can create a list of points, though:

p = st_cast(x = st_sfc(mp), to = "POINT")
Robinlovelace commented 7 years ago

Thanks - makes sense. This if for some documentation of the new aggregate method you've created, which looks awesome. PR incoming!

mbedward commented 7 years ago

Is there a PR, or one in the works, for st_un_multipoint?

tim-salabim commented 7 years ago

Is st_cast(mp, "POINT") not enough?

mbedward commented 7 years ago

My bad - I thought from first reading the discussion that there was a problem with using casting for this case.