hypertidy / silicate

A general form for complex data
https://hypertidy.github.io/silicate/

points not working #59

Closed mdsumner closed 6 years ago

mdsumner commented 6 years ago

The PATH sleight of hand seems a bit funny now with SC. Somehow we need this to work, because anglr-do-all() needs to go.

  1. The paths should really be degenerate, but that's not happening
  2. Do we need a POINT model? It could derive from the others and just warn/error when other models are used?

rgl::rgl.clear()

library(anglr)

mpts <- as(as(simpleworld, "SpatialLinesDataFrame"), "SpatialMultiPointsDataFrame")
plot3d(PATH(mpts))
rgl::view3d(theta = 25, phi = 3)
rgl::rglwidget()

mdsumner commented 6 years ago

this is really a problem with linking objects to edges to vertices when there's no edges, only points

mdsumner commented 6 years ago

No

class(st_cast(st_cast(minimal_mesh, "MULTILINESTRING"), "MULTIPOINT")$geometry[[1]])
[1] "list"       "MULTIPOINT" "sfg" 

yet

(p1 = st_point(c(1,2)))
class(p1)
st_bbox(p1)
(p2 = st_point(c(1,2,3)))
class(p2)
(p3 = st_point(c(1,2,3), "XYM"))
pts = matrix(1:10, , 2)
(mp1 = st_multipoint(pts))
class(st_sfc(mp1)[[1]])
[1] "XY"         "MULTIPOINT" "sfg"   

Can't figure out why yet

mdsumner commented 6 years ago

This now works by work-around

sc_coord(st_cast(st_cast(minimal_mesh, "MULTILINESTRING"), "MULTIPOINT"))

mdsumner commented 6 years ago

Workaround applied:

Leaving this open so I remember to remove the bodge.

Now it's just a problem in anglr, which has no edge links to colour by.

plot3d(SC(mpts))
 Error in rgl.material(...) : There must be at least one color 

## interestingly
plot3d(PATH(mpts))
 Error in rgl.material(...) : There must be at least one color 
plot3d(TRI(mpts))
 Error: Column `subobject` not found 
## not an error
plot3d(ARC(mpts))
## DEL should work, same logic as sfdct for ungrouped entities
plot3d(DEL(mpts))
Error:  Input must have at least three input vertices.
 Error in RTriangle::triangulate(ps, ...) : Triangle exit, code $i 

mpadge commented 6 years ago

This is still an issue, because

> pts <- st_multipoint (matrix(1:10, , 2))
> mp <- st_sfc (pts, pts) %>% st_sf (data = 7:8)
> SC (mp)$edge
# A tibble: 5 x 3
  .vertex0   .vertex1   edge_     
  <chr>      <chr>      <chr>
1 b927d19f08 b927d19f08 e9c6935b87
2 26fbaa77ba 26fbaa77ba e9ec2d6648
3 cda16e6f11 cda16e6f11 8164f7459c
4 4d490ec088 4d490ec088 2ac8c5c832
5 2497ed977d 2497ed977d 725661c6eb

Yet the $edge table should just be empty here. This seems to be another iteration of the fundamental design concerns (cate vs. core blah blah). The above also gives

> SC (mp)$object_link_edge
# A tibble: 10 x 2
   edge_      object_   
   <chr>      <chr>
 1 bf3bd7a86f f596374596
 2 230b227e8e f596374596
 3 d128708e4d f596374596
 ...

Because

> names(SC(mp))
[1] "object"           "object_link_edge" "edge"             "vertex"           "meta"
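
Every row in the $edge output above has .vertex0 equal to .vertex1, so each "edge" is degenerate (a zero-length segment). A minimal base-R sketch of detecting that, using a made-up edge table and a hypothetical helper drop_degenerate_edges() that is not part of silicate:

```r
# Illustrative edge table: two degenerate rows (both ends are the same
# vertex) and one real segment. IDs are made up for this sketch.
edge <- data.frame(
  .vertex0 = c("v1", "v2", "v3"),
  .vertex1 = c("v1", "v2", "v4"),
  edge_    = c("e1", "e2", "e3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a silicate function): keep only edges whose
# two end vertices differ.
drop_degenerate_edges <- function(edge) {
  edge[edge$.vertex0 != edge$.vertex1, , drop = FALSE]
}

real_edges <- drop_degenerate_edges(edge)
nrow(real_edges)
```

For pure point input every row would be dropped, which gives the empty $edge table argued for here.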

silicate remains very much a hard-wired edge model because we've both been thinking so much about the geometry, but that ridiculously precludes the ability to represent the most fundamental geometric entities: points. What to do? These are just some thoughts that spring to mind, and there are likely lots of other approaches ...

Options

Option 1 - Leave silicate as a strict edge-based model

But that is likely to restrict usage and ensure that sf will be preferred in many applications, so I would argue strongly against it.

Option 2 - Replace object_link_edge with object_link_geom

The table in the above example would then link straight to the vertices rather than the edges. This could be done by just renaming the table, in which case determining whether an object was a vertex or an edge would require matching the geom ID against the full list of both vertex and edge IDs. Alternatively, an extra column could be introduced to that table, so the above would become

> SC (mp)$object_link_geom
# A tibble: 10 x 3
   geom_      object_    is_edge_
   <chr>      <chr>      <lgl>
 1 a44ff8da55 2e7826ac0b FALSE
 2 757a255617 2e7826ac0b FALSE
 3 785a7b59bb 2e7826ac0b FALSE
  ...
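
The two forms of Option 2 can be sketched in base R. This assumes plain data frames with made-up IDs; none of these tables are produced by silicate, and only the names object_link_geom, geom_, object_ and is_edge_ come from the proposal above:

```r
# Illustrative IDs only; none of these tables come from silicate itself.
vertex <- data.frame(vertex_ = c("v1", "v2", "v3"),
                     x_ = c(1, 2, 3), y_ = c(6, 7, 8),
                     stringsAsFactors = FALSE)
edge <- data.frame(.vertex0 = "v1", .vertex1 = "v2", edge_ = "e1",
                   stringsAsFactors = FALSE)

# Proposed link table: one row per (geometry primitive, object) pair.
object_link_geom <- data.frame(
  geom_   = c("v3", "e1"),
  object_ = c("o1", "o2"),
  stringsAsFactors = FALSE
)

# First form: no extra column, so the primitive type is recovered by
# matching geom_ against the edge IDs.
is_edge_by_lookup <- object_link_geom$geom_ %in% edge$edge_

# Second form: store the flag on the table itself.
object_link_geom$is_edge_ <- is_edge_by_lookup

# Vertices belonging to the point object "o1":
pt_ids <- object_link_geom$geom_[object_link_geom$object_ == "o1" &
                                   !object_link_geom$is_edge_]
vertex[vertex$vertex_ %in% pt_ids, ]
```

The stored flag trades a little redundancy for not having to scan both ID lists on every lookup.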

Option 3 - Leave largely as is, but with empty tables for points

> SC (mp)
$object                           
# A tibble: 2 x 2
   data object_   
* <int> <chr>    
1     7 2e7826ac0b      
2     8 16e9a1a601 

$object_link_edge       
# A tibble: 0 x 2      
# ... with 2 variables: edge_ <chr>, object_ <chr>

$edge
# A tibble: 0 x 3
# ... with 3 variables: .vertex0 <chr>, .vertex1 <chr>, .edge_ <chr>

$vertex
# A tibble: 5 x 3
     x_    y_ vertex_   
  <int> <int> <chr>
1     1     6 402d4ac356
2     2     7 df6baab8dc
3     3     8 d7961bae04
4     4     9 c46a9e8fd6
5     5    10 038c0b9390

$meta
# A tibble: 1 x 2
  proj  ctime              
  <chr> <chr>
1 NA    2018-10-29 08:58:02

attr(,"join_ramp")
[1] "object"           "object_link_edge" "edge"             "vertex"          
attr(,"class")
[1] "SC" "sc"

or maybe (?) with

attr(,"join_ramp")
[1] "object"           "vertex"          

The problem with that is there is then no way of relating the vertices back to their objects, so that wouldn't seem to me to be a workable solution.

I would argue for Option 2 as the only workable one here, with my preference being the second form, which embellishes the object_link_geom table with the extra is_edge_ column. (It may of course be preferable to name this something other than object_link_geom to avoid potential confusion with the very well-established sf notion of a geom. object_link_entity? object_link_primitive? object_link_core? object_link_base? object_link_table?)

mpadge commented 6 years ago

The issue from osmdata referenced immediately above indicates an alternative solution:

Option 4 - Just change the object table for point objects

> SC (mp)$object
# A tibble: 20 x 3
   object_    object_type  data
   <chr>      <chr>    <dbl>
 1 bc870c2f2f vertex         7
 2 e0d25a10ea vertex         7
 3 cc3e394823 vertex         7
 4 9655a6e41c vertex         7
 5 0fe121f517 vertex         7
 6 e9fa951b0b vertex         7
 7 5ca0244609 vertex         7
 8 37f3628e7f vertex         7
 9 6f3bc5ebbf vertex         7
10 ad27ad78af vertex         7
11 ee1394c35f vertex         8
12 870a0eab31 vertex         8
13 a6f4f78d99 vertex         8
14 40f1466870 vertex         8
15 d17eee6d52 vertex         8
16 675a74efd7 vertex         8
17 6d8d6f8965 vertex         8
18 e998c892af vertex         8
19 f5d3378793 vertex         8
20 57383a5492 vertex         8

$object_link_edge       
# A tibble: 0 x 2      
# ... with 2 variables: edge_ <chr>, object_ <chr>

$edge
# A tibble: 0 x 3
# ... with 3 variables: .vertex0 <chr>, .vertex1 <chr>, .edge_ <chr>

$vertex
# A tibble: 5 x 3
     x_    y_ vertex_   
  <int> <int> <chr>
1     1     6 bc870c2f2f
2     2     7 e0d25a10ea
3     3     8 cc3e394823
4     4     9 9655a6e41c
5     5    10 0fe121f517

This kind of scheme would accommodate all kinds of SF point objects, including XYZ, XYM, XYZM, whatever, because everything other than the XY goes straight into the $object table. This would then just require pre-identifying POINT-type inputs and processing using distinct routines to generate SC structures like the above.

mdsumner commented 6 years ago

I actually think the right format for points is just like BINARY, with a (possibly nested) link of vertex indexes. It's right because the vertices are de-duplicated and it starts to look like a proper primitives model, where the dimension of the indexes is the topological dimension (.vx0 = point; .vx0, .vx1 = line; .vx0, .vx1, .vx2 = triangle; ...). I just don't know what to call the functions, or the models.

I.e.

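UNARY <- function(x, ...) {
  # every coordinate instance, duplicates included
  coord <- sc_coord(x)
  # de-duplicate on x_, y_: unique vertices land in udata$vertex_,
  # and udata$data keeps one vertex_ key per coordinate instance
  udata <- unjoin::unjoin(coord, x_, y_, key_col = "vertex_")
  o <- sc_object(x)
  gm <- gibble::gibble(x)
  # nest the vertex keys on the object table, split by gm$object
  o$vertex_ <- unname(split(udata[["data"]], gm$object))

  meta <- tibble::tibble(proj = get_projection(x), ctime = Sys.time())
  structure(list(object = o, vertex = udata$vertex_, meta = meta),
            class = c("UNARY", "sc"))
}
UNARY(mp)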

Can you proceed by choosing a reasonable rep, among your favoured options above, and maybe sometimes winging it? I'll happily patch around whatever we need to make it work.

I also might be causing you grief today as I've removed the "object_link_edge" and its counterparts in favour of just an 'edge' table, with the object ID (and no edge ID). But it seems to make a lot more sense so I hope we can align to it.

(Objects can't store Z, M, time, etc., because that doesn't work for multipoint. With trip objects I put the other data on the path_link_vertex table, because there it's completely natural: they are instances of the coordinates defined by x_, y_. Otherwise, in anglr, vertices will sometimes exist in XYZ, because a continuous surface has unique x-y-z, but in other cases (floating polygons) they are distinct in that 3-space.)

mdsumner commented 6 years ago

Here's an expansion with a plot method.

pts <- st_multipoint(matrix(1:10, , 2))
mp <- st_sfc (pts, pts + c(2, 0)) %>% st_sf (data = 7:8)
x <- UNARY(mp)
plot.UNARY <- function(x, ...) {
  plot(x$vertex[c("x_", "y_")], type = "n")
  idx <- tidyr::unnest(x$object[c("object_", "vertex_")])
  col <- colourvalues::colour_values(idx$object_[idx$vertex_])
  points(x$vertex[idx$vertex_, c("x_", "y_")], col = col)
}
plot(x)

(I'm also aware I'm starting to mix structural indexes and relational IDs, so please bear with me.)

I like the idea of having all tables, sometimes empty - but I'm just not sure about the right mix - notice how silicore (segments, and paths, unduplicated coordinates) led to BINARY (segments, deduplicated coordinates).

mdsumner commented 6 years ago

What if

The 0 suffix means it's the structural form, with only two tables and no massive IDs. SC currently works the same for lines and polygons, but has three tables: vertex, topology, object.

In SC0, "topology_" is a nested column on object, with indexes into vertex: for points there's one column (.vx0), for lines or polygons there are two (.vx0, .vx1). In SC, "topology" is a table with proper IDs (.vx0 for points; .vx0, .vx1 for lines and polygons).

So the number of columns, in either the nested column or the full table, gives the topology of the primitives: 1 for a point, 2 for a segment.
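
A base-R sketch of the two layouts for point input, assuming plain data frames and lists rather than silicate classes. The .vx0 and topology_ names follow the description above; the helper prim_dim() and all values and IDs are made up for illustration:

```r
# Shared de-duplicated vertex table (illustrative values).
vertex <- data.frame(x_ = c(1, 2, 3), y_ = c(6, 7, 8))

# Structural form ("SC0"-like): topology_ is a nested column on object,
# holding row indexes into vertex. Points need a single column, .vx0.
object0 <- data.frame(data = c(7, 8))
object0$topology_ <- list(
  data.frame(.vx0 = c(1L, 2L)),
  data.frame(.vx0 = c(2L, 3L))
)

# Relational form ("SC"-like): topology is its own table, with IDs in
# place of row indexes.
vertex_id <- c("a", "b", "c")
object_id <- c("o1", "o2")
topology <- data.frame(
  .vx0    = vertex_id[unlist(lapply(object0$topology_, `[[`, ".vx0"))],
  object_ = rep(object_id, vapply(object0$topology_, nrow, integer(1))),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the number of .vx* columns is the number of
# vertices per primitive (1 for points, 2 for segments).
prim_dim <- function(tab) sum(grepl("^\\.vx[0-9]+$", names(tab)))

prim_dim(object0$topology_[[1]])
prim_dim(topology)
prim_dim(data.frame(.vx0 = 1L, .vx1 = 2L))
```

In this sketch the vertex shared by both objects is stored once and referenced twice, which is the de-duplication the structural form relies on.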

It must drive you crazy how I keep refactoring all this, but I'm pretty sure this will work.

mdsumner commented 6 years ago

It means you can write lists with vertex, topology, and object table - and don't worry for now about classes and downstream stuff.

Is that workable?

mpadge commented 6 years ago

I really like having SC and SC0 - that makes a lot of intuitive sense. The rest of it I'll be actively working with as I keep going with osmdata_sc() ...

mdsumner commented 6 years ago

This looks like a goer, just cleaning out - BINARY will go, this new model for SC0 and SC will apply. SC0 for points, and maybe SC can fall back to that because it doesn't have to do any edge stuff.

mdsumner commented 6 years ago

Points are now supported by SC0 in the topology branch. I consider this issue fixed (though edge cases with PATH still need attention; it could suggest using SC0, but I don't think we have the use-cases worked out yet).

mdsumner commented 6 years ago

SC now errors on point input, though PATH propagates the issue:

SC(PATH(st_cast(inlandwaters, "MULTIPOINT")))
Error: Column `subobject` must be a 1d atomic vector or a list

New issue for that.