manaakiwhenua / rhealpixdggs-py

rHEALPixDGGS-py implements code to both define an rHEALPix DGGS Reference System and to perform topological queries on its Identfiers. Our roadmap is for v1.0 to be fully compliant with OGC Topic 21 v2.0 / ISO 19170-1:2020.
GNU Lesser General Public License v3.0
18 stars 4 forks source link

Cell.boundary method slowness #7

Open aaime opened 4 years ago

aaime commented 4 years ago

I'm trying to use the "boundary" method of Cell to get a better outline for dart and skew_quad cells, while displaying a map with cells, cell parents, and parent parents.

I figured I'd need 10 points per side in order for the parents and the "grandsons" to align, which seems to be the case. However, calling boundary is a lot slower than calling vertices, to the point that map display it not interactive any longer.

For reference, this map, produced using vertices(False), takes 2 seconds, which is slow but still acceptable for an output of this size (click to see full size): image

This one instead, produced using boundary(10, False), requires 14 seconds, which is too much even for a large output: image

I've reduced it to a pure python script and tested the results with "time", to make sure it was not the GeoServer Java code calling onto the interpreter, slowing down things.

Vertices test

from rhealpixdggs import dggs, ellipsoids
from rhealpixdggs.ellipsoids import Ellipsoid
from rhealpixdggs.dggs import RHEALPixDGGS, Cell

WGS84_TB16 = Ellipsoid(a=6378137.0, b=6356752.314140356, e=0.0578063088401, f=0.003352810681182, lon_0=-131.25)
dggs = RHEALPixDGGS(ellipsoid=WGS84_TB16, north_square=0, south_square=0, N_side=3)
cell = dggs.cell(('S', 2, 2, 4))
for i in range(1, 10000):
  list(cell.vertices(False))
 time python3 /tmp/test_vertices.py 

real    0m14,944s
user    0m15,375s
sys 0m1,694s

Boundary test

from rhealpixdggs import dggs, ellipsoids
from rhealpixdggs.ellipsoids import Ellipsoid
from rhealpixdggs.dggs import RHEALPixDGGS, Cell

WGS84_TB16 = Ellipsoid(a=6378137.0, b=6356752.314140356, e=0.0578063088401, f=0.003352810681182, lon_0=-131.25)
dggs = RHEALPixDGGS(ellipsoid=WGS84_TB16, north_square=0, south_square=0, N_side=3)
cell = dggs.cell(('S', 2, 2, 4))
for i in range(1, 10000):
  list(cell.boundary(10, False))

And output:

>time python3 /tmp/test_boundary.py 

real    1m14,551s
user    1m14,906s
sys 0m1,751s

Boundary does more work, so it's normal that it would be slower... but the overhead seem to be too big.

Side note, it would probably interesting to compromise and call the boundary method only on skew_quad and dart cells, but I don't see a method to determine the type of the cell. Is there any obvious test that can be performed to lean about the nature of the cell? I'm guessing that maybe it's just a matter of checking if the top-most parent is either N or S? (edit, this approach seems to work, reduces the time of the "accurate" map down to 6 seconds... which is still too much).

rggibb commented 4 years ago

I would stay with the vertices. There will be a simple algorithm to select which vertices from resoution n are on the boundary of resolution (n-1) or (n-2) etc. I'll give some thought to this & post another reply :).

There will also be a simple algorithm to determine whether the resulting vertices can be pruned because they form a straight line - essentially iso-latitudinal and iso-longitudinal boundaries are straight in lat, long space, as you note in your side-note, that means only the dart and skew_quad cells need to be processed.

Cell shape can be determined with the function _cell.ellipsoidalshape(self) returns the shape of the cell as one of (‘quad’, ‘cap’, ‘dart’, or ‘skew_quad’) when viewed on the ellipsoid. If you want to do some bulk sorting of which cells need to be called, then in a (lat,long) coordinate space:

rggibb commented 4 years ago

@aaime OK, here is some pseudo-code. I havent developed the full code, because I'm not sure whether you will code it up in python or java.

As noted above, quad and cap cells, dont need further detail, they can just be drawn using their own vertices.

Noting the vertices for a cell C are returned with C.vertices().

To refine the boundaries of skew_quad cells using the vertices of its chjildren & noting that northern and southern bounadries don't need refining - the vertices we need from the chiildren are shown in the sketch as o:

C:
o-----+-----+-----o
|     |     |     |
|     |     |     |
|     |     |     |
o-----+-----+-----o
|     |     |     |
|  C3 |     | C5  |
|     |     |     |
o-----+-----+-----o
|     |     |     |
|     |     |     |
|     |     |     |
o-----+-----+-----o

In pseudo-code going counterclockwise from the nw or top-left vertex, these are:

c-vert-list = C.vertices(False);`
print (c-vert-list[0,1], 
       C.subcells()[5].vertices(False)[1,2], 
       c-vert-list[2,3], 
       Csubcells()[3].vertices(False)[3,0], 
       c-vert-list[0];
      )

This doesn't work directly in python because indexing for lists, arrays & dictionaries aren't the same thing, and also you would want all the elements to be at the same level of the list in the result & not nested as shown above, but my notation emphasizes that with suitable list/array element management, only three calls to vertices are required, one for the 4x vertices of C, one for the eastern vertices of C5 & one for the western vertices of C3.

To get the next l;evel of detail from the grandchildren, we need the vertices marked x below in addition to those marked o which we already have:

C:
o-------+-------+-------o
|       |       |       |
x   -   +       +   -   x
|C03    |       |    C25|
x   -   +       +   -   x
|       |       |       |
o-------+-------+-------o
|       |       |       |
x   -   +       +   -   x
|C33    |       |    C55|
x   -   +       +   -   x
|       |       |       |
o-------+-------+-------o
|       |       |       |
x   -   +       +   -   x
|C63    |       |    C85|
x   -   +       +   -   x
|       |       |       |
o-------+-------+-------o

In pseudo-code going counterclockwise from the nw or top-left vertex, these are :

c-vert-list = C.vertices(False);`
c5-vert-list = C.subcells()[5].vertices(False);`
c3-vert-list = C.subcells()[3].vertices(False);`
print (c-vert-list[0,1], 
       C.subcells()[2].subcells()[5].vertices()[1,2],
       c5-vert-list[1], 
       C.subcells()[5].subcells()[5].vertices()[1,2],
       c5-vert-list[2], 
       C.subcells()[8].subcells()[5].vertices()[1,2],
       c-vert-list[2,3], 
       C.subcells()[6].subcells()[3].vertices()[3,0],
       c3-vert-list[3], 
       C.subcells()[3].subcells()[3].vertices()[3,0],
       c3-vert-list[0], 
       C.subcells()[0].subcells()[3].vertices()[3,0],
       c-vert-list[0];
      )

So an additional six calls to vertices() for grandchildren.

For dart cells, we only need two sides to be refined, but those two sides will be neighbours instead of being on opposite sides. There isn't a function that can be called to quickly determine which two boundaries need to be processed, and judging by the code to determine whether neighbors are 'east' or 'west' it wont be trivial. So instead, noting that the boundaries that don't need processing are iso-latitudinal, a quick check of the latidudes for the vertices can be used. The following pseudo-code presumes that vertex[0] is a latitude.

c-vert-list = C.vertices(False);
c-refined-list[]
for v in [0, 1, 2, 3]:
 case 0: 
  c-refined-list[].append(c-vert-list[0])
  if c-vert-list[0][0] <> c-vert-list[1][0]
    c-refined-list[].append(C.subcells()[1].vertices(False)[0,1])

  case 1:
  c-refined-list[].append(c-vert-list[1])
  if c-vert-list[1][0] <> c-vert-list[2][0]
    c-refined-list[].append(C.subcells()[5].vertices(False)[1,2])

  case 2:
  c-refined-list[].append(c-vert-list[2])
  if c-vert-list[2][0] <> c-vert-list[3][0]
    c-refined-list[].append(C.subcells()[7].vertices(False)[2,3])

  case 3:
  c-refined-list[].append(c-vert-list[3])
  if c-vert-list(3][0] <> c-vert-list[0][0]
    c-refined-list[].append(C.subcells()[3].vertices(False)[3,0])

  c-refined-list[].append(c-vert-list[0])

Obviosly, this can be extended to grandchildren by iterating down a resolution level in a similar fashion to the code for skew_quad cells.

aaime commented 4 years ago

Tried it out, without good success. Came out with this bit of python code that I was then invoking from Java, grabbing the value of "result" in the end. Just added a few prints around to inspect intermediate results:

from rhealpixdggs import dggs, ellipsoids
from rhealpixdggs.ellipsoids import Ellipsoid
from rhealpixdggs.dggs import RHEALPixDGGS, Cell

WGS84_TB16 = Ellipsoid(a=6378137.0, b=6356752.314140356, e=0.0578063088401, f=0.003352810681182, lon_0=-131.25)
dggs = RHEALPixDGGS(ellipsoid=WGS84_TB16, north_square=0, south_square=0, N_side=3)
id = ('N', 3)
c = dggs.cell(id)

vertices = list(c.vertices(False))
print(vertices)
c5_vertices = list(dggs.cell(id + (5,)).vertices(False))
print(c5_vertices)
c3_vertices = list(dggs.cell(id + (3,)).vertices(False))
print(c3_vertices)
c25_vertices = list(dggs.cell(id + (2, 5)).vertices(False))
c55_vertices = list(dggs.cell(id + (5, 5)).vertices(False))
c85_vertices = list(dggs.cell(id + (8, 5)).vertices(False))
c63_vertices = list(dggs.cell(id + (6, 3)).vertices(False))
c33_vertices = list(dggs.cell(id + (3, 5)).vertices(False))
result = (vertices[0], vertices[1],
       c25_vertices[1], c25_vertices[2],
       c5_vertices[1], 
       c55_vertices[1], c55_vertices[2],
       c5_vertices[2], 
       c85_vertices[1], c85_vertices[2],
       vertices[2], vertices[3], 
       c63_vertices[3], c63_vertices[0],
       c3_vertices[3], 
       c33_vertices[3], c33_vertices[0],
       c3_vertices[3], c3_vertices[0], 
       vertices[0]
      )
del c5_vertices
del c3_vertices
del c25_vertices
del c55_vertices
del c63_vertices
del c33_vertices
print(result)

The result is not a continuous boundary, but something jumping around. Trying to figure out why, I had a better look at how the vertices and children are enumerated. Vertices seem to always be in "top/left, top/right, bottom/right, bottom/left" order, no matter what cell they are called upon. However, check in this picture the children of N1, N3 and N7:

image

Children do not enumerate in a consistent "cartesian" order over the polar caps, N10 is bottom right, N30 is bottom left, and N70 is top left. They do enumerate instead in a consistent order in the "middle belt", O, P, Q, R.

In the end, it probably does not matter much for the goal at hand. I've timed the results anyways (the CPU effort is related to grabbing the coordinates, not their particular order), and the timings are the same as calling boundary(10, False), so even getting the correct order would not end up improving performance, which was my original goal.

rggibb commented 4 years ago

You are quite right the cell's arent ordered in cartesian order, they are ordered on dggs face/zone order, and the top level faces are the sides of a cube.. If you unfold it onto a flat plane it looks like this:

17-1

The Nth and Sth square's letters are shown rotated in the vacant space, Then the zone numbering within each face follows a Z- space-filing curve:

17-2

and if we take it one more iteration and view it on a sphere looking at it from the NOP corner of the cube, it looks like this:

07-0

rggibb commented 4 years ago

Ok, in terms of the there being no advantage in compute timing for boundary with 10 points per edge, & getting the vertices. Possible strategies include:

@aaime ..thoughts?