krober10nd / simple_cgal

Perform serial and distributed memory MPI 2D/3D Delaunay triangulation with CGAL and Qhull.
MIT License

dela. triangulation in parallel support #2

Open krober10nd opened 4 years ago

krober10nd commented 4 years ago

...using the algorithm in this paper https://mrzv.org/publications/distributed-delaunay/

What it requires on top of the simple dela. wrapper is to: 1) decompose the domain into cuboids (maybe cykdtree?), 2) determine whether a point is part of an infinite or finite Voronoi cell (CGAL wrapper), and 3) communicate points to neighboring subdomains (mpi4py; see the sketch below).
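
For step 3, the exchange with mpi4py could look roughly like this. This is only a sketch: the exports dict (the points this rank owes to each neighbor) and the 1D neighbor numbering are assumptions for illustration, not code from this repo.

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# ranks of the adjacent subdomains, assuming a simple 1D decomposition
neighbors = [r for r in (rank - 1, rank + 1) if 0 <= r < size]

# exports[nei] holds the points this rank must send to neighbor nei
exports = {nei: np.empty((0, 2)) for nei in neighbors}

received = []
for nei in neighbors:
    # pair each send with a receive to avoid deadlock
    received.append(comm.sendrecv(exports[nei], dest=nei, source=nei))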

krober10nd commented 4 years ago

We also need to: 1) calculate circumspheres (CGAL), and 2) determine whether a sphere intersects a domain box (sketched below).
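
In 2D both of these are only a few lines of numpy. A minimal sketch (the helper names circumball and ball_intersects_box are mine, not from the wrapper):

import numpy as np

def circumball(a, b, c):
    """Circumcenter and circumradius of the 2D triangle (a, b, c)."""
    # translate so c sits at the origin to keep the formula short
    ax, ay = a - c
    bx, by = b - c
    d = 2.0 * (ax * by - ay * bx)  # 2x the signed triangle area
    ux = (by * (ax**2 + ay**2) - ay * (bx**2 + by**2)) / d
    uy = (ax * (bx**2 + by**2) - bx * (ax**2 + ay**2)) / d
    return np.array([ux, uy]) + c, np.hypot(ux, uy)

def ball_intersects_box(center, radius, lo, hi):
    """True if the ball touches the axis-aligned box [lo, hi]."""
    closest = np.clip(center, lo, hi)  # box point nearest the center
    return np.linalg.norm(center - closest) <= radius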

krober10nd commented 4 years ago

Hey @glyg

By any chance, are you aware of any good Python packages to decompose a set of 2D or 3D points into rectangles/cuboids, with each block containing a similar number of points?

glyg commented 4 years ago

I thought about it... An easy way would be to build a regular grid (with np.meshgrid) and apply binary masks. Alternatively, one can use a k-means algorithm.


import numpy as np
import matplotlib.pyplot as plt

grid_size = 4
points = np.random.random((1000, 2))

xlims = points[:, 0].min(), points[:, 0].max()
ylims = points[:, 1].min(), points[:, 1].max()
# lower-left corners of a grid_size x grid_size set of blocks
xx, yy = np.meshgrid(np.linspace(*xlims, grid_size, endpoint=False),
                     np.linspace(*ylims, grid_size, endpoint=False))

dx = (xlims[1] - xlims[0]) / grid_size
dy = (ylims[1] - ylims[0]) / grid_size

## Pure numpy
blocks = []
for low_x, low_y in zip(xx.ravel(), yy.ravel()):
    # >= on the lower bound so the points at the domain minimum are kept
    block = points[
        (points[:, 0] >= low_x) & (points[:, 0] <= low_x + dx)
        & (points[:, 1] >= low_y) & (points[:, 1] <= low_y + dy)
    ]
    if block.shape[0]:
        blocks.append(block)

## KMeans
from scipy.cluster.vq import kmeans2

# seed k-means with the centers of the regular blocks;
# an ndarray passed as k is taken as the initial centroids
centers = np.vstack([xx.ravel() + dx / 2, yy.ravel() + dy / 2]).T
centroids, labels = kmeans2(points, k=centers)

fig, ax = plt.subplots()
ax.plot(*points.T, 'k+')
ax.plot(*centers.T, "ko", ms=40, alpha=0.3)
blocks_km = [points[labels == label, :]
             for label in np.unique(labels)]

for block in blocks:
    ax.plot(*block.T, 'd', alpha=0.5, ms=10)

for block in blocks_km:
    ax.plot(*block.T, 'o', alpha=0.5, ms=10)

fig.set_size_inches(12, 12)

This gives regular boxes, though not regular sample sizes.

For regular sample size, I think we can get it through sorting and indexing tricks... Wait a minute

glyg commented 4 years ago

And here is the solution with grouping in bins. It better satisfies the constraint that each block should contain the same number of points:

# points and grid_size as defined above
x_sorted = np.argsort(points[:, 0])
y_sorted = np.argsort(points[:, 1])

num_points = points.shape[0]
step = num_points // grid_size
# starting offsets into the sorted orders, one per slab along each axis
ixx, iyy = np.meshgrid(
    np.arange(num_points, step=step),
    np.arange(num_points, step=step)
)
blocks_set = []
for idx, idy in zip(ixx.ravel(), iyy.ravel()):
    # a block = the points in both an x-slab and a y-slab of equal count
    common = set(x_sorted[idx: idx + step]).intersection(
        y_sorted[idy: idy + step]
    )
    blocks_set.append(points.take(list(common), axis=0))
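
A quick check that the blocks come out roughly balanced in size:

print(sorted(block.shape[0] for block in blocks_set))
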
krober10nd commented 4 years ago

Excellent! Subdivision in x and y works just fine, and it can easily be extended to z.

I'll turn it into a subroutine. Given a desired number of processors, the routine returns: 1) the axis-aligned bounding boxes, 2) an index the length of points indicating which block each point belongs to, and 3) each block's neighbors.

krober10nd commented 4 years ago

Some things I'd like to improve and would be interested to talk about... Right now it can only block the domain into the user-defined nblocks^2. I guess blocks could be merged together to produce other even-numbered block configurations?

Also, neighbor connectivity could be figured out through nearest-neighbor queries? I'm not sure there's a programmatic way to figure that out knowing ahead of time the structure of the meshgrid output? For example, block 0 is neighbors with 3, 4 and 1 if there are 9 blocks.
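
One programmatic option, assuming the blocks are numbered in meshgrid ravel order (row-major over a k x k grid); a sketch, not code from this repo:

def block_neighbors(b, k):
    """Indices of the (up to 8) blocks adjacent to block b in a k x k grid."""
    row, col = divmod(b, k)
    neighbors = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            r, c = row + dr, col + dc
            if (dr or dc) and 0 <= r < k and 0 <= c < k:
                neighbors.append(r * k + c)
    return sorted(neighbors)

print(block_neighbors(0, 3))  # -> [1, 3, 4], the 9-block example above

Here's the blocker routine as it stands: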

import numpy as np

def blocker(points, nblocks):
    '''Decompose 2D points into nblocks**2 blocks, each roughly balanced
    in point count. Returns each block's axis-aligned bounding box and,
    for each point, the index of the block that owns it.'''
    num_points, dim = points.shape

    assert nblocks >= 2, 'too few nblocks, must be >= 2'
    assert dim == 2, 'points must be 2D'
    assert num_points // nblocks > 1, 'too few points for nblocks'

    # slabs of (roughly) equal point count along each axis
    x_slabs = np.array_split(np.argsort(points[:, 0]), nblocks)
    y_slabs = np.array_split(np.argsort(points[:, 1]), nblocks)

    blocks = []
    ownership = np.full(num_points, -1, dtype=int)  # -1 = unassigned
    block_num = 0
    for x_slab in x_slabs:
        for y_slab in y_slabs:
            # a block is the set of points in both an x-slab and a y-slab
            owners = np.intersect1d(x_slab, y_slab)
            if owners.size == 0:
                continue
            ownership[owners] = block_num
            block = points.take(owners, axis=0)
            blocks.append((block[:, 0].min(), block[:, 1].min(),
                           block[:, 0].max(), block[:, 1].max()))
            block_num += 1
    return blocks, ownership
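
A quick way to exercise it (assuming the random points from the earlier comments):

points = np.random.random((1000, 2))
blocks, ownership = blocker(points, 4)  # 4**2 = 16 blocks
print(len(blocks))                      # 16
print(np.bincount(ownership))           # points per block
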
krober10nd commented 4 years ago

Hey, I came up with a working solution that allows you to choose any power-of-two number of blocks (2^x). I think this will be fine for parallel Delaunay triangulation. I recursively bisect the domain. I used some of the code you wrote above as well.


import numpy as np
import math

def blocker(points, nblocks):
    '''Decompose point coordinates into nblocks (a power of 2) blocks,
    roughly balanced in number of points, by recursive bisection.'''
    num_points, dim = points.shape

    assert nblocks >= 2, 'too few nblocks, must be >= 2'
    assert math.log2(nblocks).is_integer(), 'nblocks must be a power of 2'
    assert dim == 2, 'points must be 2D'
    assert num_points // nblocks > 1, 'too few points for nblocks'

    num_bisects = int(math.log2(nblocks))

    # each pass bisects every block, doubling the block count
    block_sets = [points]
    for _ in range(num_bisects):
        store = []
        for block in block_sets:
            store.extend(__bisect(np.asarray(block)))
        block_sets = store
    return block_sets

def __bisect(points):
    '''Bisect point coordinates into two half spaces along the y-axis'''
    y_sorted = np.argsort(points[:, 1])
    half = points.shape[0] // 2
    return [points.take(y_sorted[:half], axis=0),
            points.take(y_sorted[half:], axis=0)]
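
For example, with the 1000 random points from before:

points = np.random.random((1000, 2))
strips = blocker(points, 8)
print([s.shape[0] for s in strips])  # eight blocks of 125 points each
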
krober10nd commented 4 years ago

Hey @glyg , making some good progress on the parallel Delaunay implementation detailed here https://mrzv.org/publications/distributed-delaunay/

I actually suspect I can simplify the algorithm from that paper, since I decided to decompose the domain into rectangles stacked along one axis, each spanning the entire extent of the other axes (in 3D a block spans the entire y direction as well). In this decomposition the neighbors of a block are only block#-1 and block#+1, so the communication overhead isn't as bad as decomposing into cubes (the worst case with cubes in 3D is 26 neighbors), whereas here the worst case is two neighbors in both 2D and 3D.

Furthermore, with this point decomposition there's no need to do two communication passes for points on infinite or finite Voronoi cells (which requires an expensive convex hull computation).

The only costs that I can see are 1) computing the circumballs and 2) determining whether a circumball intersects a neighboring block. The latter is a trivial calculation, since we only need to test the two neighboring blocks to see which one the point should be migrated to. I did implement the former (computing the circumballs), but it's not very performant; I suspect it's better suited to C++ linked up with pybind11.
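
For 2D strips stacked along y, that test reduces to two comparisons. A sketch (y_lo/y_hi, the strip's extents, are assumed names):

def migration_targets(center, radius, rank, size, y_lo, y_hi):
    """Neighbor ranks whose strip the circumball spills into (at most two)."""
    targets = []
    if rank > 0 and center[1] - radius < y_lo:
        targets.append(rank - 1)  # ball crosses this strip's lower edge
    if rank < size - 1 and center[1] + radius > y_hi:
        targets.append(rank + 1)  # ball crosses this strip's upper edge
    return targets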

Lastly, their algorithm assumes that the circumball of a triangle does not extend beyond the neighboring block. If the triangle quality is particularly low, this could be a problem. It's interesting to note that in the paper they do not use random point distributions but instead randomly shift points from a structured grid; I suppose this is the reason for that.

In the picture below, I show the extents of a two-processor decomposition in red. The points in blue are labeled with the processor they need to be sent to in the communication step (their circumballs intersect the neighboring block). Note that this point set comes from numpy.random.random, so it's maybe not the best test case going forward. In my application, we derive the initial point distribution for mesh generation by decimating a structured grid, so the quality would be better.

[Figure 1: extents of a two-processor decomposition in red; blue points labeled with the processor they will be sent to]

krober10nd commented 4 years ago

With the latest commit on the parallel branch, I have measured a speed-up and produced results identical to a serial triangulation in 2D. On my local cluster I run into problems with mpi4py + MVAPICH2, but those don't seem to be related to the algorithm.

krober10nd commented 4 years ago

Hey @glyg, I've integrated the 2D parallel mesh generation technique into the code I use to generate meshes for seismology. I integrated it into the smoothing-based DistMesh algorithm (which was the initial goal in the first place), which has to repeatedly call Delaunay.

https://github.com/krober10nd/SeismicMesh/blob/parallel/SeismicMesh/generation/mesh_generator.py

specifically here

https://github.com/krober10nd/SeismicMesh/blob/9f5f5e5b45f7db1ddc8a6247c2a4f4743aac5860/SeismicMesh/generation/mesh_generator.py#L205

I'm now working on 3D, but over there. At some point we should probably catch this repo up, as there were several changes to the algorithm that improved things considerably.