Closed nschloe closed 4 years ago
Hey @nschloe.
Those two tasks can be neatly separated, and I find it unfortunate that those two things are tied together in one package.
Yes, I agree with you on that. It is worth separating them at some point. The mesh generator and sliver removal methods accept the same input arguments/same API as the standard DistMesh algorithm anyway.
What's currently missing from the README and the draft is a comparison with other commonly used mesh generators. I would count CGAL and Gmsh to those (perhaps there are more you can think of). I would like to see
- a comparison in mesh creation speed, and
- a comparison in cell quality
This is very reasonable and I agree it is important to show so I'll perform this experiment. I'm familiar with both pygmsh
and pygal
so I'll use those.
A good measure for cell quality is, for example, d * circumcircle_radius / incircle_radius (where d is 2 for triangles and 3 for tetrahedra).
I think what you describe is here.
I think what you describe is here.
It's not clear to me what this function computes. The docstrings says
radius-to-edge ratio
but the expression 2 * r / R
looks more like what I was talking about. The 2
hints that this only works for triangles.
Yea, that only works for triangles. I decided to use your meshplex
to do the quality calculation.
This is very reasonable and I agree it is important to show so I'll perform this experiment.
I'm extremely curious how the 3D approaches compare in cell quality.
I'm extremely curious how the 3D approaches compare in cell quality.
Yep. Me too! Well the cell-to-edge radius and also the dihedral angle histograms are the go to charts for this kind of thing so I'll produce both.
Yea, that only works for triangles. I decided to use your meshplex to do the quality calculation.
Well, then you can also copy-paste the corresponding part of mesh_tetra. https://github.com/nschloe/meshplex/blob/master/meshplex/mesh_tetra.py#L438 (When copy-pasting, a link to the source is appropriate, too.)
Okay, I started the comparison with building a uniform cuboid with a minimum dihedral bound of 10 degrees in serial here:
https://github.com/krober10nd/SeismicMesh/tree/perf/benchmarks
I'm not really too sure how to run cgal
in parallel (seems complex). I'll add more things as I go.
I think there's some tinkering abound. I noticed when I increased the dt
to 0.20 in the DistMesh algorithm, I consistently produced higher quality meshes in fewer iterations (20-30) and I didn't need a default of 50 iterations either.
Anyway, this is very interesting to me too.
I'm not really too sure how to run cgal in parallel (seems complex).
I agree, I wouldn't go down that route either. Just use them all in serial and add a statement about the parallel nature of SeismicMesh.
By the way, I started using line_profiler
and found some interesting results. Nearly 60% of the time is spent in _compute_forces
and it turned out 59.8% of that respective time was spent in a trivial routine to get the edges of the triangulation. I tracked it down further and basically discovered calculating the unique edges of the triangulation appears to be more expensive than the Delaunay triangulation (which is absurd). I think I'll have to investigate more.
Function: _compute_forces at line 490
Line # Hits Time Per Hit % Time Line Contents
==============================================================
490 @profile
491 def _compute_forces(p, t, fh, h0, L0mult):
492 """Compute the forces on each edge based on the sizing function"""
493 29 124.0 4.3 0.0 dim = p.shape[1]
494 29 34.0 1.2 0.0 N = p.shape[0]
495 29 3086984.0 106447.7 59.8 bars = _get_bars(t)
496 29 280824.0 9683.6 5.4 barvec = p[bars[:, 0]] - p[bars[:, 1]] # List of bar vectors
497 29 89467.0 3085.1 1.7 L = np.sqrt((barvec ** 2).sum(1)) # L = Bar lengths
498 29 2589.0 89.3 0.1 L[L == 0] = np.finfo(float).eps
499 29 883617.0 30469.6 17.1 hbars = fh(p[bars].sum(1) / 2)
500 29 207923.0 7169.8 4.0 L0 = hbars * L0mult * ((L ** dim).sum() / (hbars ** dim).sum()) ** (1.0 / dim)
501 29 3105.0 107.1 0.1 F = L0 - L
502 29 17715.0 610.9 0.3 F[F < 0] = 0 # Bar forces (scalars)
503 Fvec = (
504 29 101017.0 3483.3 2.0 F[:, None] / L[:, None].dot(np.ones((1, dim))) * barvec
505 ) # Bar forces (x,y components)
506
507 29 112.0 3.9 0.0 Ftot = mutils.dense(
508 29 13668.0 471.3 0.3 bars[:, [0] * dim + [1] * dim],
509 29 36650.0 1263.8 0.7 np.repeat([list(range(dim)) * 2], len(F), axis=0),
510 29 209170.0 7212.8 4.1 np.hstack((Fvec, -Fvec)),
511 29 226331.0 7804.5 4.4 shape=(N, dim),
512 )
513 29 78.0 2.7 0.0 return Ftot
Benchmarking against other software is often a great catalyst for improvement.
Indeed!
Got the cumulative time in unique_edges
down from 60% --> 40%, which results in 20% faster meshing iterations. https://github.com/krober10nd/SeismicMesh/blob/118899a60ae60fcf79072b412862585ac9ddd616/SeismicMesh/geometry/cpp/fast_geometry.cpp#L13
Still, it remains the biggest bottle neck each meshing iteration. perhaps more room for improvement.
So, I made some small improvements further today and added a second benchmarking example with a sphere that has a ring of finer resolution in the center. I haven't added gmsh
timing tests yet, but that will probably happen tomorrow after I update my gmsh
on my computer.
Anyway focusing on performance first: I've found that the biggest slow down in the DistMesh
algorithm is in fact sorting to get the unique edges of the triangulation, which now takes place in cpp.
Following up on this, I added a comparison between gmsh
and cgal
's 3d mesh generator to the 3dDistMesh
implementation I have building a quality cube and a sphere using analytical mesh sizing functions here. I plan to include a short section on the README linking to the benchmarking section and to up date the paper (removing the workflow diagram for the benchmarking results as the former information is better suited for the readthedocs website)
Generally, my DistMesh implementation produces average higher quality cells by about 5-6% than either cgal
or gmsh
but has a longer tail and is slower for the cube but faster than cgal
for the sphere. In each case, the minimum dihedral bounds are similar around 10 degrees. However, this was never an option with regard to the original implementation.
I hope this helps address the issue. Please let me know what you think.
I can't see it. Is it in the readme of your link?
Sorry, it should be here. https://github.com/krober10nd/SeismicMesh/tree/perf/benchmarks
Nice job in comparison and presentation. What else I'd like to see:
I think this kind of overview is still missing from GitHub. I'll compile something, perhaps tomorrow if I find the time. I'll make sure to ping you.
Agreed on all fronts.
Actually, I"m not too sure why there's horizontal lines appearing in my termplotlib
histograms...I tried reseting the shell preferences but they persist. Have you had any complaints from others about that? Probably best to start this discussion on the issues there if iI can't figure it out.
Okay, I've added four benchmarks per your comments in this thread. Two for the README examples in 2D and 3D, a unit circle, and a unit sphere. Head over to here to see them:
https://github.com/krober10nd/SeismicMesh/tree/perf/benchmarks
Now I need to figure out how to concisely present the four benchmarks (or maybe just the two seismic examples) on one figure to go inside the paper. I'm welcome to any suggestions on this aspect.
I've also added a snippet on the main README but I'll eventually add this figure that I"m alluding to above on there as a quick summary.
One idea for visualization:
In the article discussion of the plots, I would like to hear something about why gmsh is so fast for circle, but as soon as the mesh size function comes in, it's comparatively slow. (There are so many calls to the mesh size function. You could count them, too.) I personally don't understand why gmsh is competitive for EAGE (which is the same thing for 3D), but perhaps you have an idea here.
In regard to the sizing function queries, this is a really interesting point. I did question the ball results too as it's a similar thing to EAGE.
I suspect gmsh calls this sizing function many more times as new points are inserted which is unlike in DistMesh where the full point set is set up at initialization. Following this, DistMesh might take better advantage of vectorization in this regard when calling the sizing function. However this doesn't explain the ball results or maybe it does as the vectorization becomes more a game changer at scale.
An easy test would be to set the EAGE sizing function to a constant and see if this dramatically improves gmsh
's performance.
I'll first switch the plot type then investigate this in more detail.
- [x] What's angle for tetrahedra? Given the number range, it's not the dihedral angle.
The calculation I use is for the dihedral angles
thing is from meshplex
specifically
plex = meshplex.MeshTetra(mesh.points, mesh.cells[1][1])
angles = plex.q_min_sin_dihedral_angles
I see it's not degrees but rather a normalized quantity.
It would be nice to plot this too somehow along with the other two plots as this is just as critical (perhaps more important even) than the minimum quality as the Jaccobian can become horribly ill conditioned when doing FEM/FV calculation in 3D if you have slivers in the mesh.
- [x] All the ASCII graphs have a vertical "grid line" at 60 (degrees) which is only useful for triangular meshes.
Got rid of all ASCII graphs in lieu of the other plot style. The benchmarks still produce them but they won't end up in the paper and README and are just for the user.
- [x] One might think about replacing the ASCII graphs by proper (matplotlib?) hist graphs, if only for axes.
See comment above.
- [x] Sphere -> Ball? (If it's hollow, it's a sphere, otherwise a ball.)
Changed the name.
- [x] Circle -> Disk (same argument)
Changed the name.
- [x] What's angle for tetrahedra? Given the number range, it's not the dihedral angle.
Changed the plot label accordingly to q_min_sin_dihedral_angles
to reflect that they're the normalized minimum sin of each cell's 6 dihedral angles.
One idea for visualization:
Don't plot the histograms at all. For each domain, rather add two plots: One for creation speed. x-axis: number of points (increasing), y-axis: time. This way, it's easy to see which software is fastest for a given mesh size One for mesh quality: x-axis: number of points (same as above). y-axis: cell quality. It'd be useful to plot both average and minimum quality (the latter perhaps dashed).
See here (which should also appear on the BENCHMARK readme as well).
What still remains a mystery is to why the CGAL's disk has such a high number of vertices and the quality is so bad. Anyway, this is what you get.
Take two. This time I incorporated the minimum dihedral angles on there and annotated for clarity. I also thought the markers earlier were too large so I made them dots and lines.
Just need to correct some colors on the ball panel...
So, investigating the peformance more in gmsh
with the EAGE benchmark reveals what you suspected, the performance is directly related to the sizing function call frequency. For example, when I set the sizing function tor return a float of 300 m I get this result:
When I set the sizing function to call a gridded interpolant that returns simply 300 m given a query point (so the size of the mesh is identical), performance slows down by 3x to around 12 seconds.
DistMesh
, for this configuration there are 26 calls to the sizing function in total (1 for the initial point rejection and 25 for the remaining 25 meshing iterations). *The call to the sizing function in DistMesh
takes better advantage of vectorization than what happens in gmsh
. Partly a result because advancing/front Delaunay refinement algorithms iteratively add vertices to the mesh and you end up needing to query the sizing function each point insertion (or something like that).
*Also important to note here: if I'm not mistake I'm using the most efficient way to represent a non-analytical mesh sizing function, a gridded interpolant. Following that, things would become significantly worse for gmsh
if we relied on a scattered triangular interpolant.
I'm looking more into the ball..
So for the ball benchmark, the difference in performance between EAGE is the analytical nature of the sizing function. If I create a gridded interpolant with the sizing function values encoded in it, the performance slows down significantly (similar to what occurs with EAGE):
With analytical sizing function (super fast):
With gridded interpolant encoded with analytical sizing function (basically same speed as SeismicMesh):
So the answer is that for a simple analytical sizing function, gmsh
will beat out pretty much any mesh generator but if have an interpolant, then it's competitive with some other approaches like what I developed.
@krober10nd, I would like to say really good job on the performance comparison. I just have some notes on the figures, and some general thoughts from my experience with mesh generation.
Some more general thoughts:
Hey @jorgensd many thanks for providing these detailed comments. I'll think them over and provide some feedback soon
Firstly I just want to make sure you're looking at the latest performance figure (which is no longer in this thread). The latest is on the Readme and on the paper. Is this what you're referring to in terms of font sizes being too small?
Hey @jorgensd many thanks for providing these detailed comments. I'll think them over and provide some feedback soon
Firstly I just want to make sure you're looking at the latest performance figure (which is no longer in this thread). The latest is on the Readme and on the paper. Is this what you're referring to in terms of font sizes being too small?
Yes, I've looked at this figure as well, and I think the ticks, labels etc are to small. Also note that in the new version of the paper, you do not state that the dashed lines represents the minimum cell quality (you mention this in the readme). This should be added to the paper.
Good point @jorgensd . Will fix the font size shortly.
In many other research areas, such as thermodynamic analysis and fluid dynamics, a mesh of 4 million cells will be considered as quite a small mesh. In my experience, a large mesh should at least be of more than 50 million cells. Please do note that I am not suggesting that you redo your performance study. I am just curious of if you are confident your mesh generation algorithm scales to such large problems, and if such large domains are relevant in your area of research.
Yes. I had this consideration in the design of the program to support very large 3d tetrahedral meshes. I've built 40 million element meshes in 3d (with 12 cores) on our project cluster here (and ran the sliver bound). You can scale up the EAGE example by increasing the source frequency and decreasing the minimum element size accordingly and see for yourself.
As a final note, I have seen that the NetGen optimizer for gmsh can significantly improve the mesh quality (however, it will slow down the mesh generation quite alot). It would be interesting to see the improvement in mesh quality compared to the runtime for the Disk, Ball and EAGE mesh (as that is where gmsh is faster than SeismicMesh and produces higher quality meshes.
Yes, the mesh optimization part is actually rarely discussed in mesh generation papers. Cf https://arxiv.org/pdf/1805.08831.pdf (at the very end last paragraph where they mention that it does not consider mesh optimization). In general, SeismicMesh produces distributions that are skewed towards higher quality so perhaps optimization will operate faster (I can try that with NetGen) and see how long it takes.
Unfortunately there's a 1000 word limit on JOSS papers so I'm limited as to how many figures I can add.
I would not require you to add it to the joss paper. It would just be very nice to see the performance difference (and could probably be part of an application paper at some point)
I played around with the direct GMSH interface today (for the sphere benchmark, and have the following observation: Using the direct interface of gmsh is significantly faster for the current sphere problem:
Name (time in s) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_gmsh_direct 3.2112 (1.0) 3.3103 (1.0) 3.2576 (1.0) 0.0391 (1.0) 3.2433 (1.0) 0.0575 (1.0) 2;0 0.3070 (1.0) 5 1
test_gmsh 4.4527 (1.39) 4.5829 (1.38) 4.5271 (1.39) 0.0521 (1.33) 4.5192 (1.39) 0.0776 (1.35) 2;0 0.2209 (0.72) 5 1
test_cgal 7.3994 (2.30) 8.0953 (2.45) 7.6487 (2.35) 0.2744 (7.02) 7.5822 (2.34) 0.3469 (6.04) 1;0 0.1307 (0.43) 5 1
test_seismic_mesh 12.2665 (3.82) 13.0569 (3.94) 12.5982 (3.87) 0.2910 (7.44) 12.5200 (3.86) 0.3090 (5.38) 2;0 0.0794 (0.26) 5 1
--------------------------------
@nschloe What do you think is a fair comparison for the sphere problem? (I guess the overhead of using gmsh to extract the geometry and topology in the python API is less efficient than writing msh).
I would also like to note that if one adds in the NetGen optimizer for gmsh, GMSH becomes slower than SeismicMesh, but significantly increases its mesh quality (Note that the relative times are from a separate run, and therefore the relative times are not directly comparable):
GMSH netgen 14.4653 (3.14) 15.9122 (3.11) 14.9896 (3.12) 0.5433 (8.38) 14.8682 (3.16) 0.4037 (5.35) 1;1 0.0667 (0.32) 5 1
The dihedral angle distribution for the GMSH NetGen and Seismic mesh is visualized below:
The gmsh direct is defined below:
def test_gmsh_direct(benchmark):
angles, quality, elapsed, num_vertices, num_cells = benchmark.pedantic(
run_gmsh_direct, iterations=1, rounds=5, warmup_rounds=0
)
assert numpy.amin(angles / numpy.pi * 180) > 10.0
def run_gmsh_direct(HMIN=0.025):
gmsh.initialize()
sphere = gmsh.model.occ.addSphere(0, 0, 0, 1)
gmsh.model.mesh.setSizeCallback(lambda dim, tag, x, y, z: (
abs(sqrt(x ** 2 + y ** 2 + z ** 2) - 0.5) / 5 + HMIN)
/ 1.1)
t1 = time.time()
gmsh.model.occ.synchronize()
gmsh.model.addPhysicalGroup(3, [sphere], 1)
gmsh.model.mesh.generate(3)
# gmsh.model.mesh.optimize("Netgen")
gmsh.write("gmsh_mesh.msh")
elapsed = time.time() - t1
gmsh.finalize()
mesh = meshio.read("gmsh_mesh.msh")
points = mesh.points
cells = mesh.cells[0].data
num_cells = len(cells)
num_vertices = len(points)
plex = meshplex.MeshTetra(points, cells)
angles = plex.q_min_sin_dihedral_angles
quality = plex.q_radius_ratio
meshio.write_points_cells(
"gmsh_sphere_netgen.vtk",
points,
[("tetra", cells)],
)
return angles, quality, elapsed, num_vertices, num_cells
Thanks Jorge. I would like to compare running NetGen for both meshes (not just gmsh
's mesh) as there is no direct mesh improvement methods inside SeismicMesh besides sliver removal and NetGen does quite a bit of things. I would also like to repeat for the EAGE example.
I tried to run the NetGen
optimizer from the Pythongmsh
API but this code produces an identical set of points and cells as what enters it and nothing happens. Also this business of having to write the mesh out and read it back in is just about as lazy as you could get. Do you have suggestions for any other way to do this directly in Python from meshio
formats (like from vtk)?
This is not a MWE.
meshio.write_points_cells(
"SeismicMesh_ball.msh",
points,
[("tetra", cells)],
)
gmsh.initialize()
gmsh.model.add("Mesh from file")
gmsh.merge("SeismicMesh_ball.msh")
gmsh.model.mesh.optimize("Netgen")
gmsh.write("SeismicMesh_ball_NetGen.msh")
gmsh.finalize()
mesh = meshio.read(
"SeismicMesh_ball_NetGen.msh", # string, os.PathLike, or a buffer/open file
)
points2, cells2 = mesh.points, mesh.cells[0].data
I do apologize, it seems like I neglected one of the mesh options used by pygmsh, which resulted in the speed-up (by generating less cells). When adding these, the runs are equivalent. This means that can ignore all the results with gmsh vs pygmsh above.
I also seem to be struggling to use netgen on msh files, I'll give it a go later today and see if can get something to work.
Yet another correction from my side, if you add the kwarg force=True
to the optimize command force it to work on discrete entities.
i.e.: gmsh.model.mesh.optimize('Netgen', True)
Ref:
https://gitlab.onelab.info/gmsh/gmsh/-/blob/master/api/gmsh.py#L1419
Yet another correction from my side, if you add the kwarg force=True to the optimize command force it to work on discrete entities. i.e.: gmsh.model.mesh.optimize('Netgen', True) Ref: https://gitlab.onelab.info/gmsh/gmsh/-/blob/master/api/gmsh.py#L1419
Thanks for looking into that.
Unfortunately I haven't had success. The example below gives a segmentation 11 when calling the optimize with force=True:
import gmsh
import SeismicMesh
import meshio
cube = SeismicMesh.Cube((0.0, 1.0, 0.0, 1.0, 0.0, 1.0))
points, cells = SeismicMesh.generate_mesh(domain=cube, edge_length=0.05)
points, cells = SeismicMesh.sliver_removal(points=points, domain=cube, edge_length=0.05)
meshio.write_points_cells(
"cube.msh",
points,
[("tetra", cells)],
file_format="gmsh",
)
# optimize with NetGen
gmsh.initialize()
gmsh.model.add("Mesh from file")
gmsh.merge("cube.msh")
gmsh.model.mesh.optimize("Netgen",force=True)
gmsh.write("cube_ng.msh")
gmsh.finalize()
# read it back in to convert to vtk
mesh = meshio.read("cube_ng.msh")
# write to vtk
meshio.write_points_cells(
"cube_ng.vtk",
points,
[("tetra", cells)],
file_format="vtk",
)
I'm not sure what the gmsh docstring means either
If `force' is set apply the optimization also to discrete entities.
Why wouldn't the optimization apply to the discrete entities? Isn't that the point of the optimization?
Strange, I got the optimization working when reloading an msh
-file created by the gmsh API. I'll see if I can spot any subtle differences between the msh files later today.
Yea, I'll try that. It would be nice to just pass a set of points and entity table in Python to the optimize as the file i/o may introduce unnecessary complexity and chances for errors.
Nevermind spoke too fast. Had much better success with the default optimizer just passing "" to the optimize method
gmsh.model.mesh.optimize("", force=True)
I think we can more or less close this one here.
Yea alright by me.
SeismicMesh consists of two parts:
Those two tasks can be neatly separated, and I find it unfortunate that those two things are tied together in one package.
Anyway, here we are.
What's currently missing from the README and the draft is a comparison with other commonly used mesh generators. I would count CGAL and Gmsh to those (perhaps there are more you can think of). I would like to see
with CGAL and Gmsh. A good measure for cell quality is, for example,
d * circumcircle_radius / incircle_radius
(whered
is 2 for triangles and3
for tetrahedra). The value is between 0 and 1, where 1 is a perfectly symmetrical simplex. CGAL has mesh size function input for 3D meshes only. Since you're using Python, it may be useful to use pygalmesh or to access the CGAL methods directly from C++. (You may be familiar enough with the API.) Gmsh supports mesh functions from its own Python interface in master (and 4.7.0, to be released). It may be useful to take a look at pygmsh for simplicity here.