Closed: nschloe closed this issue 3 years ago
Hey Nico, thanks for this critical and helpful review. I'll analyze these comments shortly. Hope you're doing well.
"Results indicate that Python array manipulations dominate parallel execution time and inhibit scalability"
I think it could be improved with better load balancing and a better domain decomposition method.
Besides that: the primary slowdown is the re-formation of the halo/ghost zones each meshing iteration, which requires array manipulation, hence my comment. If I could periodically lock the halo zones so that new halo zones would not need to be formed each iteration, it would dramatically accelerate the program. Unfortunately, when you lock the halo zones, the domain boundaries become imprinted in the mesh quality and more slivers form in 3D.
At one point along the development of this algorithm, I had played with the idea of first meshing with a domain decomposition that slices one axis and then after a set of iterations shifting to the other axis. However, I abandoned those ideas because they were too heuristic even for me.
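To make the bottleneck concrete, here is a purely illustrative NumPy sketch (not SeismicMesh's actual implementation) of the per-iteration array work of recomputing halo membership after points move, for one rank owning an axis-aligned strip of the unit square:

```python
import numpy as np

# Purely illustrative: one rank owns an axis-aligned strip of [0, 1]^2,
# plus a thin halo band on either side. After points move each meshing
# iteration, halo membership must be recomputed from scratch.
rng = np.random.default_rng(0)
points = rng.random((100_000, 2))

lo, hi, width = 0.5, 0.75, 0.02  # this rank's strip, plus a halo band of `width`

def rebuild_halo(points):
    # boolean masking + fancy indexing: pure NumPy array manipulation,
    # repeated every iteration
    owned = (points[:, 0] >= lo) & (points[:, 0] < hi)
    halo = ~owned & (points[:, 0] >= lo - width) & (points[:, 0] < hi + width)
    return points[owned], points[halo]

owned_pts, halo_pts = rebuild_halo(points)
```

Locking the halo zones would skip the rebuild, at the cost (as noted above) of imprinting the subdomain boundaries into the mesh quality.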
What's "high-geometric"?
I mean high geometric quality, meaning the angles between the edges/faces inside the simplex aren't too big or too small, and the elements are fairly regular in the sense that there's not much variation.
Perhaps you mean "high geometric quality", though I'm not sure what "geometric" wants to say. Is there a mesh quality other than geometric? (Genuine question.)
I mean what you say. Your point is well taken; namely that the usage of the word "geometric" here is somewhat unnecessary in this context. I'm not aware of a mesh quality metric that doesn't depend on the geometry of the elements.
The wording unsolved problem is too big. The Riemann hypothesis is an unsolved problem. The absence of a function that translates a data file to a density function is a missing tool. I'd rephrase this as (suggestion) "mesh generation for geo domains is not user-friendly".
I agree that the usage of the word "unsolved" problem is a bit too dramatic. I liked your comment about the Riemann hypothesis. 👍
How so? I thought in geo domains, you use cubes all the time.
The majority of the time it's a cube. However, sometimes seismologists only model up to the sea floor boundary, or they're doing some inversion study over land. In these cases, the boundary is irregular and often represented by a velocity iso-contour/surface. For the current work I'm doing on shape optimization with Dr. Antoine Laurain at IME here, we immerse an irregular domain inside a larger field (this is a relatively new feature in the SM as well), and the irregular immersed domain is defined by an SDF (signed distance function).
"In mesh generation, there is always a trade-off between generation speed and mesh quality. Since every generator draws the line elsewhere, or might even have user parameters to increase quality at the expense of time, it can be difficult to compare the results. That being said, with default settings Gmsh will produce high-quality meshes by far the fastest, SeismicMesh will produce meshes with the best quality, but much slower. Gmsh becomes comparatively slow when a user-defined mesh-density function is involved. This is exactly SeismicMesh's use case."
Totally agree with this. I'll see how I can adapt it.
Finite Element Method ---> finite element method, likewise Full Waveform Inversion etc.
This is capitalized on purpose because it's used to define an acronym. I'm not sure if this is valid or not for this journal. I'm guessing it's not based on your comment.
I'd like to know if this is intrinsic to the problem or if it can be fixed or if it is unclear.
I alluded to how I could improve this in the future in the last paragraph of the paper.
"In mesh generation, there is always a trade-off between generation speed and mesh quality. Since every generator draws the line elsewhere, or might even have user parameters to increase quality at the expense of time, it can be difficult to compare the results. That being said, with default settings Gmsh will produce high-quality meshes by far the fastest, SeismicMesh will produce meshes with the best quality, but much slower. Gmsh becomes comparatively slow when a user-defined mesh-density function is involved. This is exactly SeismicMesh's use case."
I also incorporated some more sentences in the performance comparison section.
This is capitalized on purpose because it's used to define an acronym. I'm not sure if this is valid or not for this journal. I'm guessing it's not based on your comment.
Not sure if JOSS has a standard here; I don't think so. I think the typical way is not to capitalize it (see, e.g., https://en.wikipedia.org/wiki/Finite_element_method). This is a minor point though, so feel free to do it one way or another.
high quality meshes
I had a peek at the changes already. Please change this to "high-quality meshes" everywhere (https://english.stackexchange.com/a/487384/23644). I always read it as "high (quality meshes)" and for a split second ask myself what a high mesh is.
I also incorporated some more sentences in the performance comparison section.
Nice.
However, sometimes seismologists only model up to the sea floor boundary
Interesting. Leave it like it is then.
I had a peek at the changes already. Please change this to "high-quality meshes" everywhere (https://english.stackexchange.com/a/487384/23644). I always read it as "high (quality meshes)" and for a split second ask myself what a high mesh is.
Okay sure this is edited now.
I improved the parallel performance in #171 by line profiling and then removing some bottlenecks. I've updated the figure to reflect these improvements and a line in the text.
The majority of the time per meshing iteration in parallel is spent in the MPI communication routines.
Since the subdomains are axis-aligned rectangles, the communication interfaces are not minimized. With that said, I'm fine with 60% speed-up efficiency and waiting 15 minutes to build a 33M cell mesh.
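As a back-of-the-envelope check, the speedup and efficiency can be read straight off the wall-clock timings that appear in the plotting script later in this thread:

```python
# Speedup and efficiency from the EAGE Heavy (~33M cells) wall-clock
# timings in the benchmark plotting script, for 1..11 cores.
times_large = [5460.0, 3476.0, 2490.0, 1968.0, 1754.0, 1444.0,
               1363.0, 1158.0, 1116.0, 973.0, 933.0]
cores = range(1, 12)

speedup = [times_large[0] / t for t in times_large]
efficiency = [s / c for s, c in zip(speedup, cores)]
# on 11 cores: ~5.9x speedup (~53% efficiency), ~15.6 minutes wall time
```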
I'm just reading through the draft once more. Let's finish this up tomorrow. More remarks for now:
CGAL is not competitive for the 3D benchmark and does not support user-defined mesh sizing functions and is therefore not shown
That's not true (the second part). See, e.g., https://github.com/nschloe/pygalmesh#local-refinement.
The DistMesh algorithm used in SeismicMesh requires far less calls to the sizing function
Less -> fewer.
For example in the EAGE benchmark, Gmsh calls the sizing function 95,756 times whereas DistMesh calls it just 26 times
You need to explain that this is because Gmsh calls the sizing function for each point individually, whereas SM does it for all points at once (which also explains why it's so much faster). Perhaps this can be merged with the sentence before it, e.g.:
It’s interesting to note that interpolant-based mesh sizing functions significantly slow the mesh generation time of Gmsh increasing its mesh generation time by a factor of ∼ 3. That is because Gmsh calls the sizing function at each point individually, e.g., 95,756 times for a particular run. For the same mesh, SM calls the sizing function 26 times, but for all points at once. It makes use of vectorization which is dramatically faster than Gmsh's approach.
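A minimal sketch of the difference (with a hypothetical sizing function, not SeismicMesh's or Gmsh's actual API):

```python
import numpy as np

# Hypothetical sizing function, vectorized over an (n, 3) array of points.
def sizing(points):
    return 0.05 + 0.1 * np.abs(points[:, 2])

pts = np.random.rand(1000, 3)

# Gmsh-style probing: one call per point (1000 calls here)
per_point = np.array([sizing(p.reshape(1, 3))[0] for p in pts])

# DistMesh-style probing: one call for all points at once
batched = sizing(pts)

assert np.allclose(per_point, batched)  # same values, far fewer calls
```

The batched variant amortizes the Python call overhead and lets NumPy vectorize the arithmetic, which is where the speed difference comes from.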
A hint for the next article: In the plots, you use red RGB(255, 0, 0), and blue RGB(0, 0, 255). Less "drastic" colors tend to look better, see, e.g., https://matplotlib.org/3.1.1/users/dflt_style_changes.html#colors-color-cycles-and-color-maps.
Are you online tomorrow (Friday)? I can allocate some time so we can push this through.
Hey Nico sure thing. I'll be online. Let's finish this up tomorrow.
In the linked open PR there, I made some changes to the comments you made. I linked up the two sentences about probing the mesh size function and explained a little better why the difference in performance.
Can you merge? I'll have another look at the text.
Another thing I just found: the PDF is about 3 MB. I only noticed because my laptop freaks out when trying to zoom in. The main offenders are probably the performance graphs. What's the file format, size, etc.?
Ok, merged it in. The files live here; they're PNGs: https://github.com/krober10nd/SeismicMesh/tree/master/paper
Is 3 MB a problem for a PDF?
Seems unnecessary. How do you create the plots?
optipng (never hurts to apply it) brings the file down to 2.5 MB, but that's not really what I'm looking for. Graphs and such are much better represented as SVGs, in terms of both file size and quality. Does your generation process have that option?
https://github.com/krober10nd/SeismicMesh/blob/master/benchmarks/run_disk.py
In the benchmark folder, I run these scripts. I take each image and use Keynote to form a grid. Then I group the subplots until it fits into the very peculiar margins of the JOSS paper. I export to PDF, then to PNG, then crop the white space with convert.
I have the original PDF. I guess that could be converted to an SVG?
In the benchmark folder, I run these scripts.
Perfect. Can you, instead of show, do
plt.savefig("out.svg", transparent=True, bbox_inches="tight")
and see what it gives? (Also apply svgo.)
I export to PDF, then to png
No need to convert to PNG. You can embed PDF into PDF/LaTeX.
Perfect. Can you, instead of show, do
cool. Okay let me try.
ah, now I recall why I did plt.show(). I needed to maximize the image before saving, as the font size needed to be increased.
Let's google to see if we can avoid the show().
Edit: Tried https://stackoverflow.com/q/12439588/353337 yet?
It appears non-trivial. I appear to have to call plt.show(), but when I do, it creates a figure that I cannot close so that the script can advance to the plt.savefig. If I do not call plt.show, it doesn't expand the SVG when it is saved.
mng = plt.get_current_fig_manager()
mng.full_screen_toggle()
plt.savefig("out.svg", transparent=True, bbox_inches="tight")
Next idea: only increase the font size so much; rather, increase the SVG size after exporting.
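For what it's worth, a minimal sketch of the non-interactive route (assuming the Agg backend is acceptable): set the font size via rcParams and save directly, with no window and no plt.show() at all.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: no window, no plt.show() needed
import matplotlib.pyplot as plt

plt.rcParams.update({"font.size": 14})  # bump fonts in rcParams instead of
                                        # maximizing a window by hand

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9], "C0-o")
fig.savefig("out.svg", transparent=True, bbox_inches="tight")
```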
okay
Number -> #
and using plt.tight_layout()
Seems like a lot of whitespace still.
You mean to the right and the bottom? This can be edited out with any SVG tool. I always use Inkscape, but Keynote might do it as well.
Hm, unfortunately SVG appears to be "not supported" in Keynote on my Mac laptop. I'm not sure how I'd stack all the SVGs together to form the mosaic.
You're just trying to stick some figures together, right? Keynote may be the wrong tool for the task. Try Illustrator or Inkscape.
Yep, I just need to put several SVGs together in a grid.
Yeah, you want a vector graphics program for that when dealing with vector graphics.
Success?
Somewhat. I downloaded Inkscape and can group the images together, but there's still whitespace on the right side.
In Inkscape, I followed this thread to remove the whitespace: https://graphicdesign.stackexchange.com/questions/119125/remove-the-blank-space-around-the-svg-image-using-inkscape
Yep, looks good. Resize page to drawing or selection is the right thing to do. You can also add the domains plots etc there.
yep. Okay, I'll re-run all the benchmarks. Might as well change the colors then. I guess I need to upgrade my matplotlib.
It has used the new colors for some years now. You can explicitly select them with color="C0", "C1", etc.
If you want to improve the figures more:
(Check https://github.com/nschloe/dufte for inspiration.)
When you say "remove the box" which box are you referring to?
The axes strokes. As a general principle for plot aesthetics: The fewer lines, the less distracting, the better. (data-to-ink-ratio)
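A small sketch of what that looks like in matplotlib (illustrative data; the relevant calls are set_frame_on, yaxis.grid, and tick_params):

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for headless saving
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [1, 2, 4, 8], "C0-o")
ax.set_frame_on(False)    # remove the box, i.e. the four axis strokes
ax.yaxis.grid()           # keep faint horizontal grid lines for reading values
ax.tick_params(length=0)  # drop the tick marks as well
fig.savefig("clean.svg", transparent=True, bbox_inches="tight")
```

Fewer lines, less distraction, better data-to-ink ratio.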
How do you like the looks of it?
I like it! It's cleaner. I wasn't too aware of this stuff.
I'd add
Edit: The data looks crammed to the left. Ever thought about semilogx or loglog? It looks alright for the 3D plot, so I'm not 100% sure here.
Yeah, I understand the concern, and I thought about semilogx, but because they aren't all that disparate, I'd rather they all have the same axes.
How is it coming along? I'm around for another hour or so.
I’m just finishing up lunch. I have to run the EAGE Benchmark and put them all in one figure. I guess I should also make the change to the other image?
How would you create the svg for the little mesh images that go next to each panel?
Just use png for those. (You can embed PNG in SVG.)
Okay, for the other figure: 1) I can't use tight_layout for some reason, and 2) when it saves to SVG, the image is blank.
import matplotlib
import matplotlib.pyplot as plt
import numpy as np

matplotlib.use("TkAgg")

# plt.rcParams.update({"font.size": 14})

cores = range(1, 12)

# these times_small were obtained from running parallel_benchmark_EAGE.py
# times_small = np.array(
#     [
#         696.2808640003204,
#         549.6535811424255,
#         434.7941460609436,
#         335.6148066520691,
#         296.3888158798218,
#         249.76853609085083,
#         235.70240020751953,
#         218.4058017730713,
#         203.4451940059662,
#         190.6157865524292,
#         184.21815061569214,
#     ]
# )

times_large = np.array(
    [
        5460.0,
        3476.0,
        2490.0,
        1968.0,
        1754.0,
        1444.0,
        1363.0,
        1158.0,
        1116.0,
        973.0,
        933.0,
    ]
)

times_small = np.array(
    [
        705.9412625,
        420.1220601,
        318.739275,
        263.7762356,
        240.2493663,
        202.3630016,
        193.3727961,
        179.6491494,
        169.3839617,
        158.9826994,
        150.0208704,
    ]
)

# times_large2 = np.array(
#     [
#         5511.5960257053375,
#         4862.753707170486,
#         3441.461901664734,
#         2783.6892313957214,
#         2303.14115691185,
#         1957.9908683300018,
#         1765.8561742305756,
#         1574.8861265182495,
#         1481.8865163326263,
#         1343.383672952652,
#         1302.5508060455322,
#     ]
# )

# times_large3 = np.array(
#     [
#         6143.5984008312225,
#         4234.233935117722,
#         3464.88405251503,
#         2575.567533969879,
#         2283.713561296463,
#         1954.7945246696472,
#         1839.4455134868622,
#         1593.873577594757,
#         1506.0424902439117,
#         1388.5100028514862,
#         1367.5728952884674,
#     ]
# )

fig, ax = plt.subplots(1, 2)
fig = plt.gcf()
fig.suptitle("Intel Xeon Gold 6148 @2.4 GHz", fontsize=14)

ax[0].plot(
    cores, times_small[0] / times_small, "C7-o", label="EAGE Light (~3.7M cells)"
)
for i, time in enumerate(times_small):
    ptime = "{:.1f}".format(time / 60)
    ax[0].annotate(
        str(ptime),
        (cores[i] - 0.25, (times_small[0] / times_small[i]) - 0.50),
        color="C7",
    )
ax[0].plot(
    cores,
    times_large[0] / times_large,
    "C3-o",
    label="EAGE Heavy (~33M cells)",
)
for i, time in enumerate(times_large):
    ptime = "{:.1f}".format(time / 60)
    ax[0].annotate(
        str(ptime),
        (cores[i] - 0.25, (times_large[0] / times_large[i]) + 0.50),
        color="C3",
    )
ax[0].plot(cores, np.arange(1, 12), "k--", label="Ideal speedup = c")
ax[0].set_xticks(cores)
ax[0].text(4, 500, "number of cells = " + str(3.7) + "M")
ax[0].legend()
ax[0].set_ylabel("Speedup $T_1/T_{c}$")
ax[0].set_xlabel("Number of cores (c)")
ax[0].set_title("Parallel speedup")
ax[0].yaxis.grid()
ax[0].set_frame_on(False)

size1 = 3.7e6
size2 = 33e6
rate1 = size1 / times_small
rate2 = size2 / times_large
ax[1].plot(cores, rate1, "C7-o", label="EAGE Light (~3.7M cells)")
ax[1].plot(
    cores,
    rate2,
    "C3-o",
    label="EAGE Heavy (~33M cells)",
)
ax[1].set_xticks(cores)
ax[1].set_title("Mesh generation rate")
ax[1].set_ylabel("Mesh generation rate (vertices/sec)")
ax[1].set_xlabel("Number of cores (c)")
ax[1].yaxis.grid()
ax[1].set_frame_on(False)

plt.tight_layout(pad=5.0)
plt.show()
fig.savefig("out5.svg", transparent=True, bbox_inches="tight", pad_inches=0)
It appears to work without ax[0].text(4, 500, "number of cells = " + str(3.7) + "M").
triangles/tetrahedral elements --> triangular/tetrahedral elements
"However, the usage of a particular sizing function"
It's not the usage of a particular sizing function that's the issue; it's any sizing function that benefits from vectorization.
Let's just call this Python API or Python application programming interface.
"the circumcircle radius divided by the incircle radius"
It's the other way around.
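For the record, one common normalization of this quality measure (incircle radius over circumcircle radius, scaled so the equilateral triangle scores 1) can be sketched as:

```python
import math

# One common normalization of the triangle radius-ratio quality:
# q = 2 * r_in / r_circ, so q = 1 for the equilateral triangle and
# q -> 0 for slivers. a, b, c are the side lengths.
def radius_ratio(a, b, c):
    s = (a + b + c) / 2.0
    area = math.sqrt(s * (s - a) * (s - b) * (s - c))  # Heron's formula
    r_in = area / s                    # incircle radius
    r_circ = a * b * c / (4.0 * area)  # circumcircle radius
    return 2.0 * r_in / r_circ

q_equilateral = radius_ratio(1.0, 1.0, 1.0)  # best possible quality
q_sliver = radius_ratio(1.0, 1.0, 1.9)       # nearly degenerate, much smaller
```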