Closed: nschloe closed this issue 3 years ago
Hey Nico, thanks for this critical and helpful review. I'll analyze these comments shortly. Hope you're doing well.
"Results indicate that Python array manipulations dominate parallel execution time and inhibit scalability"
I think it could be improved with better load balancing and a better domain decomposition method.
Besides that: the primary slowdown is the re-formation of the halo/ghost zones each meshing iteration, which requires array manipulation, hence my comment. If I could periodically lock the halo zones so that new halo zones would not need to be formed each iteration, it would dramatically accelerate the program. Unfortunately, when you lock the halo zones, the domain boundaries become imprinted in the mesh quality and more slivers form in 3D.
At one point along the development of this algorithm, I had played with the idea of first meshing with a domain decomposition that slices one axis and then after a set of iterations shifting to the other axis. However, I abandoned those ideas because they were too heuristic even for me.
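To make the bottleneck concrete, here is a purely illustrative NumPy sketch (not SeismicMesh's actual implementation) of the per-iteration array work of recomputing halo membership after points move, for one rank owning an axis-aligned strip of the unit square:

```python
import numpy as np

# Purely illustrative: one rank owns an axis-aligned strip of [0, 1]^2,
# plus a thin halo band on either side. After points move each meshing
# iteration, halo membership must be recomputed from scratch.
rng = np.random.default_rng(0)
points = rng.random((100_000, 2))

lo, hi, width = 0.5, 0.75, 0.02  # this rank's strip, plus a halo band of `width`

def rebuild_halo(points):
    # boolean masking + fancy indexing: pure NumPy array manipulation,
    # repeated every iteration
    owned = (points[:, 0] >= lo) & (points[:, 0] < hi)
    halo = ~owned & (points[:, 0] >= lo - width) & (points[:, 0] < hi + width)
    return points[owned], points[halo]

owned_pts, halo_pts = rebuild_halo(points)
```

Locking the halo zones would skip the rebuild, at the cost (as noted above) of imprinting the subdomain boundaries into the mesh quality.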
What's "high-geometric"?
I mean high geometric quality, meaning the angles between the edges/faces inside the simplex aren't too big or too small, and the elements are fairly regular in the sense that there's not much variation.
Perhaps you mean "high geometric quality", though I'm not sure what "geometric" wants to say. Is there a mesh quality other than geometric? (Genuine question.)
I mean what you say. Your point is well taken; namely that the usage of the word "geometric" here is somewhat unnecessary in this context. I'm not aware of a mesh quality metric that doesn't depend on the geometry of the elements.
The wording unsolved problem is too big. The Riemann hypothesis is an unsolved problem. The absence of a function that translates a data file to a density function is a missing tool. I'd rephrase this as (suggestion) "mesh generation for geo domains is not user-friendly".
I agree that the usage of the word "unsolved" problem is a bit too dramatic. I liked your comment about the Riemann hypothesis. 👍
How so? I thought in geo domains, you use cubes all the time.
The majority of the time it's a cube. However, sometimes seismologists only model up to the sea floor boundary, or they're doing some inversion study over land. In these cases, the boundary is irregular and often represented by a velocity iso-contour/surface. For the current work I'm doing on shape optimization with Dr. Antoine Laurain at IME here, we immerse an irregular domain inside a larger field (this is a relatively new feature in the SM as well), and the irregular immersed domain is defined by an SDF (signed distance function).
"In mesh generation, there is always a trade-off between generation speed and mesh quality. Since every generator draws the line elsewhere, or might even have user parameters to increase quality at the expense of time, it can be difficult to compare the results. That being said, with default settings Gmsh will produce high-quality meshes by far the fastest, SeismicMesh will produce meshes with the best quality, but much slower. Gmsh becomes comparatively slow when a user-defined mesh-density function is involved. This is exactly SeismicMesh's use case."
Totally agree with this. I'll see how I can adapt it.
Finite Element Method ---> finite element method, likewise Full Waveform Inversion etc.
This is capitalized on purpose because it's used to define an acronym. I'm not sure if this is valid or not for this journal. I'm guessing it's not based on your comment.
I'd like to know if this is intrinsic to the problem or if it can be fixed or if it is unclear.
I alluded to how I could improve this in the future in the last paragraph of the paper.
"In mesh generation, there is always a trade-off between generation speed and mesh quality. Since every generator draws the line elsewhere, or might even have user parameters to increase quality at the expense of time, it can be difficult to compare the results. That being said, with default settings Gmsh will produce high-quality meshes by far the fastest, SeismicMesh will produce meshes with the best quality, but much slower. Gmsh becomes comparatively slow when a user-defined mesh-density function is involved. This is exactly SeismicMesh's use case."
I also incorporated some more sentences in the performance comparison section.
This is capitalized on purpose because it's used to define an acronym. I'm not sure if this is valid or not for this journal. I'm guessing it's not based on your comment.
Not sure if JOSS has a standard here; I don't think so. I think the typical way is not to capitalize it (see, e.g., https://en.wikipedia.org/wiki/Finite_element_method). This is a minor point though, so feel free to do it one way or another.
high quality meshes
I had a peek at the changes already. Please change this to "high-quality meshes" everywhere (https://english.stackexchange.com/a/487384/23644). I always read it as "high (quality meshes)" and for a split second ask myself what a high mesh is.
I also incorporated some more sentences in the performance comparison section.
Nice.
However, sometimes seismologists only model up to the sea floor boundary
Interesting. Leave it like it is then.
I had a peek at the changes already. Please change this to "high-quality meshes" everywhere (https://english.stackexchange.com/a/487384/23644). I always read it as "high (quality meshes)" and for a split second ask myself what a high mesh is.
Okay sure this is edited now.
I improved the parallel performance in #171 by line profiling and then removing some bottlenecks. I've updated the figure to reflect these improvements and a line in the text.
The majority of the time per meshing iteration in parallel is spent in the MPI communication routines.
Since the subdomains are axis-aligned rectangles, the communication interfaces are not minimized. With that said, I'm fine with 60% speed-up efficiency and waiting 15 minutes to build a 33M cell mesh.
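As a back-of-the-envelope check, the speedup and efficiency can be read straight off the wall-clock timings that appear in the plotting script later in this thread:

```python
# Speedup and efficiency from the EAGE Heavy (~33M cells) wall-clock
# timings in the benchmark plotting script, for 1..11 cores.
times_large = [5460.0, 3476.0, 2490.0, 1968.0, 1754.0, 1444.0,
               1363.0, 1158.0, 1116.0, 973.0, 933.0]
cores = range(1, 12)

speedup = [times_large[0] / t for t in times_large]
efficiency = [s / c for s, c in zip(speedup, cores)]
# on 11 cores: ~5.9x speedup (~53% efficiency), ~15.6 minutes wall time
```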
I'm just reading through the draft once more. Let's finish this up tomorrow. More remarks for now:
CGAL is not competitive for the 3D benchmark and does not support user-defined mesh sizing functions and is therefore not shown
That's not true (the second part). See, e.g., https://github.com/nschloe/pygalmesh#local-refinement.
The DistMesh algorithm used in SeismicMesh requires far less calls to the sizing function
Less -> fewer.
For example in the EAGE benchmark, Gmsh calls the sizing function 95,756 times whereas DistMesh calls it just 26 times
You need to explain that this is because Gmsh calls the sizing function for each point individually, whereas SM does it for all points at once (which also explains why it's so much faster). Perhaps this can be merged with the sentence before it, e.g.:
It’s interesting to note that interpolant-based mesh sizing functions significantly slow the mesh generation time of Gmsh increasing its mesh generation time by a factor of ∼ 3. That is because Gmsh calls the sizing function at each point individually, e.g., 95,756 times for a particular run. For the same mesh, SM calls the sizing function 26 times, but for all points at once. It makes use of vectorization which is dramatically faster than Gmsh's approach.
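A minimal sketch of the difference (with a hypothetical sizing function, not SeismicMesh's or Gmsh's actual API):

```python
import numpy as np

# Hypothetical sizing function, vectorized over an (n, 3) array of points.
def sizing(points):
    return 0.05 + 0.1 * np.abs(points[:, 2])

pts = np.random.rand(1000, 3)

# Gmsh-style probing: one call per point (1000 calls here)
per_point = np.array([sizing(p.reshape(1, 3))[0] for p in pts])

# DistMesh-style probing: one call for all points at once
batched = sizing(pts)

assert np.allclose(per_point, batched)  # same values, far fewer calls
```

The batched variant amortizes the Python call overhead and lets NumPy vectorize the arithmetic, which is where the speed difference comes from.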
A hint for the next article: In the plots, you use red RGB(255, 0, 0), and blue RGB(0, 0, 255). Less "drastic" colors tend to look better, see, e.g., https://matplotlib.org/3.1.1/users/dflt_style_changes.html#colors-color-cycles-and-color-maps.
Are you online tomorrow (Friday)? I can allocate some time so we can push this through.
Hey Nico sure thing. I'll be online. Let's finish this up tomorrow.
In the linked open PR there, I made some changes to the comments you made. I linked up the two sentences about probing the mesh size function and explained a little better why the difference in performance.
Can you merge? I'll have another look at the text.
Another thing I just found: the PDF is about 3 MB. I only noticed because my laptop freaks out when trying to zoom in. The main offenders are probably the performance graphs. What's the file format, size, etc.?
Ok, merged it in. The files live here; they're PNGs: https://github.com/krober10nd/SeismicMesh/tree/master/paper
Is 3 MB a problem for a PDF?
Seems unnecessary. How do you create the plots?
optipng (never hurts to apply it) brings the file down to 2.5 MB, but that's not really what I'm looking for. Graphs and such are much better represented as SVGs, in terms of both file size and quality. Does your generation process have that option?
https://github.com/krober10nd/SeismicMesh/blob/master/benchmarks/run_disk.py
In the benchmark folder, I run these scripts. I take each image and use Keynote to form a grid. Then I group the subplots until it fits into the very peculiar margins of the JOSS paper. I export to PDF, then to PNG, then crop the white space with convert.
I have the original PDF. I guess that could be converted to an SVG?
In the benchmark folder, I run these scripts.
Perfect. Can you, instead of show, do
plt.savefig("out.svg", transparent=True, bbox_inches="tight")
and see what it gives? (Also apply svgo.)
I export to PDF, then to png
No need to convert to PNG. You can embed PDF into PDF/LaTeX.
Perfect. Can you, instead of show, do
cool. Okay let me try.
ah, now I recall why I did plt.show(). I needed to maximize the image before saving, as the font size needed to be increased.
Let's google to see if we can avoid the show().
Edit: Tried https://stackoverflow.com/q/12439588/353337 yet?
It appears non-trivial. I appear to have to call plt.show(), but when I do, it creates a figure that I cannot close so that the script can advance to the plt.savefig. If I do not call plt.show, it doesn't expand the SVG when it is saved.
mng = plt.get_current_fig_manager()
mng.full_screen_toggle()
plt.savefig("out.svg", transparent=True, bbox_inches="tight")
Next idea: only increase the font size so much; rather, increase the SVG size after exporting.
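For what it's worth, a minimal sketch of the non-interactive route (assuming the Agg backend is acceptable): set the font size via rcParams and save directly, with no window and no plt.show() at all.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: no window, no plt.show() needed
import matplotlib.pyplot as plt

plt.rcParams.update({"font.size": 14})  # bump fonts in rcParams instead of
                                        # maximizing a window by hand

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9], "C0-o")
fig.savefig("out.svg", transparent=True, bbox_inches="tight")
```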
okay
Number -> #
and using plt.tight_layout()
Seems like a lot of whitespace still.
You mean to the right and the bottom? This can be edited out with any SVG tool. I always use Inkscape, but Keynote might do it as well.
Hm, unfortunately SVG appears to be "not supported" in Keynote on my Mac laptop. I'm not sure how I'd stack all the SVGs together to form the mosaic.
You're just trying to stick some figures together, right? Keynote may be the wrong tool for the task. Try Illustrator or Inkscape.
Yep, I just need to put several SVGs together in a grid.
Yeah, you want a vector graphics program for that when dealing with vector graphics.
Success?
Somewhat. I downloaded Inkscape and can group the images together, but there's still whitespace on the right side.
In Inkscape, I followed this thread to remove the whitespace: https://graphicdesign.stackexchange.com/questions/119125/remove-the-blank-space-around-the-svg-image-using-inkscape
Yep, looks good. Resize page to drawing or selection is the right thing to do. You can also add the domains plots etc there.
yep. Okay, I'll re-run all the benchmarks. Might as well change the colors then. I guess I need to upgrade my matplotlib.
It has used the new colors for some years now. You can explicitly select them with color="C0", "C1", etc.
If you want to improve the figures more:
(Check https://github.com/nschloe/dufte for inspiration.)
When you say "remove the box" which box are you referring to?
The axes strokes. As a general principle for plot aesthetics: The fewer lines, the less distracting, the better. (data-to-ink-ratio)
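A small sketch of what that looks like in matplotlib (illustrative data; the relevant calls are set_frame_on, yaxis.grid, and tick_params):

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for headless saving
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [1, 2, 4, 8], "C0-o")
ax.set_frame_on(False)    # remove the box, i.e. the four axis strokes
ax.yaxis.grid()           # keep faint horizontal grid lines for reading values
ax.tick_params(length=0)  # drop the tick marks as well
fig.savefig("clean.svg", transparent=True, bbox_inches="tight")
```

Fewer lines, less distraction, better data-to-ink ratio.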
How do you like the looks of it?
I like it! It's cleaner. I wasn't too aware of this stuff.
I'd add
Edit: The data looks crammed to the left. Ever thought about semilogx or loglog? It looks alright for the 3D plot, so I'm not 100% sure here.
Yeah, I understand the concern, and I thought about semilogx, but because they aren't all that disparate, I'd rather they all have the same axes.
How is it coming along? I'm around for another hour or so.
I’m just finishing up lunch. I have to run the EAGE Benchmark and put them all in one figure. I guess I should also make the change to the other image?
How would you create the svg for the little mesh images that go next to each panel?
Just use png for those. (You can embed PNG in SVG.)
Okay, for the other figure: 1) I can't use tight_layout for some reason, and 2) when it saves to SVG, the image is blank.
import matplotlib
import matplotlib.pyplot as plt
import numpy as np

matplotlib.use("TkAgg")

# plt.rcParams.update({"font.size": 14})

cores = range(1, 12)

# these times_small were obtained from running parallel_benchmark_EAGE.py
# times_small = np.array(
#     [
#         696.2808640003204,
#         549.6535811424255,
#         434.7941460609436,
#         335.6148066520691,
#         296.3888158798218,
#         249.76853609085083,
#         235.70240020751953,
#         218.4058017730713,
#         203.4451940059662,
#         190.6157865524292,
#         184.21815061569214,
#     ]
# )

times_large = np.array(
    [
        5460.0,
        3476.0,
        2490.0,
        1968.0,
        1754.0,
        1444.0,
        1363.0,
        1158.0,
        1116.0,
        973.0,
        933.0,
    ]
)

times_small = np.array(
    [
        705.9412625,
        420.1220601,
        318.739275,
        263.7762356,
        240.2493663,
        202.3630016,
        193.3727961,
        179.6491494,
        169.3839617,
        158.9826994,
        150.0208704,
    ]
)

# times_large2 = np.array(
#     [
#         5511.5960257053375,
#         4862.753707170486,
#         3441.461901664734,
#         2783.6892313957214,
#         2303.14115691185,
#         1957.9908683300018,
#         1765.8561742305756,
#         1574.8861265182495,
#         1481.8865163326263,
#         1343.383672952652,
#         1302.5508060455322,
#     ]
# )

# times_large3 = np.array(
#     [
#         6143.5984008312225,
#         4234.233935117722,
#         3464.88405251503,
#         2575.567533969879,
#         2283.713561296463,
#         1954.7945246696472,
#         1839.4455134868622,
#         1593.873577594757,
#         1506.0424902439117,
#         1388.5100028514862,
#         1367.5728952884674,
#     ]
# )

fig, ax = plt.subplots(1, 2)
fig = plt.gcf()
fig.suptitle("Intel Xeon Gold 6148 @2.4 GHz", fontsize=14)

ax[0].plot(
    cores, times_small[0] / times_small, "C7-o", label="EAGE Light (~3.7M cells)"
)
for i, time in enumerate(times_small):
    ptime = "{:.1f}".format(time / 60)
    ax[0].annotate(
        str(ptime),
        (cores[i] - 0.25, (times_small[0] / times_small[i]) - 0.50),
        color="C7",
    )
ax[0].plot(
    cores,
    times_large[0] / times_large,
    "C3-o",
    label="EAGE Heavy (~33M cells)",
)
for i, time in enumerate(times_large):
    ptime = "{:.1f}".format(time / 60)
    ax[0].annotate(
        str(ptime),
        (cores[i] - 0.25, (times_large[0] / times_large[i]) + 0.50),
        color="C3",
    )
ax[0].plot(cores, np.arange(1, 12), "k--", label="Ideal speedup = c")
ax[0].set_xticks(cores)
ax[0].text(4, 500, "number of cells = " + str(3.7) + "M")
ax[0].legend()
ax[0].set_ylabel("Speedup $T_1/T_{c}$")
ax[0].set_xlabel("Number of cores (c)")
ax[0].set_title("Parallel speedup")
ax[0].yaxis.grid()
ax[0].set_frame_on(False)

size1 = 3.7e6
size2 = 33e6
rate1 = size1 / times_small
rate2 = size2 / times_large
ax[1].plot(cores, rate1, "C7-o", label="EAGE Light (~3.7M cells)")
ax[1].plot(
    cores,
    rate2,
    "C3-o",
    label="EAGE Heavy (~33M cells)",
)
ax[1].set_xticks(cores)
ax[1].set_title("Mesh generation rate")
ax[1].set_ylabel("Mesh generation rate (vertices/sec)")
ax[1].set_xlabel("Number of cores (c)")
ax[1].yaxis.grid()
ax[1].set_frame_on(False)

plt.tight_layout(pad=5.0)
plt.show()
fig.savefig("out5.svg", transparent=True, bbox_inches="tight", pad_inches=0)
It appears to work without ax[0].text(4, 500, "number of cells = " + str(3.7) + "M").
triangles/tetrahedral elements --> triangular/tetrahedral elements
"However, the usage of a particular sizing function"
It's not the usage of a particular sizing function that's the issue; it's any sizing function that benefits from vectorization.
Let's just call this Python API or Python application programming interface.
"the circumcircle radius divided by the incircle radius"
It's the other way around.
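For the record, one common normalization of this quality measure (incircle radius over circumcircle radius, scaled so the equilateral triangle scores 1) can be sketched as:

```python
import math

# One common normalization of the triangle radius-ratio quality:
# q = 2 * r_in / r_circ, so q = 1 for the equilateral triangle and
# q -> 0 for slivers. a, b, c are the side lengths.
def radius_ratio(a, b, c):
    s = (a + b + c) / 2.0
    area = math.sqrt(s * (s - a) * (s - b) * (s - c))  # Heron's formula
    r_in = area / s                    # incircle radius
    r_circ = a * b * c / (4.0 * area)  # circumcircle radius
    return 2.0 * r_in / r_circ

q_equilateral = radius_ratio(1.0, 1.0, 1.0)  # best possible quality
q_sliver = radius_ratio(1.0, 1.0, 1.9)       # nearly degenerate, much smaller
```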