gridapapps / GridapGeosciences.jl

Gridap drivers for geoscience applications

Bottleneck in current cubed sphere mesh generation approach #13

Closed · amartinhuertas closed 3 years ago

amartinhuertas commented 3 years ago

Highly related to https://github.com/gridap/Gridap.jl/issues/646 and https://github.com/gridap/Gridap.jl/issues/641

The current approach that we are using for generating the cubed sphere mesh has a bottleneck that will prevent us from scaling beyond a few hundred cells per panel direction (indeed, my laptop runs out of memory at approx. 200x200 cells/panel).

The issue is in this line:

https://github.com/gridap/Gridap.jl/blob/a52bc10503f77f700b7cee096c7311aebc131cee/src/Geometry/CartesianDiscreteModels.jl#L53

Recall that we generate a Cartesian volume mesh of the cube, which we then restrict to the cube surface. In this particular line, the Cartesian grid is (temporarily) transformed into an UnstructuredGrid (vertex coordinates and connectivity are stored explicitly, instead of being managed implicitly), and this causes memory usage to grow cubically with the number of cells per direction (i.e., if we double the number of cells per direction, memory grows by a factor of 8). This is clearly not acceptable: it should grow quadratically, since we have a 2D mesh.
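To see the scaling concretely, here is a minimal self-contained sketch. It is a sketch only: it assumes the `CartesianGrid(domain, partition)` convenience constructor and measures sizes with `Base.summarysize`; the growth-rate comments restate the argument above rather than measured figures.

```julia
using Gridap.Geometry

# A CartesianGrid stores only a descriptor (origin, spacing, number of
# cells), so its memory footprint is essentially independent of n ...
n = 64
cart = CartesianGrid((0,1,0,1,0,1), (n,n,n))

# ... whereas the conversion performed in CartesianDiscreteModels.jl#L53
# materialises all (n+1)^3 vertex coordinates and the n^3 cell-to-vertex
# connectivity explicitly, i.e. O(n^3) memory: doubling n multiplies the
# footprint by roughly 8.
unstr = UnstructuredGrid(cart)

println(Base.summarysize(cart))   # roughly constant in n
println(Base.summarysize(unstr))  # grows cubically in n
```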

Possible approaches that come to my mind (perhaps there are better solutions):

davelee2804 commented 3 years ago

Hey @amartinhuertas,

Looking to the short-to-medium term (i.e., the current paper, and maybe a thermal shallow water equations paper), the test for this can be run on a mesh with 32x32 elements per panel using RT2 (maybe also 48x48 elements using RT1), with a run time of 20 days, a 30 second time step for the explicit schemes, and a 360 second time step for the implicit scheme.
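(For scale, that run length is 20 x 86,400 = 1,728,000 s, i.e. 57,600 explicit steps at 30 s, or 4,800 implicit steps at 360 s.)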

It is still unclear to me whether this resolution can practically be run using GridapPardiso with multi-threading and serial FEM assembly on the current mesh (maybe we will just get away with it :))

Your option 1 would probably get us over the line, if our current focus is to "do the paper".

However, your option 2 is obviously the best-practice approach. This could potentially form a paper in itself (the NUMA team have already published a paper on this topic, as you know).

My only question is whether the p4est approach is over-engineered for our needs. Our mesh is semi-structured, and this could presumably be exploited in an ad-hoc extension of the Dc2Dp3 model, perhaps more rapidly (I don't know, you are the expert here).

Aside from the mesh generation, I would say the big problem with the current mesh is data locality: since the mesh is generated by traversing the global x, y, z coordinates, there are adjacent DOFs that are very far apart in memory, so the non-zero structure of our matrices is probably a bit funny. I think either approach should account for this.

amartinhuertas commented 3 years ago

> Aside from the mesh generation, I would say the big problem with the current mesh is data locality: since the mesh is generated by traversing the global x, y, z coordinates, there are adjacent DOFs that are very far apart in memory, so the non-zero structure of our matrices is probably a bit funny. I think either approach should account for this.

OK, I think that the cell-numbering issue with the current solution can be easily solved (I will do it). What is a reasonable ordering from a data locality POV? Panel-wise ordering?

davelee2804 commented 3 years ago

I think so, yes. As we know, half the panels have a reverse orientation, so I guess we can stick with this. I would say that for each panel you could keep one coordinate (x, y, or z) fixed and loop over the other two. That should get rid of the massive striding in the 4 panels that wrap around the equator (if you plot the cell indices in ParaView you can see visually how they are strided now).
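For illustration, a numbering along those lines might look like the following (a hypothetical helper, not part of the package):

```julia
# Hypothetical sketch of the panel-wise numbering discussed above: within
# each panel the two in-panel indices are traversed contiguously, so
# neighbouring cells on a panel get nearby global ids and the equatorial
# panels are no longer heavily strided.
function panelwise_cell_ids(n::Int; npanels::Int=6)
  ids = Array{Int,3}(undef, n, n, npanels)
  gid = 0
  for panel in 1:npanels    # one panel at a time (its normal coordinate is fixed)
    for j in 1:n, i in 1:n  # traverse the two in-panel directions
      gid += 1
      ids[i, j, panel] = gid
    end
  end
  ids
end

ids = panelwise_cell_ids(4)  # 6 panels of 4x4 cells, numbered 1:96
```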

santiagobadia commented 3 years ago

I think that option 1 is the way to go. More generally:

1. Define a symbolic mesh TS of dimension DM, given only in terms of connectivities, with no geometrical info (e.g., coordinates).

2. Define a map f: \hat{K} \subset R^{DM} \to R^{DA}, where R^{DA} is the ambient space and \hat{K} is the parametric space of each cell K of the symbolic mesh TS. It provides the geometrical info. The map can be described cell-wise, using panels, FE geometrical maps, etc. Points 1 and 2 are linked (see the sketch after this list).

2'. Instead of the map, we could in the future consider an implicit representation using a cell-wise metric g. But that is another story.
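For concreteness, here is a rough sketch of the split in points 1 and 2, specialised to the cubed sphere (DM=2, DA=3). All names are hypothetical, not Gridap/GridapGeosciences API:

```julia
# Point 1: a symbolic mesh holds connectivity only, no coordinates.
struct SymbolicMesh
  cell_to_vertices::Vector{NTuple{4,Int}}  # e.g. quad cells on the cube surface
end

# Point 2: a cell-wise geometric map. First embed the reference square
# [-1,1]^2 (the parametric space, DM=2) into face `panel` of the cube
# [-1,1]^3 ...
function cube_face(panel::Int, u::Real, v::Real)
  panel == 1 && return (1.0, u, v)
  panel == 2 && return (-1.0, u, v)
  panel == 3 && return (u, 1.0, v)
  panel == 4 && return (u, -1.0, v)
  panel == 5 && return (u, v, 1.0)
  return (u, v, -1.0)
end

# ... and then project radially onto the unit sphere (the ambient space,
# DA=3), which provides the geometrical info for each cell.
function sphere_map(panel::Int, u::Real, v::Real)
  p = cube_face(panel, u, v)
  p ./ sqrt(sum(abs2, p))
end

sphere_map(1, 0.5, -0.5)  # a point on the unit sphere, on panel 1
```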

amartinhuertas commented 3 years ago

Yes, I agree with @santiagobadia ... this is definitely the way to go (also with parallelization in mind). Anyway, I consider this a long-term goal; I have registered it here just so we do not forget.

For the experiments that you mention above, @davelee2804, I am positive that the current solution plus one computational node of a supercomputer will be sufficient.

davelee2804 commented 3 years ago

OK, I trust you :)

amartinhuertas commented 3 years ago

We finally have a p4est-based solution for meshing the sphere. With this tool, this bottleneck is no longer a matter of concern. Closing the issue.