libMesh / libmesh

libMesh github repository
http://libmesh.github.io
GNU Lesser General Public License v2.1

Add parallel xda/xdr mesh support #1037

Open roystgnr opened 8 years ago

roystgnr commented 8 years ago

As discussed in this thread: https://sourceforge.net/p/libmesh/mailman/libmesh-users/thread/alpine.LRH.2.20.1601191555380.19120@spark.ices.utexas.edu/

It's not safe for users to write parallel xdr/xda solution files unless they can be certain that those solution files will be read back on a mesh with the same partitioning. Doing otherwise (which is our current default behavior) leads to bugs, e.g. in our slit mesh test when WRITE_SERIAL_FILES is not enabled, or in https://github.com/grinsfem/grins/pull/427

The proper fix is to let users write parallel xdr/xda mesh files, which would then match the partitioning of the parallel solution files.

friedmud commented 8 years ago

Well... this is basically already implemented. I did it a few years ago (for the same reasons given in that thread) as CheckpointIO: https://github.com/libMesh/libmesh/blob/master/include/mesh/checkpoint_io.h

We've been using it for many years as the sole way of restarting in MOOSE with great success.

As you say, it requires that restarts be run on the same number of processors as the original run... but for anything more than a trivial code, that's almost always the way it has to be anyway (because SO many things live in parallel).
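The constraint discussed in this thread (per-processor files are only valid when read back with the same partitioning) can be illustrated with a toy model. This is a minimal sketch, not libMesh's xda/xdr or CheckpointIO code: a global "solution" vector is split into contiguous blocks, each rank writes only its own block, and each reading rank assumes its file holds exactly the block it owns under the reading partition. The names `Partition`, `Files`, `write_per_rank`, and `read_per_rank` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only (hypothetical, not libMesh code).
// Rank r owns the half-open global index range [part[r], part[r + 1]).
// Assumes part starts at 0 and is nondecreasing.
using Partition = std::vector<std::size_t>;
// files[r] stands in for the file written by rank r.
using Files = std::vector<std::vector<double>>;

// Each rank writes only its local block of the global vector.
Files write_per_rank(const std::vector<double>& global, const Partition& part) {
  assert(part.size() >= 2 && part.front() == 0 && part.back() == global.size());
  Files files(part.size() - 1);
  for (std::size_t r = 0; r + 1 < part.size(); ++r)
    files[r].assign(global.begin() + static_cast<std::ptrdiff_t>(part[r]),
                    global.begin() + static_cast<std::ptrdiff_t>(part[r + 1]));
  return files;
}

// Reading rank r opens file r and assumes it holds exactly the indices rank r
// owns under the *reading* partition. Returns false when that assumption is
// detectably wrong; on success, out holds the reassembled global vector.
bool read_per_rank(const Files& files, const Partition& part,
                   std::vector<double>& out) {
  if (part.size() < 2 || files.size() != part.size() - 1)
    return false;  // different processor count: files missing or left over
  out.assign(part.back(), 0.0);
  for (std::size_t r = 0; r < files.size(); ++r) {
    if (files[r].size() != part[r + 1] - part[r])
      return false;  // same processor count, but a different partitioning
    for (std::size_t i = part[r]; i < part[r + 1]; ++i)
      out[i] = files[r][i - part[r]];
  }
  return true;
}
```

For example, writing an 8-entry vector with the partition `{0, 4, 8}` (two ranks) and reading it back with the same partition reproduces the vector, while reading it with three ranks (`{0, 3, 6, 8}`) or with two ranks split differently (`{0, 5, 8}`) is rejected. The toy only catches mismatches that change block sizes; a real restart whose mismatch preserves sizes could fail silently instead.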