This occurs more rarely than I had thought, so it may not be high priority at the moment.
One of these kinds of failures is a parallel Silo write that leaves debris of this sort (see the sketch after this log):
56/81 Testing: test_create_uniform_mesh_4_proc
56/81 Test: test_create_uniform_mesh_4_proc
Command: "/usr/local/bin/mpirun" "-np" "4" "/Users/travis/build/polymec/polymec-dev/build/Darwin-x86_64-mpi-static-double-mpicc-Release/geometry/tests/test_create_uniform_mesh"
Directory: /Users/travis/build/polymec/polymec-dev/build/Darwin-x86_64-mpi-static-double-mpicc-Release/geometry/tests
"test_create_uniform_mesh_4_proc" start time: Feb 22 03:53 UTC
Output:
[==========] Running 3 test(s).
[ RUN ] test_create_uniform_mesh
[==========] Running 3 test(s).
[ RUN ] test_create_uniform_mesh
[==========] Running 3 test(s).
[ RUN ] test_create_uniform_mesh
[==========] Running 3 test(s).
[ RUN ] test_create_uniform_mesh
[ OK ] test_create_uniform_mesh
[ RUN ] test_plot_uniform_mesh_to_single_file
[ OK ] test_create_uniform_mesh
[ RUN ] test_plot_uniform_mesh_to_single_file
[ OK ] test_create_uniform_mesh
[ RUN ] test_plot_uniform_mesh_to_single_file
[ OK ] test_create_uniform_mesh
[ RUN ] test_plot_uniform_mesh_to_single_file
[ OK ] test_plot_uniform_mesh_to_single_file
[ RUN ] test_plot_uniform_mesh_to_n_files
[ OK ] test_plot_uniform_mesh_to_single_file
[ RUN ] test_plot_uniform_mesh_to_n_files
[ OK ] test_plot_uniform_mesh_to_single_file
[ RUN ] test_plot_uniform_mesh_to_n_files
[ OK ] test_plot_uniform_mesh_to_single_file
[ RUN ] test_plot_uniform_mesh_to_n_files
DBCreate: Low-level function call failed: link group
DBSetDir: File was closed or never opened/created.: link group
DBWrite: File was closed or never opened/created.: link group
DBPutMultimesh: File was closed or never opened/created.: link group
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode -1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
0: Fatal error: Error writing multi-mesh to Silo master file uniform_mesh_10x10x10_4f_4procs/uniform_mesh_10x10x10_4f-0.silo.
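For context, here's a minimal sketch of the usual pattern for writing a Silo multi-mesh "master" file, which is roughly where the failing `DBCreate`/`DBPutMultimesh` calls above sit. This is *not* polymec's `silo_file` code, just an illustration; the file names, `DB_HDF5` driver, and the rank-0-only write are assumptions. If more than one rank races on `DBCreate` for the same master file, errors like the "link group" failures above are the kind of thing you'd expect to see.

```c
/* Illustrative sketch only -- NOT polymec's silo_file code. It shows the
 * common pattern of writing a Silo multi-mesh master file from rank 0 after
 * every rank has written its own per-rank mesh file. Assumes Silo built with
 * HDF5 (DB_HDF5) and an MPI implementation such as Open MPI. */
#include <mpi.h>
#include <silo.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char** argv)
{
  MPI_Init(&argc, &argv);
  int rank, nprocs;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

  /* ... each rank writes its local mesh to its own file here ... */

  /* Make sure every per-rank file exists before the master file refers
   * to it. */
  MPI_Barrier(MPI_COMM_WORLD);

  /* Only rank 0 creates the master file. If several ranks raced on
   * DBCreate for the same path, failures like the ones in the log above
   * could result. */
  if (rank == 0)
  {
    DBfile* master = DBCreate("master.silo", DB_CLOBBER, DB_LOCAL,
                              "master file", DB_HDF5);
    if (master == NULL)
    {
      fprintf(stderr, "DBCreate failed on master file\n");
      MPI_Abort(MPI_COMM_WORLD, -1);
    }

    /* One entry per rank, pointing at the mesh inside each per-rank file. */
    char** mesh_names = malloc(sizeof(char*) * nprocs);
    int* mesh_types = malloc(sizeof(int) * nprocs);
    for (int p = 0; p < nprocs; ++p)
    {
      char name[256];
      snprintf(name, sizeof(name), "mesh_%d.silo:/mesh", p);
      mesh_names[p] = strdup(name);
      mesh_types[p] = DB_UCDMESH;
    }
    DBPutMultimesh(master, "mesh", nprocs,
                   (char const* const*)mesh_names, mesh_types, NULL);
    DBClose(master);

    for (int p = 0; p < nprocs; ++p)
      free(mesh_names[p]);
    free(mesh_names);
    free(mesh_types);
  }

  MPI_Finalize();
  return 0;
}
```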
Here's another failure:
60/82 Testing: test_create_rectilinear_mesh_4_proc
60/82 Test: test_create_rectilinear_mesh_4_proc
Command: "/usr/local/bin/mpirun" "-np" "4" "/Users/travis/build/polymec/polymec-dev/build/Darwin-x86_64-mpi-shared-double-mpicc-Release/geometry/tests/test_create_rectilinear_mesh"
Directory: /Users/travis/build/polymec/polymec-dev/build/Darwin-x86_64-mpi-shared-double-mpicc-Release/geometry/tests
"test_create_rectilinear_mesh_4_proc" start time: Feb 22 03:57 UTC
Output:
[==========] Running 2 test(s).
[ RUN ] test_create_rectilinear_mesh
[==========] Running 2 test(s).
[ RUN ] test_create_rectilinear_mesh
[==========] Running 2 test(s).
[ RUN ] test_create_rectilinear_mesh
[==========] Running 2 test(s).
[ RUN ] test_create_rectilinear_mesh
[ OK ] test_create_rectilinear_mesh
[ RUN ] test_plot_rectilinear_mesh
[ OK ] test_create_rectilinear_mesh
[ RUN ] test_plot_rectilinear_mesh
[ OK ] test_create_rectilinear_mesh
[ RUN ] test_plot_rectilinear_mesh
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode -1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
0: Fatal error: silo_file_write_mesh: Could not write mesh 'mesh'.
DBCreate: File not found or invalid permissions: ./rectilinear_4x4x4-0.silo
DBSetDir: File was closed or never opened/created.: ./rectilinear_4x4x4-0.silo
DBWrite: File was closed or never opened/created.: ./rectilinear_4x4x4-0.silo
DBPutUcdmesh: File was closed or never opened/created.: ./rectilinear_4x4x4-0.silo
[ OK ] test_plot_rectilinear_mesh
[==========] 2 test(s) run.
[ PASSED ] 2 test(s).
[ PASSED ] 2 test(s).
Another:
57/82 Testing: test_create_uniform_mesh_4_proc
57/82 Test: test_create_uniform_mesh_4_proc
Command: "/usr/local/bin/mpirun" "-np" "4" "/Users/travis/build/polymec/polymec-dev/build/Darwin-x86_64-mpi-shared-double-mpicc-Release/geometry/tests/test_create_uniform_mesh"
Directory: /Users/travis/build/polymec/polymec-dev/build/Darwin-x86_64-mpi-shared-double-mpicc-Release/geometry/tests
"test_create_uniform_mesh_4_proc" start time: Feb 22 04:57 UTC
Output:
[==========] Running 3 test(s).
[ RUN ] test_create_uniform_mesh
[==========] Running 3 test(s).
[ RUN ] test_create_uniform_mesh
[==========] Running 3 test(s).
[ RUN ] test_create_uniform_mesh
[==========] Running 3 test(s).
[ RUN ] test_create_uniform_mesh
[ OK ] test_create_uniform_mesh
[ RUN ] test_plot_uniform_mesh_to_single_file
[ OK ] test_create_uniform_mesh
[ RUN ] test_plot_uniform_mesh_to_single_file
[ OK ] test_create_uniform_mesh
[ RUN ] test_plot_uniform_mesh_to_single_file
[ OK ] test_create_uniform_mesh
[ RUN ] test_plot_uniform_mesh_to_single_file
[ OK ] test_plot_uniform_mesh_to_single_file
[ RUN ] test_plot_uniform_mesh_to_n_files
[ OK ] test_plot_uniform_mesh_to_single_file
[ RUN ] test_plot_uniform_mesh_to_n_files
[ OK ] test_plot_uniform_mesh_to_single_file
[ RUN ] test_plot_uniform_mesh_to_n_files
[ OK ] test_plot_uniform_mesh_to_single_file
[ RUN ] test_plot_uniform_mesh_to_n_files
MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD
with errorcode -1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
3: Fatal error: Error writing multi-mesh to Silo master file uniform_mesh_10x10x10_4f_4procs/uniform_mesh_10x10x10_4f-0.silo.
DBCreate: Low-level function call failed: link group
DBSetDir: File was closed or never opened/created.: link group
DBWrite: File was closed or never opened/created.: link group
DBPutMultimesh: File was closed or never opened/created.: link group
The create_*mesh_n_proc (n > 1) unit tests also occasionally hang, which suggests there are deadlock issues in the mesh partitioning process and/or in the writing of mesh files. It doesn't happen often, but it's not the robust behavior we're striving for, and it should be fixed.
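To illustrate the kind of pattern that can produce a hang (this is a generic MPI example, not a known code path in polymec): if one rank bails out of an I/O phase early on a local error while the others proceed into a collective call, the collective never completes. The error names and the `MPI_Allreduce`-based recovery below are assumptions about one possible fix, not the actual failure mode.

```c
/* Generic illustration of a hang-producing pattern, NOT taken from polymec:
 * one rank skips a collective call (here MPI_Barrier) after a local failure,
 * so the remaining ranks wait forever. A safer pattern is to agree on the
 * error collectively (e.g. with MPI_Allreduce) before deciding what to do. */
#include <mpi.h>
#include <stdbool.h>
#include <stdio.h>

int main(int argc, char** argv)
{
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  /* Pretend a local file write failed on rank 1 only. */
  bool local_io_failed = (rank == 1);

#if 0
  /* DEADLOCK-PRONE: rank 1 returns early while everyone else blocks in the
   * barrier that follows the write phase. */
  if (local_io_failed)
    return -1;
  MPI_Barrier(MPI_COMM_WORLD);
#endif

  /* SAFER: reduce the error flag so all ranks make the same decision. */
  int local_err = local_io_failed ? 1 : 0, global_err = 0;
  MPI_Allreduce(&local_err, &global_err, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);
  if (global_err)
    printf("rank %d: a peer failed to write; cleaning up collectively\n", rank);

  MPI_Finalize();
  return 0;
}
```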