skvarjun opened 1 month ago
@cticenhour, do you think this issue might be related to the discussion on lower-dimensional projection in MOOSE? https://github.com/idaholab/moose/discussions/28599
Without taking a closer look, that's a decent guess.
I was trying to run Malamute on an HPC cluster with 24 cores. The program runs smoothly in serial with the command:
malamute-opt -i dcs5_5_mm_constant_properties.i >& log.out
For MPI, the command is:
mpirun -n 24 malamute-opt -i dcs5_5_mm_constant_properties.i >& log.out
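To check whether the failure depends on the number of MPI ranks, the serial and MPI commands above can be looped over a few process counts. This is only a sketch under stated assumptions: the rank list, the per-run log names, and the `APP`, `RANKS`, and `DRY_RUN` variables are hypothetical, and by default it just prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: rerun the same input at several MPI rank counts and report which
# runs hit DIVERGED_FNORM_NAN. The rank list and the DRY_RUN switch are
# illustrative assumptions, not part of MALAMUTE or MOOSE.
set -euo pipefail

INPUT="dcs5_5_mm_constant_properties.i"
APP="${APP:-malamute-opt}"      # executable used in the report
RANKS="${RANKS:-1 2 4 8 24}"    # hypothetical rank counts to try
DRY_RUN="${DRY_RUN:-1}"         # 1 = only print commands; 0 = actually run them

bisect_ranks() {
  local n log
  local -a cmd
  for n in $RANKS; do
    log="log_np${n}.out"
    if [ "$n" -eq 1 ]; then
      cmd=("$APP" -i "$INPUT")                  # serial run, as in the report
    else
      cmd=(mpirun -n "$n" "$APP" -i "$INPUT")   # MPI run, as in the report
    fi
    if [ "$DRY_RUN" = "1" ]; then
      echo "${cmd[*]} > $log 2>&1"
      continue
    fi
    "${cmd[@]}" > "$log" 2>&1 || true           # keep going if a run aborts
    if grep -q "DIVERGED_FNORM_NAN" "$log"; then
      echo "np=$n: NaN residual (see $log)"
    else
      echo "np=$n: no NaN residual in $log"
    fi
  done
}

bisect_ranks
```

Running it with `DRY_RUN=0` executes each run and greps its log for `DIVERGED_FNORM_NAN`; if the NaN showed up only above some rank count, that would hint at something partition-dependent rather than at the input physics.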
The log file shows the following error:
MALAMUTE: MOOSE Application Library for Advanced Manufacturing UTilitiEs
Copyright 2021 - 2024, Battelle Energy Alliance, LLC ALL RIGHTS RESERVED
In UnstructuredMesh::stitch_meshes: This mesh has 63 nodes on boundary 'bottom_ram_spacer_top' (2). Other mesh has 21 nodes on boundary '' (4). Minimum edge length on both surfaces is 0.001.
In UnstructuredMesh::stitch_meshes: Found 21 matching nodes.
In UnstructuredMesh::stitch_meshes: This mesh has 81 nodes on boundary 'bottom_sinter_spacer_top' (14). Other mesh has 55 nodes on boundary '' (16). Minimum edge length on both surfaces is 0.0005.
In UnstructuredMesh::stitch_meshes: Found 55 matching nodes.
In UnstructuredMesh::stitch_meshes: This mesh has 63 nodes on boundary 'top_ram_spacer_bottom' (44). Other mesh has 21 nodes on boundary '' (50). Minimum edge length on both surfaces is 0.001.
In UnstructuredMesh::stitch_meshes: Found 21 matching nodes.
In UnstructuredMesh::stitch_meshes: This mesh has 81 nodes on boundary 'top_sinter_spacer_bottom' (32). Other mesh has 55 nodes on boundary '' (38). Minimum edge length on both surfaces is 0.0005.
In UnstructuredMesh::stitch_meshes: Found 55 matching nodes.

Framework Information:
MOOSE Version:           git commit 4f939da0bd on 2024-09-10
LibMesh Version:
PETSc Version:           3.21.4
SLEPc Version:           3.21.1
Current Time:            Fri Sep 13 22:37:17 2024
Executable Timestamp:    Wed Sep 11 13:34:12 2024
Checkpoint:
  Wall Time Interval:      Every 3600 s
  User Checkpoint:         Disabled
  Checkpoints Kept:        2
  Execute On:              TIMESTEP_END

Parallelism:
  Num Processors:          24
  Num Threads:             1

Mesh:
  Parallel Type:           replicated
  Mesh Dimension:          2
  Spatial Dimension:       2
  Nodes:
    Total:                 21788
    Local:                 949
    Min/Max/Avg:           846/989/907
  Elems:
    Total:                 5744
    Local:                 242
    Min/Max/Avg:           218/266/239
  Num Subdomains:          34
  Num Partitions:          24
  Partitioner:             metis

Nonlinear System:
  Num DOFs:                44636
  Num Local DOFs:          1952
  Variables:               { "temperature" "potential" }
                           { "temperature_bottom_ram_cc_lm" "potential_bottom_ram_cc_lm" }
                           { "temperature_bottom_cc_sinter_lm" "potential_bottom_cc_sinter_lm" }
                           { "temperature_bottom_sinter_punch_lm" "potential_bottom_sinter_punch_lm" }
                           { "temperature_bottom_punch_powder_lm" "potential_bottom_punch_powder_lm" }
                           { "temperature_powder_top_punch_lm" "potential_powder_top_punch_lm" }
                           { "temperature_top_punch_sinter_lm" "potential_top_punch_sinter_lm" }
                           { "temperature_top_sinter_cc_lm" "potential_top_sinter_cc_lm" }
                           { "temperature_top_cc_ram_lm" "potential_top_cc_ram_lm" }
                           { "temperature_inside_low_punch_lm" "potential_inside_low_punch_lm" }
                           { "temperature_inside_powder_lm" "potential_inside_powder_lm" }
                           { "temperature_inside_top_punch_lm" "potential_inside_top_punch_lm" }
                           "temperature_gap_top_sinter_die_lm"
                           "temperature_gap_bottom_sinter_die_lm"
  Finite Element Types:    "LAGRANGE" "LAGRANGE" "LAGRANGE" "LAGRANGE" "LAGRANGE" "LAGRANGE" "LAGRANGE" "LAGRANGE" "LAGRANGE" "LAGRANGE" "LAGRANGE" "LAGRANGE" "LAGRANGE" "LAGRANGE"
  Approximation Orders:    "SECOND" "SECOND" "SECOND" "SECOND" "SECOND" "SECOND" "SECOND" "SECOND" "SECOND" "SECOND" "SECOND" "SECOND" "SECOND" "SECOND"

Auxiliary System:
  Num DOFs:                53289
  Num Local DOFs:          2277
  Variables:               "heat_transfer_radiation"
                           { "electric_field_x" "electric_field_y" }
                           "interface_normal_lm"
  Finite Element Types:    "LAGRANGE" "MONOMIAL" "LAGRANGE"
  Approximation Orders:    "SECOND" "FIRST" "FIRST"

Execution Information:
  Executioner:             Transient
  TimeStepper:             ConstantDT
  TimeIntegrator:          ImplicitEuler
  Solver Mode:             NEWTON
  PETSc Preconditioner:    lu mat_superlu_dist_fact: SamePattern mat_superlu_dist_replacetinypivot: true
  MOOSE Preconditioner:    SMP
LEGACY MODES ENABLED: This application uses the legacy initial residual evaluation behavior. The legacy behavior performs an often times redundant residual evaluation before the solution modifying objects are executed prior to the initial (0th nonlinear iteration) residual evaluation. The new behavior skips that redundant residual evaluation unless the parameter Executioner/use_pre_smo_residual is set to true. To remove this message and enable the new behavior, set the parameter 'use_legacy_initial_residual_evaluation_behavior' to false in *App.C. Some tests that rely on the side effects of the legacy behavior may fail/diff and should be re-golded.
Time Step 0, time = 0
Postprocessor Values:
+----------------+----------------+----------------+
| time           | applied_current| pyrometer_point|
+----------------+----------------+----------------+
|   0.000000e+00 |   0.000000e+00 |   0.000000e+00 |
+----------------+----------------+----------------+
Time Step 1, time = 6, dt = 6
Pre-SMO residual: -nan
Nonlinear solve did not converge due to DIVERGED_FNORM_NAN iterations 0
Solve Did NOT Converge!
Aborting as solve did not converge

Time Step 1, time = 3, dt = 3
Pre-SMO residual: -nan
Nonlinear solve did not converge due to DIVERGED_FNORM_NAN iterations 0
Solve Did NOT Converge!
Aborting as solve did not converge

...

Time Step 1, time = 1.36424e-12, dt = 1.36424e-12
Pre-SMO residual: -nan
Nonlinear solve did not converge due to DIVERGED_FNORM_NAN iterations 0
Solve Did NOT Converge!
Aborting as solve did not converge

Time Step 1, time = 1e-12, dt = 1e-12
Pre-SMO residual: -nan
Nonlinear solve did not converge due to DIVERGED_FNORM_NAN iterations 0
Solve Did NOT Converge!
Aborting as solve did not converge
ERROR The following error occurred in the TimeStepper 'ConstantDT' of type ConstantDT.
Solve failed and timestep already at or below dtmin, cannot continue!
Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
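The failed attempts above are consistent with dt being halved after each failed solve: 6 -> 3 at the start, and 6/2^42 is exactly the 1.36424e-12 shown in the second-to-last attempt before the final one at 1e-12. The awk sketch below replays that arithmetic, assuming a cutback factor of 0.5 and a dtmin of 1e-12 (both inferred from the log, not read from the input file). Since every attempt reports a -nan residual at iteration 0, shrinking dt does not appear to change anything; the NaN seems to be present before the first Newton iteration.

```shell
#!/usr/bin/env bash
# Sketch: replay the dt cutbacks seen in the log. The cutback factor (0.5) is
# inferred from dt = 6 -> 3, and dtmin = 1e-12 from the final attempt; both are
# assumptions about this run, not values read from the input file.
set -euo pipefail

cutback_summary() {
  awk 'BEGIN {
    dt = 6; cut = 0.5; dtmin = 1e-12
    attempts = 0
    while (dt > dtmin) {        # each of these attempts reported a -nan residual
      attempts++
      last = dt
      dt *= cut
    }
    attempts++                  # the final attempt at dt = dtmin also failed
    printf "failed attempts: %d\n", attempts
    printf "smallest halved dt: %g\n", last
    printf "final dt (dtmin): %g\n", dtmin
  }'
}

cutback_summary
```

Under these assumptions the elided part of the log would contain 44 failed attempts in total, all ending with DIVERGED_FNORM_NAN at iteration 0.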
It would be helpful if someone could point out the issue. Thank you!