Closed: mangerij closed this issue 9 years ago.
What is your input file?
This should work on our master branch here: https://bitbucket.org/mesoscience/ferret
The kernels on my working branch are commented out here, but here is the input file:
[Mesh]
  file = sphere_fine_medium_coarse_exodus.e
  #uniform_refine = 1
[]

[Variables]
  [./polar_x]
    order = FIRST
    family = LAGRANGE
    block = '2'
  [../]
  [./polar_y]
    order = FIRST
    family = LAGRANGE
    block = '2'
  [../]
  [./polar_z]
    order = FIRST
    family = LAGRANGE
    block = '2'
  [../]
  [./potential_int]
    order = FIRST
    family = LAGRANGE
  [../]
  [./potential_ext]
    order = FIRST
    family = LAGRANGE
  [../]
[]

[Kernels]
  [./polar_electric_E]
    type = PolarElectricEStrong
    variable = potential_int
    block = '2'
    # permittivity = 8.85e-12
    permittivity = 1
    polar_x = polar_x
    polar_y = polar_y
    polar_z = polar_z
    #implicit = false
  [../]
  [./diffusion_E]
    type = Electrostatics
    # permittivity = 3*8.85e-12
    permittivity = 4
    variable = potential_int
    block = '1 2'
  [../]
  [./diffusion_E_Ext]
    type = Electrostatics
    #type = Diffusion
    # permittivity = 8.85e-12
    permittivity = 1
    variable = potential_ext
    block = '1 2'
  [../]
  [./polar_electric_px]
    type = PolarElectricPStrong
    variable = polar_x
    potential_ext = potential_ext
    potential_int = potential_int
    component = 0
    #implicit = false
  [../]
  [./polar_electric_py]
    type = PolarElectricPStrong
    variable = polar_y
    potential_ext = potential_ext
    potential_int = potential_int
    component = 1
    #implicit = false
  [../]
  [./polar_electric_pz]
    type = PolarElectricPStrong
    variable = polar_z
    potential_ext = potential_ext
    potential_int = potential_int
    component = 2
    #implicit = false
  [../]
  [./polar_x_time]
    type = TimeDerivative
    variable = polar_x
  [../]
  [./polar_y_time]
    type = TimeDerivative
    variable = polar_y
  [../]
  [./polar_z_time]
    type = TimeDerivative
    variable = polar_z
  [../]
[]

[BCs]
  [./potential_ext_1]
    type = NeumannBC
    variable = potential_ext
    boundary = '1'
    value = 1.0
  [../]
  [./potential_ext_2]
    type = NeumannBC
    variable = potential_ext
    boundary = '2'
    value = -1.0
  [../]
  [./potential_ext_3]
    type = NeumannBC
    variable = potential_ext
    boundary = '3'
    value = 0.0
  [../]
  [./potential_ext_4]
    type = NeumannBC
    variable = potential_ext
    boundary = '4'
    value = 0.0
  [../]
  [./potential_ext_5]
    type = NeumannBC
    variable = potential_ext
    boundary = '5'
    value = 0.0
  [../]
  [./potential_ext_6]
    type = NeumannBC
    variable = potential_ext
    boundary = '6'
    value = 0.0
  [../]
  [./potential_int_1]
    type = NeumannBC
    variable = potential_int
    boundary = '1'
    value = 0
  [../]
  [./potential_int_2]
    type = NeumannBC
    variable = potential_int
    boundary = '2'
    value = 0
  [../]
  [./potential_int_3]
    type = NeumannBC
    variable = potential_int
    boundary = '3'
    value = 0
  [../]
  [./potential_int_4]
    type = NeumannBC
    variable = potential_int
    boundary = '4'
    value = 0
  [../]
  [./potential_int_5]
    type = NeumannBC
    variable = potential_int
    boundary = '5'
    value = 0
  [../]
  [./potential_int_6]
    type = NeumannBC
    variable = potential_int
    boundary = '6'
    value = 0
  [../]
[]

[ICs]
  active = 'polar_x_constic polar_y_constic polar_z_constic'
  [./polar_x_constic]
    type = ConstantIC
    variable = polar_x
    block = '2'
    value = 0.3
  [../]
  [./polar_y_constic]
    type = ConstantIC
    variable = polar_y
    block = '2'
    value = 0.3
  [../]
  [./polar_z_constic]
    type = ConstantIC
    variable = polar_z
    block = '2'
    value = 0.3
  [../]
[]

[Preconditioning]
  [./smp]
    type = SMP
    full = true # use every off-diagonal block
    pc_side = left
  [../]
[]

[Executioner]
  #type = Steady
  type = Transient
  solve_type = newton
  scheme = implicit-euler # options: implicit-euler, explicit-euler, crank-nicolson, bdf2, rk-2
  dt = 1e0
  # nl_max_its = 30
  # l_max_its = 10000
  num_steps = 120
  #petsc_options = '-snes_monitor -snes_converged_reason -ksp_monitor -ksp_converged_reason'
  petsc_options = '-ksp_monitor_true_residual -snes_monitor -snes_view -snes_converged_reason -snes_linesearch_monitor -options_left'
  petsc_options_iname = '-gmres_restart -ksp_type -pc_type -snes_linesearch_type -pc_factor_zeropivot'
  petsc_options_value = '1000 gmres jacobi basic 1e-50'
  #petsc_options_iname = '-snes_rtol'
  #petsc_options_value = '1e-16'
[]

[Outputs]
  file_base = outlin_die_sph_strong_implic_dt0_n80_er4_E0-1
  output_initial = true
  print_linear_residuals = true
  print_perf_log = true
  [./out]
    type = Exodus
    elemental_as_nodal = true
    #output_nodal = true
  [../]
[]
Holy Dooley!
We need to narrow this down: is the problem with MOOSE, or with your application? Have you tried running this mesh with just a Diffusion kernel and Dirichlet boundary conditions? That'll tell us if it's a mesh-related problem. Then we can start adding more variables and models to find the problem. Even with a stack trace we may not find the smoking gun.
Cody
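For reference, a stripped-down diffusion-only check along these lines might look like the following. This is only a sketch: the mesh file and sideset IDs are taken from the input above, while the variable name, the Diffusion/DirichletBC setup, and the BC values are assumptions rather than anything stated in this thread.

[Mesh]
  file = sphere_fine_medium_coarse_exodus.e
[]

[Variables]
  [./u]
    order = FIRST
    family = LAGRANGE
  [../]
[]

[Kernels]
  # plain Laplace problem, no Ferret kernels
  [./diff]
    type = Diffusion
    variable = u
  [../]
[]

[BCs]
  # Dirichlet values on two of the sidesets referenced above; the values are arbitrary
  [./fixed_low]
    type = DirichletBC
    variable = u
    boundary = '1'
    value = 0
  [../]
  [./fixed_high]
    type = DirichletBC
    variable = u
    boundary = '2'
    value = 1
  [../]
[]

[Executioner]
  type = Steady
  solve_type = PJFNK
[]

[Outputs]
  exodus = true
[]

If this runs cleanly in serial and in parallel on the same mesh, the mesh itself is probably not the culprit, and the problem more likely lies in the application kernels.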
Good call, Cody.
Seems like it isn't a mesh-related issue. Using this mesh file in the diffusion example worked for 1 and 16 processes.
I'm not sure why I'm not getting a backtrace from gdb or valgrind, though; it just points to this error in MooseArray.h.
MooseArray is aborting. Try adding a breakpoint at MPI_Abort and then running. You should be able to get a stack trace.
Cody
Yeah, I'm not sure what is going on. I tried that and it just says 'No stack.'
Well then you didn't hit the right breakpoint just yet. Try breaking on the error in MooseArray so you can halt the execution before it exits.
I tried:
break Assertion `i < _size' failed Access out of bounds in MooseArray (i: 0 size: 0)
break Access out of bounds in MooseArray (i: 0 size: 0)
break mooseAssert
break /home/john/projects/moose/framework/include/utils/MooseArray.h:289
break /home/john/projects/moose/framework/include/utils/MooseArray.h, line 289
break MPI_Abort
break MPI_abort
break MPI_ABORT
and nothing gave me a trace. I'm sorry, I'm a bit confused.
Let's be explicit here. If you do:
$ gdb --args ./your_app-method -i input_file.i
(gdb) break MPI_Abort
(gdb) run
<observed a crash>
(gdb) bt
you do not get any stack?
Yup. I just checked again just to be safe. No stack.
I forgot to mention: you used METHOD=dbg (i.e. debug mode), right? Not opt...
So, I compiled ferret master, generated the mesh using your script, used your input file, ran the thing with mpiexec -np 16 in both devel and opt, and it just works...
While that may be true, you wouldn't hit the assertion in those modes, so there could still be a problem. By the way, I double-checked and was wrong about the breakpoint. It should be MPI_abort with a lowercase "a". Try that instead.
Cody
@mangerij I can't reproduce this either. I'm guessing here, but perhaps there is a broken build that causes memory corruption and triggers the assert? Sounds convoluted, and it probably is, but we need to figure out a way to reproduce it. Is the error message at the top of this thread complete? Is there any more information on where it fails?
It should be MPI_abort with a lower case "a".
Uppercase A as far as I can tell: http://www.mpich.org/static/docs/latest/www3/MPI_Abort.html
Oops, wrong one
Just verified that dbg mode works on my machine as well... I'd recommend trying a clean build first (as @karpeev suggested).
Yeah, I'll be doing that next week. Fresh install of petsc/libmesh/moose on a new laptop.
I'm a bit confused as to why the mesh in question would run with the diffusion kernel but not with the added kernels in Ferret, even though those kernels work on 16 MPI processes with different meshes on my machine, or why the debugger is not giving a stack on the MPI_abort (I've tried all the suggestions above).
Have you verified that your application is valgrind clean? If you have access to clang you might try the Address Sanitizer.
Can you try a quick:
$ cd <ferret>
$ make cleanall
$ make
and see what you get? (Unless you already tried that.) Possibly also executing the update_and_rebuild_libmesh.sh script.
If you have access to clang you might try the Address Sanitizer.
I have AddressSanitizer active for devel mode and I did not see anything yesterday when I tried it for the first time. I still suspect a broken build...
Another possibility is a broken CUBIT, but I would start by rebuilding everything and making sure Ferret is valgrind-clean.
Reopen if you are able to repeat this error or can submit us a test case. Thanks
Yeah, I plan on it. This is on my to-do list. Thanks :)
Make any multi-block mesh in CUBIT, mesh it with tets, use 'refine volume # depth #' once, save the mesh, and attempt to run this with more than 8 MPI processes, and you will get a seg fault 11 that points to this:
Interestingly enough, this mesh can be computed on with one MPI process. Not entirely sure how to upload the mesh file in question, but I'm essentially using the following simple .jou commands:
Note that removing the line refine volume 2 depth 2 allows this mesh file to be run with 16 MPI processes. heal analyze also shows 100% quality for the mesh. No negative Jacobians, etc.
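If it helps when submitting the test case Cody asked for, a MOOSE test spec wrapping a minimal input could look something like this. It is only a sketch: the spec name, the input file name refined_tet_diffusion.i, and the min_parallel value are assumptions on my part; the thread only says the failure needs more than 8 MPI processes and a mesh refined once in CUBIT.

[Tests]
  [./refined_tet_parallel]
    # hypothetical reproducer input: a diffusion-only problem on the CUBIT-refined tet mesh
    type = RunApp
    input = refined_tet_diffusion.i
    # the reported seg fault shows up only with more than 8 MPI processes
    min_parallel = 9
  [../]
[]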