idaholab / moose

Multiphysics Object Oriented Simulation Environment
https://www.mooseframework.org
GNU Lesser General Public License v2.1

Problems with periodic BC: fails in parallel, works in serial #1197

Closed mooseframework closed 10 years ago

mooseframework commented 10 years ago

gdb --args ../../pronghorn-opt -i periodic.i --n-threads=4
GNU gdb 6.3.50-20050815 (Apple version gdb-1515) (Sat Jan 15 08:33:48 UTC 2011)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin"...Reading symbols for shared libraries ................................ done

(gdb) r
Starting program: /Users/bingaa/projects/trunk/pronghorn/pronghorn-opt -i periodic.i --n-threads=4
Reading symbols for shared libraries .+++++++++++++++++++++++++++++++. done
Framework Information:
SVN Revision: 11636
PETSc Version: 2.3.3
Current Time: Wed Jul 4 12:20:10 2012
Executable Timestamp: Tue Jul 3 16:28:45 2012

Antilocapra Americana


                                 PRONGHORN       

                Multiphysics Nuclear Reactor Analysis Code
                 for High Temperature Gas Cooled Reactors
                         Idaho National Laboratory
                             Idaho Falls, Idaho

Running file: periodic.i

Mesh Information:
mesh_dimension()=3
spatial_dimension()=3
n_nodes()=8960
n_local_nodes()=8960
n_elem()=15147
n_local_elem()=15147
n_active_elem()=15147
n_subdomains()=1274
n_partitions()=1
n_processors()=1
n_threads()=4
processor_id()=0


Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000000000038
Switching to process 48347
0x00000001011c9e9a in libMesh::FEGenericBase::compute_periodic_constraints ()
(gdb) bt

#0 0x00000001011c9e9a in libMesh::FEGenericBase::compute_periodic_constraints ()
#1 0x00000001010bcd44 in tbb::internal::start_for<libMesh::StoredRange<libMesh::MeshBase::const_element_iterator, libMesh::Elem const*>, (anonymous namespace)::ComputeConstraints, tbb::auto_partitioner>::execute ()
#2 0x0000000101a34a37 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all ()
#3 0x0000000101a31e59 in tbb::internal::arena::process ()
#4 0x0000000101a30587 in tbb::internal::market::process ()
#5 0x0000000101a2dbad in tbb::internal::rml::private_worker::thread_routine ()
#6 0x00007fff88340fd6 in _pthread_start ()
#7 0x00007fff88340e89 in thread_start ()

(gdb)

mooseframework commented 10 years ago

I found this while running a simulation in Pronghorn, but I rewrote the problem to use only MOOSE classes and it still fails.

A simple diffusion problem is attached.

andrsd commented 10 years ago

1) Running the -opt version of the code through a debugger is dumb. Not only do some parts of the code get optimized out but, most importantly, you skip a lot of asserts and debug #ifdef sections that would help you diagnose your problem. With the debug version, the error message is:

src/fe/fe_map.C, line 1013, compiled Jul 2 2012 at 08:28:45
WARNING: inverse_map of physical point (x,y,z)=( 2.30533, 1.33098, 0.312705)is not on element.

The output is kind of garbled, since the printing macros do not use a mutex to guard access to the terminal...

2) The problem is replicable - the code runs fine in serial and with MPI, but fails with threading.

3) I know there is a threading issue in libMesh - I do not remember what exactly - could be this one...
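
To illustrate the garbled-output remark in point 1, here is a minimal sketch of guarding writes to a shared stream with a mutex so that messages from different threads do not interleave. This is not libMesh or MOOSE code: `LockedPrinter` and `emit_warnings` are hypothetical names made up for the example, and plain `std::thread` is used here instead of TBB to keep it self-contained. The `assert` is the kind of debug-only check that disappears when `NDEBUG` is defined, as is typical for optimized builds.

```cpp
#include <cassert>
#include <iostream>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper (not a libMesh or MOOSE API): serializes writes to a
// shared stream so that lines from different threads cannot interleave.
class LockedPrinter
{
public:
  explicit LockedPrinter(std::ostream & out) : _out(out) {}

  // Write one complete line while holding the mutex.
  void println(const std::string & msg)
  {
    std::lock_guard<std::mutex> lock(_mutex);
    _out << msg << '\n';
  }

private:
  std::ostream & _out;
  std::mutex _mutex;
};

// Launch n_threads workers that each emit one warning through the printer,
// then wait for all of them to finish.
void emit_warnings(LockedPrinter & printer, unsigned int n_threads)
{
  // Debug-only sanity check: compiled out when NDEBUG is defined.
  assert(n_threads > 0);

  std::vector<std::thread> workers;
  for (unsigned int t = 0; t < n_threads; ++t)
    workers.emplace_back([&printer, t]() {
      // Build the whole message first, then print it under the lock.
      std::ostringstream line;
      line << "WARNING: thread " << t << " reporting";
      printer.println(line.str());
    });

  for (auto & w : workers)
    w.join();
}
```

Constructing `LockedPrinter printer(std::cout);` and calling `emit_warnings(printer, 4);` should print four intact lines. Their order can still vary between runs, since the mutex only prevents interleaving within a line.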

mooseframework commented 10 years ago

Thank you for your professional critique of my mistake of running the debugger on the optimized build. It is duly noted.

friedmud commented 10 years ago

This should be fixed now... increasing priority so we can check it and close the ticket

friedmud commented 10 years ago

This input file doesn't work anymore... but I did try threading with some Periodic BC tests and everything seems to be working fine.

I'm closing this ticket. If you find that it's still a problem, please reopen it.