neperfepx / neper

Polycrystal generation and meshing
http://neper.info
GNU General Public License v3.0

Intermittent Tessellation Module Crashes #155

Open darrencpagan opened 3 years ago

darrencpagan commented 3 years ago

Running the tessellation module with the following command causes a crash. The command was run in a loop over id = 1 to 100, and crashes occurred at id=24 and id=50. Neper version 4.1.3-2.

neper -T -n 1500 -id 24 -reg 1 -rsel 1.25 -mloop 4 \
  -morpho "diameq:1,1-sphericity:lognormal(0.145,0.03)" -morphooptistop val=1e-2 \
  -oricrysym "cubic" \
  -domain "cube(1,1,3)" \
  -format "tess" \
  -o simulation \
  -statcell ncells
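
For anyone trying to reproduce this, the seed loop described above could look roughly like the sketch below. It is an illustration, not the script that was actually used: the `run_seeds` function, the `NEPER_CMD` variable, the per-seed output name `simulation_<id>` and the `neper_T_<id>.log` files are all assumptions added here so that failing ids are easy to collect.

```shell
#!/bin/sh
# Hypothetical helper: run the reported -T command once per seed id and
# list the ids whose run exits with a non-zero status (a crash included).
# NEPER_CMD is an assumed override so the loop can be dry-run with a stub.
NEPER_CMD="${NEPER_CMD:-neper}"

run_seeds() {
    first=$1
    last=$2
    failed=""
    id=$first
    while [ "$id" -le "$last" ]; do
        # Same options as the reported command; only -id and -o vary.
        # Per-seed output and log names are an assumption of this sketch.
        if ! "$NEPER_CMD" -T -n 1500 -id "$id" -reg 1 -rsel 1.25 -mloop 4 \
                -morpho "diameq:1,1-sphericity:lognormal(0.145,0.03)" \
                -morphooptistop val=1e-2 -oricrysym cubic \
                -domain "cube(1,1,3)" -format tess \
                -o "simulation_$id" -statcell ncells \
                > "neper_T_$id.log" 2>&1
        then
            failed="$failed $id"
        fi
        id=$((id + 1))
    done
    echo "failed ids:$failed"
}
```

Calling `run_seeds 1 100` would cover the range mentioned above; each run's terminal output ends up in its own log file, so the log of a failing seed can be attached as is.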

rquey commented 3 years ago

It works here (4.1.3-6):

========================    N   e   p   e   r    =======================
Info   : A software package for polycrystal generation and meshing.
Info   : Version 4.1.3-6
Info   : Built with: gsl|muparser|opengjk|openmp|nlopt
Info   : Running on 8 threads.
Info   : <https://neper.info>
Info   : Copyright (C) 2003-2020, and GNU GPL'd, by Romain Quey.
Info   : Loading initialization file `/home/rquey/.neperrc'...
Info   : ---------------------------------------------------------------
Info   : MODULE  -T loaded with arguments:
Info   : [ini file] -tesrformat ascii
Info   : [com line] -n 1500 -id 24 -reg 1 -rsel 1.25 -mloop 4 -morpho
         diameq:1,1-sphericity:lognormal(0.145,0.03) -morphooptistop
         val=1e-2 -oricrysym cubic -domain cube(1,1,3) -format tess -o
         simulation -statcell ncells
Info   : ---------------------------------------------------------------
Info   : Reading input data...
Info   : Creating domain...
Info   : Creating tessellation...
Info   :   - Setting seeds... 100%
Info   :   - Generating crystal orientations...
Info   :   - Running tessellation...
Info   :     > Initial solution: f   =0.910306615
Info   :     > Iteration 149558: fmin=0.009998316 f=0.009998316  
Info   :     > Reached `val' criterion.
Info   : Regularizing tessellation...
Info   :   - loop 4/4: 100% del=3265
Info   : Writing results...
Info   :     [o] Writing file `simulation.tess'...
Info   :     [o] Wrote file `simulation.tess'.
Info   : Writing statistics...
Info   :     [o] Writing file `simulation.stcell'...
Info   :     [o] Wrote file `simulation.stcell'.
Info   : Elapsed time: 277.503 secs.
========================================================================

What kind of error did you get?

darrencpagan commented 3 years ago

I made a mistake in the failure report: the error actually occurs at the meshing step. Again, the failure only happens with some seeds.

Meshing command, run on the output of the tessellation command above: neper -M simulation.tess -order 2 -rcl 1.0 -part 32 -format "msh,vtk"

========================    N   e   p   e   r    =======================
Info   : A software package for polycrystal generation and meshing.
Info   : Version 4.1.3-2
Info   : Built with: gsl|muparser|opengjk|openmp|nlopt|libscotch (full)
Info   : Running on 80 threads.
Info   : <https://neper.info>
Info   : Copyright (C) 2003-2020, and GNU GPL'd, by Romain Quey.
Info   : No initialization file found (`/home/faculty/dcp5303/.neperrc').
Info   : ---------------------------------------------------------------
Info   : MODULE  -T loaded with arguments:
Info   : [ini file] (none)
Info   : [com line] -n 1500 -id 24 -reg 1 -rsel 1.25 -mloop 4 -morpho
         diameq:1,1-sphericity:lognormal(0.145,0.03) -morphooptistop
         val=1e-2 -oricrysym cubic -domain cube(1,1,3) -format tess -o
         simulation -statcell ncells
Info   : ---------------------------------------------------------------
Info   : Reading input data...
Info   : Creating domain...
Info   : Creating tessellation...
Info   :   - Setting seeds... 100%
Info   :   - Generating crystal orientations...
Info   :   - Running tessellation...
Info   :     > Initial solution: f   =0.910306615
Info   :     > Iteration 149558: fmin=0.009998316 f=0.009998316
Info   :     > Reached `val' criterion.
Info   : Regularizing tessellation...
Info   :   - loop 4/4: 100% del=3265
Info   : Writing results...
Info   :     [o] Writing file `simulation.tess'...
Info   :     [o] Wrote file `simulation.tess'.
Info   : Writing statistics...
Info   :     [o] Writing file `simulation.stcell'...
Info   :     [o] Wrote file `simulation.stcell'.
Info   : Elapsed time: 434.115 secs.
========================================================================

========================    N   e   p   e   r    =======================
Info   : A software package for polycrystal generation and meshing.
Info   : Version 4.1.3-2
Info   : Built with: gsl|muparser|opengjk|openmp|nlopt|libscotch (full)
Info   : Running on 80 threads.
Info   : <https://neper.info>
Info   : Copyright (C) 2003-2020, and GNU GPL'd, by Romain Quey.
Info   : No initialization file found (`/home/faculty/dcp5303/.neperrc').
Info   : ---------------------------------------------------------------
Info   : MODULE  -M loaded with arguments:
Info   : [ini file] (none)
Info   : [com line] simulation.tess -order 2 -rcl 1.0 -part 32 -format
         msh,vtk
Info   : ---------------------------------------------------------------
Info   : Reading input data...
Info   :   - Reading arguments...
Info   : Loading input data...
Info   :   - Loading tessellation...
Info   :     [i] Parsing file `simulation.tess'...
Info   :     [i] Parsed file `simulation.tess'.
Info   : Meshing...
Info   :   - Preparing... (cl = 0.063) 100%
Info   :   - 0D meshing... 100%
Info   :   - 1D meshing... 100%
Info   :   - 2D meshing...  11% (0.5|0.85/89%| 9%| 2%)Segmentation fault (core dumped)

rquey commented 3 years ago

> Info   : Running on 80 threads.

Lucky man :)

I can't reproduce this bug either:

$ neper -M simulation.tess -order 2 -rcl 1.0

========================    N   e   p   e   r    =======================
Info   : A software package for polycrystal generation and meshing.
Info   : Version 4.1.3-6
Info   : Built with: gsl|muparser|opengjk|openmp|nlopt
Info   : Running on 8 threads.
Info   : <https://neper.info>
Info   : Copyright (C) 2003-2020, and GNU GPL'd, by Romain Quey.
Info   : Loading initialization file `/home/rquey/.neperrc'...
Info   : ---------------------------------------------------------------
Info   : MODULE  -M loaded with arguments:
Info   : [ini file] -gmsh /home/rquey/bin/gmsh -tmp /home2/tmp
Info   : [com line] simulation.tess -order 2 -rcl 1.0
Info   : ---------------------------------------------------------------
Info   : Reading input data...
Info   :   - Reading arguments...
Info   : Loading input data...
Info   :   - Loading tessellation...
Info   :     [i] Parsing file `simulation.tess'...
Info   :     [i] Parsed file `simulation.tess'.
Info   : Meshing...
Info   :   - Preparing... (cl = 0.063) 100%
Info   :   - 0D meshing... 100%
Info   :   - 1D meshing... 100%
Info   :   - 2D meshing... 100% (0.049|0.83/84%|12%| 4%)
Info   :   - Fixing 2D-mesh pinches...
Info   :   - 3D meshing... 100% (0.65|0.89/100%| 0%| 0%)
Info   :   - Switching mesh to order 2...
Info   : Searching nsets and fasets...
Info   : Writing mesh results...
Info   :   - Preparing mesh...
Info   :   - Mesh properties:
Info   :     > Node number:   242896
Info   :     > Elt  number:   173197
Info   :     > Mesh volume:    3.000
Info   :   - Writing mesh...
Info   :     [o] Writing file `simulation.msh'...
Info   :     [o] Wrote file `simulation.msh'.
Info   : Elapsed time: 329.015 secs.
========================================================================

Can you try with the latest version, 4.1.3-6? I don't think that it should make a difference, but if it still fails, can you try to reduce the number of threads (export OMP_NUM_THREADS=8)?
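
A minimal sketch of the thread-count check suggested here, assuming a POSIX shell. `OMP_NUM_THREADS` is the standard OpenMP environment variable; the `mesh_with_threads` function and the `NEPER_CMD` variable are hypothetical names introduced only for this example.

```shell
#!/bin/sh
# Hypothetical helper: run "neper -M" with an explicit OpenMP thread count
# and report the exit status, so runs at different counts can be compared.
# NEPER_CMD is an assumed override so the helper can be tried with a stub.
NEPER_CMD="${NEPER_CMD:-neper}"

mesh_with_threads() {
    nthreads=$1
    shift
    # The subshell keeps OMP_NUM_THREADS from leaking into the caller.
    ( OMP_NUM_THREADS="$nthreads"; export OMP_NUM_THREADS
      "$NEPER_CMD" -M "$@" )
    status=$?
    echo "threads=$nthreads exit=$status"
    return "$status"
}
```

For example, `mesh_with_threads 8 simulation.tess -order 2 -rcl 1.0 -part 32 -format msh,vtk` reruns the failing command on 8 threads. Repeating it with 1 thread would show whether the crash depends on the thread count at all.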

rquey commented 3 years ago

@darrencpagan Do you have new inputs for me?

darrencpagan commented 3 years ago

Same error with 4.1.3-10 and 8 threads:

========================    N   e   p   e   r    =======================
Info   : A software package for polycrystal generation and meshing.
Info   : Version 4.1.3-10
Info   : Built with: gsl|muparser|opengjk|openmp|nlopt|libscotch (full)
Info   : Running on 8 threads.
Info   : <https://neper.info>
Info   : Copyright (C) 2003-2020, and GNU GPL'd, by Romain Quey.
Info   : No initialization file found (`/home/faculty/dcp5303/.neperrc').
Info   : ---------------------------------------------------------------
Info   : MODULE  -T loaded with arguments:
Info   : [ini file] (none)
Info   : [com line] -n 1500 -id 24 -reg 1 -rsel 1.25 -mloop 4 -morpho
         diameq:1,1-sphericity:lognormal(0.145,0.03) -morphooptistop
         val=1e-2 -oricrysym cubic -domain cube(1,1,3) -format tess -o
         simulation -statcell ncells
Info   : ---------------------------------------------------------------
Info   : Reading input data...
Info   : Creating domain...
Info   : Creating tessellation...
Info   :   - Setting seeds... 100%
Info   :   - Generating crystal orientations...
Info   :   - Running tessellation...
Info   :     > Initial solution: f   =0.910306615
Info   :     > Iteration 149558: fmin=0.009998316 f=0.009998316
Info   :     > Reached `val' criterion.
Info   : Regularizing tessellation...
Info   :   - loop 4/4: 100% del=3265
Info   : Writing results...
Info   :     [o] Writing file `simulation.tess'...
Info   :     [o] Wrote file `simulation.tess'.
Info   : Writing statistics...
Info   :     [o] Writing file `simulation.stcell'...
Info   :     [o] Wrote file `simulation.stcell'.
Info   : Elapsed time: 251.183 secs.
========================================================================

========================    N   e   p   e   r    =======================
Info   : A software package for polycrystal generation and meshing.
Info   : Version 4.1.3-10
Info   : Built with: gsl|muparser|opengjk|openmp|nlopt|libscotch (full)
Info   : Running on 8 threads.
Info   : <https://neper.info>
Info   : Copyright (C) 2003-2020, and GNU GPL'd, by Romain Quey.
Info   : No initialization file found (`/home/faculty/dcp5303/.neperrc').
Info   : ---------------------------------------------------------------
Info   : MODULE  -M loaded with arguments:
Info   : [ini file] (none)
Info   : [com line] simulation.tess -order 2 -rcl 1.0 -part 32 -format
         msh,vtk
Info   : ---------------------------------------------------------------
Info   : Reading input data...
Info   :   - Reading arguments...
Info   : Loading input data...
Info   :   - Loading tessellation...
Info   :     [i] Parsing file `simulation.tess'...
Info   :     [i] Parsed file `simulation.tess'.
Info   : Meshing...
Info   :   - Preparing... (cl = 0.063) 100%
Info   :   - 0D meshing... 100%
Info   :   - 1D meshing... 100%
Info   :   - 2D meshing...  12% (0.35|0.85/90%| 8%| 2%)Segmentation fault (core dumped)

rquey commented 3 years ago

I still can't reproduce this bug... Please try to run Neper through valgrind:

$ cmake .. -DDEVEL_DEBUGGING_FLAG=ON -DDEVEL_OPTIMIZATION=OFF
$ make

Then (using the newly built neper, of course):

$ valgrind neper -M simulation.tess -order 2 -rcl 1.0 -part 32 -format msh,vtk

This does not report any error for me, but it should for you, since you are getting a seg fault...

darrencpagan commented 3 years ago

Info   : ---------------------------------------------------------------
Info   : Reading input data...
Info   :   - Reading arguments...
Info   : Loading input data...
Info   :   - Loading tessellation...
Info   :     [i] Parsing file `simulation.tess'...
Info   :     [i] Parsed file `simulation.tess'.
Info   : Meshing...
Info   :   - Preparing... (cl = 0.063) 100%
Info   :   - 0D meshing... 100%
Info   :   - 1D meshing... 100%
Info   :   - 2D meshing...  11% (0.5|0.85/89%| 9%| 2%)==26686== Thread 40:
==26686== Invalid read of size 4
==26686==    at 0x212E9C: nem_meshing_2D_face_mesh_gmsh_backproj_fixboundary (nem_meshing_2D_face_mesh_gmsh3.c:24)
==26686==    by 0x2123DF: nem_meshing_2D_face_mesh_gmsh_backproj (nem_meshing_2D_face_mesh_gmsh2.c:160)
==26686==    by 0x2114CB: nem_meshing_2D_face_mesh_gmsh (nem_meshing_2D_face_mesh_gmsh1.c:167)
==26686==    by 0x20FA92: nem_meshing_2D_face_mesh (nem_meshing_2D3.c:36)
==26686==    by 0x20EC84: nem_meshing_2D_face (nem_meshing_2D2.c:67)
==26686==    by 0x20E1B8: nem_meshing_2D._omp_fn.0 (nem_meshing_2D1.c:54)
==26686==    by 0x656E96D: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==26686==    by 0x59786DA: start_thread (pthread_create.c:463)
==26686==    by 0x6AC071E: clone (clone.S:95)
==26686==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==26686==
==26686==
==26686== Process terminating with default action of signal 11 (SIGSEGV)
==26686==  Access not within mapped region at address 0x0
==26686==    at 0x212E9C: nem_meshing_2D_face_mesh_gmsh_backproj_fixboundary (nem_meshing_2D_face_mesh_gmsh3.c:24)
==26686==    by 0x2123DF: nem_meshing_2D_face_mesh_gmsh_backproj (nem_meshing_2D_face_mesh_gmsh2.c:160)
==26686==    by 0x2114CB: nem_meshing_2D_face_mesh_gmsh (nem_meshing_2D_face_mesh_gmsh1.c:167)
==26686==    by 0x20FA92: nem_meshing_2D_face_mesh (nem_meshing_2D3.c:36)
==26686==    by 0x20EC84: nem_meshing_2D_face (nem_meshing_2D2.c:67)
==26686==    by 0x20E1B8: nem_meshing_2D._omp_fn.0 (nem_meshing_2D1.c:54)
==26686==    by 0x656E96D: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==26686==    by 0x59786DA: start_thread (pthread_create.c:463)
==26686==    by 0x6AC071E: clone (clone.S:95)
==26686==  If you believe this happened as a result of a stack
==26686==  overflow in your program's main thread (unlikely but
==26686==  possible), you can try to increase the size of the
==26686==  main thread stack using the --main-stacksize= flag.
==26686==  The main thread stack size used in this run was 8388608.
==26686==
==26686== HEAP SUMMARY:
==26686==     in use at exit: 12,061,341 bytes in 319,057 blocks
==26686==   total heap usage: 17,914,578 allocs, 17,595,521 frees, 5,512,750,578 bytes allocated
==26686==
==26686== LEAK SUMMARY:
==26686==    definitely lost: 240 bytes in 5 blocks
==26686==    indirectly lost: 0 bytes in 0 blocks
==26686==      possibly lost: 25,280 bytes in 79 blocks
==26686==    still reachable: 12,035,821 bytes in 318,973 blocks
==26686==         suppressed: 0 bytes in 0 blocks
==26686== Rerun with --leak-check=full to see details of leaked memory
==26686==
==26686== For counts of detected and suppressed errors, rerun with: -v
==26686== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)
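
The trace above reports an invalid read at address 0x0 in nem_meshing_2D_face_mesh_gmsh_backproj_fixboundary (nem_meshing_2D_face_mesh_gmsh3.c:24). When the valgrind output is saved to a file, the innermost frame of the first reported error can be pulled out with a small filter like the sketch below. The `first_fault_frame` function name is hypothetical, and the pattern assumes the `==PID==    at 0x...:` layout shown in this log.

```shell
#!/bin/sh
# Hypothetical helper: print "function (file:line)" for the first
# "at 0x...:" frame found in a saved valgrind log, i.e. the innermost
# frame of the first reported error.
first_fault_frame() {
    # Strip the "==PID==    at 0xADDR: " prefix and print matching lines only.
    sed -n 's/^==[0-9]*== *at 0x[0-9A-Fa-f]*: //p' "$1" | head -n 1
}
```

Usage would be `first_fault_frame valgrind.log`, where `valgrind.log` is whatever file the output was redirected to.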