neperfepx / neper

Polycrystal generation and meshing
http://neper.info
GNU General Public License v3.0

Memory error in ut_sys_runwtime #19

Closed: rquey closed this issue 4 years ago

rquey commented 5 years ago

A memory error is reported on every run in ut_sys_runwtime, although it never causes Neper to fail. It can be observed with a memory checker such as valgrind.

First, Neper must be compiled in debugging mode:

$ cmake -DDEVEL_DEBUGGING_FLAG=ON -DDEVEL_OPTIMIZATION=OFF ..
$ make

Then, the error can be reproduced with

$ neper -T -n 1
$ valgrind neper -M n1-id1.tess

which yields

========================    N   e   p   e   r    =======================
Info   : A software package for polycrystal generation and meshing.
Info   : Version 3.4.1-48                                 
Info   : Built with: gsl nlopt libscotch openmp      
Info   : Running on 8 threads.                       
Warning: Built with: no optimization.            NOT FOR PRODUCTION USE.
Warning: Built with: debugging compilation flag. NOT FOR PRODUCTION USE.
Info   : <http://neper.sourceforge.net>
Info   : Copyright (C) 2003-2019, and GNU GPL'd, by Romain Quey.      
Info   : Comments and bug reports: <neper-users@lists.sourceforge.net>.      
Info   : Loading initialization file `/home/rquey/.neperrc'...           
Info   : ---------------------------------------------------------------
Info   : MODULE  -M loaded with arguments:
Info   : [ini file] -order 1 -gmsh /home/rquey/bin/gmsh -tmp /home2/tmp
Info   : [com line] n1-id1.tess
Info   : ---------------------------------------------------------------
Info   : Reading input data...
Info   :   - Reading arguments...
Info   : Loading input data...
Info   :   - Loading tessellation...
Info   :     [i] Parsing file `n1-id1.tess'...
Info   :     [i] Parsed file `n1-id1.tess'.
Info   : Meshing...
Info   :   - Preparing... (cl = 0.5) 100%
Info   :   - 0D meshing... 100%
Info   :   - 1D meshing... 100%
Info   :   - 2D meshing... 100% (0.89|0.89/ 0%| 0%|100%)
Info   :   - Fixing 2D-mesh pinches...
Info   :   - 3D meshing... ==27456== Thread 6:
==27456== Syscall param setitimer(&value->it_value) points to uninitialised byte(s)
==27456==    at 0x65E0E57: setitimer (syscall-template.S:78)
==27456==    by 0x349E8C: ut_sys_runwtime (ut_sys.c:76)
==27456==    by 0x227125: nem_mesh_3d_gmsh (nem_mesh_gmsh1.c:105)
==27456==    by 0x217D6F: nem_meshing_3D_poly_algo (nem_meshing_3D3.c:30)
==27456==    by 0x21765E: nem_meshing_3D_poly (nem_meshing_3D2.c:29)
==27456==    by 0x21726A: nem_meshing_3D._omp_fn.0 (nem_meshing_3D1.c:46)
==27456==    by 0x5EBB95D: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==27456==    by 0x62F36DA: start_thread (pthread_create.c:463)
==27456==    by 0x662C88E: clone (clone.S:95)
==27456==  Address 0x9701870 is on thread 6's stack
==27456==  in frame #1, created by ut_sys_runwtime (ut_sys.c:58)
==27456==
100% (0.89|0.89/100%| 0%| 0%)
Info   : Searching nsets...
Info   : Writing mesh results...
Info   :   - Preparing mesh...
Info   :   - Mesh properties:
Info   :     > Node number:       52
Info   :     > Elt  number:      148
Info   :     > Mesh volume:    1.000
Info   :   - Writing mesh...
Info   :     [o] Writing file `n1-id1.msh'...
Info   :     [o] Wrote file `n1-id1.msh'.
Info   : Elapsed time: 2.982 secs.
========================================================================
jcappola commented 4 years ago

@rquey: Here is a gdb backtrace of the error:

#0  0x0000000006a71e57 in setitimer () at ../sysdeps/unix/syscall-template.S:78
#1  0x00000000003050ba in ut_sys_runwtime (exec=0x75e1130 "gmsh", command=0xd621270 
"gmsh -3 -v 0 -order 1 ./tmp29440-8.geo -o ./tmp29440-8.msh > /dev/null 2> /dev/null", 
t=1.1075895862662456e-315, 
pctrlc_t=0x1ffeffe7f0) at /home/nerc10083/code/neper-3.4.0/src/contrib/ut/ut_sys/ut_sys.c:77
#2  0x000000000021e208 in nem_mesh_3d_gmsh (Tess=..., poly=1, Nodes=..., Mesh=0x75dfcb0, cl=0.5, clreps=0.02, gmsh=0x75e1130 "gmsh", tmp=0x75e1180 ".", algo=0xd5f3ed0 "netg", opti=0xd5f3f20 "gmsh", rnd=0, 
allowed_t=1.1075895862662456e-315, pN=0xb9bd760, pM=0xb9bd700, pacl=0xb9bd6f8, pctrlc_t=0x1ffeffe7f0, pelapsed_t=0xb9bd6f0)
at /home/nerc10083/code/neper-3.4.0/src/neper_m/nem/nem_mesh_gmsh/nem_mesh_gmsh1.c:105
#3  0x000000000020ee16 in nem_meshing_3D_poly_algo (In=..., cl=0.5, mesh3dclreps=0.02, pMultim=0x1ffeffe800, algo=0, pctrlc_t=0x1ffeffe7f0, pallowed_t=0x1ffeffe7c8, pmax_elapsed_t=0x1ffeffe7d0, Tess=..., 
Nodes=..., Mesh=0x75dfcb0, poly=1, pN=0xb9bd760, pM=0xb9bd700, pmOsize=0xb9bd6f8, pelapsed_t=0xb9bd6f0) at /home/nerc10083/code/neper-3.4.0/src/neper_m/nem_meshing/nem_meshing_3D/nem_meshing_3D3.c:30
#4  0x000000000020e705 in nem_meshing_3D_poly (In=..., cl=0.5, mesh3dclreps=0.02, pMultim=0x1ffeffe800, pctrlc_t=0x1ffeffe7f0, pallowed_t=0x1ffeffe7c8, pmax_elapsed_t=0x1ffeffe7d0, Tess=..., 
pNodes=0x1ffefff730, Mesh=0x75dfcb0, pN=0xd5f1c38, pM=0xd5f1d58, poly=1) at /home/nerc10083/code/neper-3.4.0/src/neper_m/nem_meshing/nem_meshing_3D/nem_meshing_3D2.c:29
#5  0x000000000020e311 in nem_meshing_3D._omp_fn.0 () at /home/nerc10083/code/neper-3.4.0/src/neper_m/nem_meshing/nem_meshing_3D/nem_meshing_3D1.c:46
#6  0x000000000634c96e in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#7  0x00000000067846db in start_thread (arg=0xb9bf700) at pthread_create.c:463
#8  0x0000000006abd88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Interestingly, allowed_t=1.1075895862662456e-315 is being passed down the call chain here. That is a denormal value that looks like uninitialized memory, so the value behind the pointer pallowed_t seems to be uninitialized for 3D meshing. In the 2D meshing routine (nem_meshing_2D1.c), the double allowed_t is also declared without a value, but it is then assigned by allowed_t = In.mesh2dmaxtime;, which gives it the value 10000. I tried giving allowed_t a value (preferably In.mesh3dmaxtime) in nem_meshing_3D1.c, but the assignment did not seem to take effect. I'm unsure why allowed_t refuses to accept a value in the 3D meshing code, but perhaps you can get it to work.

No guarantees that this is the cause, but it's the only thing that stands out to me in the valgrind output.
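
To make the hypothesis above concrete, here is a minimal sketch of the defensive pattern, not Neper's actual code. It assumes the time limit reaches setitimer() as a double number of seconds. The struct mesh_input and the functions make_oneshot_timer and arm_3d_time_limit are hypothetical names introduced for illustration; only the field name mesh3dmaxtime is taken from the comment above. The idea is to assign allowed_t before any use and to zero the whole struct itimerval before filling it in, so that no uninitialised byte is handed to the syscall.

```c
#include <stddef.h>
#include <string.h>
#include <sys/time.h>

/* Hypothetical stand-in for the relevant part of Neper's input structure;
   only the field name mesh3dmaxtime comes from the discussion above. */
struct mesh_input
{
  double mesh3dmaxtime;
};

/* Upper bound that keeps the double-to-time_t cast in range (assumption). */
#define MAX_TIME_LIMIT 1e9

/* Hypothetical helper: build a fully initialised one-shot timer value
   from a time limit in seconds.  Non-positive or NaN limits yield an
   all-zero value, which disarms the timer when passed to setitimer(). */
static struct itimerval
make_oneshot_timer (double seconds)
{
  struct itimerval timer;
  memset (&timer, 0, sizeof timer);     /* it_interval = 0: fire once */

  if (!(seconds > 0.0))                 /* also rejects NaN */
    return timer;
  if (seconds > MAX_TIME_LIMIT)
    seconds = MAX_TIME_LIMIT;

  time_t sec = (time_t) seconds;        /* truncates toward zero */
  long usec = (long) ((seconds - (double) sec) * 1e6);
  if (usec > 999999)                    /* keep tv_usec in its valid range */
    usec = 999999;

  timer.it_value.tv_sec = sec;
  timer.it_value.tv_usec = usec;
  return timer;
}

/* Hypothetical caller: give allowed_t a defined value before it is used,
   then arm the real-time timer.  Returns setitimer()'s result. */
static int
arm_3d_time_limit (struct mesh_input In)
{
  double allowed_t = In.mesh3dmaxtime;  /* assigned before any use */
  struct itimerval timer = make_oneshot_timer (allowed_t);
  return setitimer (ITIMER_REAL, &timer, NULL);
}
```

With this helper, a value such as the 1.1075895862662456e-315 seen in the backtrace would truncate to zero seconds and zero microseconds, so the timer would simply be disarmed. That hides the symptom at the syscall but does not replace assigning allowed_t correctly in the 3D meshing code.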