lesgourg / class_public

Public repository of the Cosmic Linear Anisotropy Solving System (master for the most recent version of the standard code; GW_CLASS to include Cosmic Gravitational Wave Background anisotropies; classnet branch for acceleration with neutral networks; ExoCLASS branch for exotic energy injection; class_matter branch for FFTlog)
225 stars 293 forks source link

Classy: Double free causing segmentation fault #128

Open carlosggarcia opened 7 years ago

carlosggarcia commented 7 years ago

Hi,

Working in hi_class with models where shooting fails quite often I encountered segmentation fault. I reproduce below the relevant part of the output error. This error was raised only (i don't really understand why) when, after a successful computation and posterior struct_cleanup, I computed with a different set of parameters for which shooting fails .

Error in `/usr/bin/python2.7': double free or corruption (out): 0x00000000013d35c0 ======= Backtrace: ========= /lib64/libc.so.6(+0x721d3)[0x7fda259c31d3] /lib64/libc.so.6(+0x77ac6)[0x7fda259c8ac6] /lib64/libc.so.6(+0x782ce)[0x7fda259c92ce] /home/ardok/.local/lib64/python2.7/site-packages/classy.so(background_free+0x10)[0x7fda24bd94f0]

In some of my trials and errors I also experienced these other two segmentation faults (they were far less common, though):

Error in `/usr/bin/python2.7': munmap_chunk(): invalid pointer: 0x0000000003215bc0 ======= Backtrace: ========= /lib64/libc.so.6(+0x721d3)[0x7f46d3f6b1d3] /lib64/libc.so.6(+0x77ac6)[0x7f46d3f70ac6] /home/ardok/.local/lib64/python2.7/site-packages/classy.so(background_free+0x10)[0x7f46c7d201d0]

Error in `/usr/bin/python2.7': free(): invalid size: 0x000000000210f8d0 ======= Backtrace: ========= /lib64/libc.so.6(+0x721d3)[0x7efcaa9d21d3] /lib64/libc.so.6(+0x77ac6)[0x7efcaa9d7ac6] /lib64/libc.so.6(+0x782ce)[0x7efcaa9d82ce] /home/ardok/.local/lib64/python2.7/site-packages/classy.so(background_free+0x10)[0x7efca28511d0]

I traced back the origin of the error and saw that the segmentation fault was caused in background_free() when calling:

 free(pba->tau_table);
 free(pba->z_table);
 free(pba->background_table);
 free(pba->d2background_dtau2_table);

I don't understand why free(pba->d2tau_dz2_table) caused no problem.

The point is that after a failed shooting, background_init() exits without initializing them. In fact, if in main/class.c we do:

 if (background_init(&pr,&ba) == _FAILURE_) {
      background_free(&ba);                                
    printf("\n\nError running background_init \n=>%s\n",ba.error_message);
    return _FAILURE_;
  }

valgrind reports

==22743== Conditional jump or move depends on uninitialised value(s) ==22743== at 0x4C2B186: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==22743== by 0x46D508: background_free (background.c:636) ==22743== by 0x540473: main (class.c:27) ==22743== Uninitialised value was created by a stack allocation ==22743== at 0x54038A: main (class.c:7) ==22743== ==22743== Conditional jump or move depends on uninitialised value(s) ==22743== at 0x4C2B186: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==22743== by 0x46D52E: background_free (background.c:638) ==22743== by 0x540473: main (class.c:27) ==22743== Uninitialised value was created by a stack allocation ==22743== at 0x54038A: main (class.c:7)
==22743== ==22743== Conditional jump or move depends on uninitialised value(s) ==22743== at 0x4C2B186: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==22743== by 0x46D541: background_free (background.c:639) ==22743== by 0x540473: main (class.c:27)
==22743== Uninitialised value was created by a stack allocation ==22743== at 0x54038A: main (class.c:7)
==22743== ==22743== Conditional jump or move depends on uninitialised value(s) ==22743== at 0x4C2B186: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==22743== by 0x46D554: background_free (background.c:640) ==22743== by 0x540473: main (class.c:27)
==22743== Uninitialised value was created by a stack allocation ==22743== at 0x54038A: main (class.c:7) ==22743==

These errors confirmed my suspicions. In the present state of classy.pyx we have:

if "background" in level:
    if background_init(&(self.pr), &(self.ba)) == _FAILURE_:
        self.ncp.add("background")                          
        self.ready = True
        self.struct_cleanup()
        raise CosmoComputationError(self.ba.error_message)

so that every time background fails, no matter why, background structures are freed. In my case, since background structure was uninitialized due to a failed shooting, it caused segmentation fault. The solution I implemented and seems to work is checking whether the failure is caused by a failed shooting or not :

 python/cclassy.pxd
@@ -57,6 +57,7 @@ cdef extern from "class.h":
         double w0_fld
         double wa_fld
         double cs2_fld
+        short shooting_failed

         int bt_size
python/classy.pyx
@@ -323,6 +323,8 @@ cdef class Class:
         #NOTE: Added ready=true and add(perturb) at the steps to avoid memory leaks if code fails (MZ)
         if "background" in level:
             if background_init(&(self.pr), &(self.ba)) == _FAILURE_:
+                if self.ba.shooting_failed:
+                    raise CosmoComputationError(self.ba.error_message)
                 self.ncp.add("background")
                 self.ready = True
                 self.struct_cleanup()

I don't know if I should have made a pull request instead of issuing this. In case you think so, I will as soon as possible.

Thanks.

carlosggarcia commented 7 years ago

The above solution is not completely satisfactory since it leads to memory leakages. In particular, pba->parameters_smg (see https://github.com/miguelzuma/hi_class_public/blob/hi_class/source/background.c#L675) is still allocated. I suppose it is the same for pba->scf_parameters and the others.

The following may be correct but I haven't tested it with ncdm species nor scf.

python/cclassy.pxd
@@ -57,6 +57,7 @@ cdef extern from "class.h":
         double w0_fld
         double wa_fld
         double cs2_fld
+        short shooting_failed

         int bt_size

@@ -249,6 +250,7 @@ cdef extern from "class.h":
     void perturb_free(void*)
     void thermodynamics_free(void*)
     void background_free(void*)
+    void background_free_input(void*)
     void nonlinear_free(void*)

     cdef int _FAILURE_
python/classy.pyx
@@ -323,6 +323,9 @@ cdef class Class:
         #NOTE: Added ready=true and add(perturb) at the steps to avoid memory leaks if code fails (MZ)
         if "background" in level:
             if background_init(&(self.pr), &(self.ba)) == _FAILURE_:
+                if self.ba.shooting_failed:
+                    background_free_input(&self.ba)
+                    raise CosmoComputationError(self.ba.error_message)
                 self.ncp.add("background")
                 self.ready = True
                 self.struct_cleanup()

I hope it helps.