lesgourg / class_public

Public repository of the Cosmic Linear Anisotropy Solving System (master for the most recent version of the standard code; GW_CLASS to include Cosmic Gravitational Wave Background anisotropies; classnet branch for acceleration with neural networks; ExoCLASS branch for exotic energy injection; class_matter branch for FFTlog)

Omega-dcdmdr, resulting in seg faults #409

Closed: astrogirl1 closed this issue 3 years ago

astrogirl1 commented 3 years ago

Hi,

I recently pulled the CLASS repo, which had major changes to background.c and input.c, and since then I have been getting seg faults that I did not get before. This happens when I pass 'Omega_dcdmr', 'Omega_ini_dcdm' and 'Omega0_dcdmdr' through MontePython.

I checked input.c and these variables are used in various places. I added the lines below (at line 2466), following what I originally did for my own new parameters (which worked), but I still get the seg faults.

class_read_double("Omega0_dcdmdr",pba->Omega0_dcdmdr);
class_read_double("Omega_ini_dcdm",pba->Omega_ini_dcdm);

I need these variables to be non-zero/varying so that they trigger the corresponding equations in background.c; see my old issue #407, where this worked previously.

Here is part of the error :

 /!\ PyMultiNest detected but MultiNest likely not installed correctly. You can
     safely ignore this if not running with option -m NS

Error in Class: background_init(L:770) :condition (pba->shooting_failed == _TRUE_) is true; Shooting failed, try optimising input_get_guess(). Error message:

input_shooting(L:661) :error in input_find_root(&xzero, &fevals, ppr->tol_shooting_deltax_rel, &fzw, errmsg);
=>input_find_root(L:839) :error in input_fzerofun_1d(x1, pfzw, &f1, errmsg);
=>input_fzerofun_1d(L:912) :error in input_try_unknown_parameters(&input, 1, pfzw, output, error_message);
=>input_try_unknown_parameters(L:1275) :error in background_init(&pr,&ba);
=>background_init(L:785) :error in background_solve(ppr,pba);
=>background_solve(L:1894) :error in generic_evolver(background_derivs, loga_ini, loga_final, pvecback_integration, used_in_output, pba->bi_size, &bpaw, ppr->tol_background_integration, ppr->smallest_allowed_variation, background_timescale, ppr->background_integration_stepsize, pba->loga_table, pba->bt_size, background_sources, ((void *)0), pba->error_message);
=>evolver_ndf15(L:297) :error in new_linearisation(&jac,hinvGak,neq,error_message);
=>new_linearisation(L:999) :condition (funcreturn == _FAILURE_) is true; Failure in ludcmp. Possibly singular matrix!
[cssm01:25753] *** Process received signal ***
[cssm01:25753] Signal: Segmentation fault (11)
[cssm01:25753] Signal code:  (128)
[cssm01:25753] Failing at address: (nil)
[cssm01:25753] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3efd0)[0x7f0dad7f4fd0]
[cssm01:25753] [ 1] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x3d)[0x7f0dad84d9fd]
[cssm01:25753] [ 2] /home//Desktop/git_code/class_public/python/build/lib.linux-x86_64-2.7/classy.so(input_read_parameters_general+0x921)[0x7f0d93a21f31]
[cssm01:25753] [ 3] /home//Desktop/git_code/class_public/python/build/lib.linux-x86_64-2.7/classy.so(input_read_parameters+0x10e)[0x7f0d93a39b5e]
[cssm01:25753] [ 4] /home//Desktop/git_code/class_public/python/build/lib.linux-x86_64-2.7/classy.so(input_try_unknown_parameters+0x17a)[0x7f0d93a3a76a]
[cssm01:25753] [ 5] /home//Desktop/git_code/class_public/python/build/lib.linux-x86_64-2.7/classy.so(input_fzerofun_1d+0x29)[0x7f0d93a3b379]
[cssm01:25753] [ 6] /home//Desktop/git_code/class_public/python/build/lib.linux-x86_64-2.7/classy.so(input_find_root+0x69)[0x7f0d93a3b419]
[cssm01:25753] [ 7] /home//Desktop/git_code/class_public/python/build/lib.linux-x86_64-2.7/classy.so(input_shooting+0x954)[0x7f0d93a3bfe4]
[cssm01:25753] [ 8] /home/a1705053/Desktop/git_code/class_public/python/build/lib.linux-x86_64-2.7/classy.so(input_read_from_file+0x130)[0x7f0d93a3c400]
[cssm01:25753] [ 9] /home//Desktop/git_code/class_public/python/build/lib.linux-x86_64-2.7/classy.so(+0x1cf2e)[0x7f0d93978f2e]
[cssm01:25753] [10] /home//Desktop/git_code/class_public/python/build/lib.linux-x86_64-2.7/classy.so(+0x1bf95)[0x7f0d93977f95]
[cssm01:25753] [11] python(PyEval_EvalFrameEx+0x56a)[0x559d8d2ff73a]
[cssm01:25753] [12] python(PyEval_EvalCodeEx+0x4ba)[0x559d8d2fcd5a]
[cssm01:25753] [13] python(PyEval_EvalFrameEx+0x5c6c)[0x559d8d304e3c]
[cssm01:25753] [14] python(PyEval_EvalCodeEx+0x4ba)[0x559d8d2fcd5a]
[cssm01:25753] [15] python(PyEval_EvalFrameEx+0x5c6c)[0x559d8d304e3c]
[cssm01:25753] [16] python(PyEval_EvalCodeEx+0x4ba)[0x559d8d2fcd5a]
[cssm01:25753] [17] python(PyEval_EvalFrameEx+0x5c6c)[0x559d8d304e3c]
[cssm01:25753] [18] python(PyEval_EvalCodeEx+0x4ba)[0x559d8d2fcd5a]
[cssm01:25753] [19] python(PyEval_EvalFrameEx+0x569e)[0x559d8d30486e]
[cssm01:25753] [20] python(PyEval_EvalCodeEx+0x4ba)[0x559d8d2fcd5a]
[cssm01:25753] [21] python(PyEval_EvalCode+0x19)[0x559d8d2fc899]
[cssm01:25753] [22] python(+0x1220ef)[0x559d8d32d0ef]
[cssm01:25753] [23] python(PyRun_FileExFlags+0x82)[0x559d8d3282f2]
[cssm01:25753] [24] python(PyRun_SimpleFileExFlags+0x18d)[0x559d8d327ded]
[cssm01:25753] [25] python(Py_Main+0x579)[0x559d8d2d6c99]
[cssm01:25753] [26] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f0dad7d7b97]
[cssm01:25753] [27] python(_start+0x2a)[0x559d8d2d662a]
[cssm01:25753] *** End of error message ***
Creating chains/14_04_mcmc/2021-04-14_10__4.txt

Error in Class: background_init(L:770) :condition (pba->shooting_failed == _TRUE_) is true; Shooting failed, try optimising input_get_guess(). Error message:

input_shooting(L:661) :error in input_find_root(&xzero, &fevals, ppr->tol_shooting_deltax_rel, &fzw, errmsg);
=>input_find_root(L:839) :error in input_fzerofun_1d(x1, pfzw, &f1, errmsg);
=>input_fzerofun_1d(L:912) :error in input_try_unknown_parameters(&input, 1, pfzw, output, error_message);
=>input_try_unknown_parameters(L:1275) :error in background_init(&pr,&ba);
=>background_init(L:785) :error in background_solve(ppr,pba);
=>background_solve(L:1894) :error in generic_evolver(background_derivs, loga_ini, loga_final, pvecback_integration, used_in_output, pba->bi_size, &bpaw, ppr->tol_background_integration, ppr->smallest_allowed_variation, background_timescale, ppr->background_integration_stepsize, pba->loga_table, pba->bt_size, background_sources, ((void *)0), pba->error_message);
=>evolver_ndf15(L:297) :error in new_linearisation(&jac,hinvGak,neq,error_message);
=>new_linearisation(L:999) :condition (funcreturn == _FAILURE_) is true; Failure in ludcmp. Possibly singular matrix!
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node cssm01 exited on signal 11 (Segmentation fault)

Thank you, Meera

pstoecker commented 3 years ago

Hey Meera,

Is this an issue that only occurs within MontePython (i.e. with a particular combination of the parameters you are varying), or does it also happen if you run CLASS alone?

In either case, it would help if you could find out which parameters were fed to CLASS, as this would make debugging easier.

Judging from the backtrace of your segfault and the error message, it appears that you access a null pointer within the function new_linearisation of the ndf15 solver. This can happen when the system of differential equations passed to the ndf15 solver is badly defined and a relevant pointer is not set properly. I would suspect that the issue can be found there, by checking the background_derivs passed to generic_evolver (in your case ndf15).

Best, Patrick

PS: One thing which is puzzling is that you get a proper error message (singular matrix -> ill-defined differential equation) and a segfault. So it seems that class attempts to throw an error, that is interpreted by MP as invalid point and it would proceed with the next point, but in doing so it also triggers the segfault which results in the crash of your MontePython run
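
To test whether the problem also occurs when CLASS runs alone, the point that MontePython was sampling can be written out as a plain input file. The following is a minimal sketch, assuming the usual `name = value` line format of CLASS .ini files; the helper `write_class_ini`, the file name, and the parameter values are hypothetical placeholders, not values taken from the failing run.

```python
from pathlib import Path


def write_class_ini(params, path):
    """Write a {name: value} dict as 'name = value' lines, one per parameter."""
    lines = [f"{name} = {value}" for name, value in params.items()]
    Path(path).write_text("\n".join(lines) + "\n")
    return path


# Placeholder values: in practice, copy the point at which the run crashed.
point = {"h": 0.72, "Omega_cdm": 0.22, "Omega_ini_dcdm": 0.1}
write_class_ini(point, "failing_point.ini")
```

The resulting file can then be passed to the CLASS executable from the CLASS directory (`./class failing_point.ini`), which takes MontePython out of the picture.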

astrogirl1 commented 3 years ago

Hi Patrick,

Thanks for your response. Here are a few more details that are leading to those issues:

1) I would like to use the background equations mentioned above, so I need has_dcdm to be _TRUE_. Looking at the code, that requires Omega0_dcdmdr to be non-zero. Before the merge this was a different parameter, Omega_dcdm, and it worked. If I test this in CLASS alone using explanatory.ini, Omega0_dcdmdr never actually gets used and the equations are not triggered. I checked 'unused_parameters' and it is listed there.

1-a) With this in mind, I tried using MontePython and got errors as mentioned above, where it said ' CLASS cannot read the parameters : Omega0_dcdmdr '. I added an elif branch to the data.py file in MP (the same approach I used when I introduced new parameters), and since then CLASS has not raised any errors, but it has not triggered the equations either.

            elif elem == 'log10_frac_rm_energy':
                self.cosmo_arguments['frac_rm_energy'] = math.pow(10.0, self.cosmo_arguments[elem])
                del self.cosmo_arguments[elem]
            elif elem == 'log10_new_tau_var':
                self.cosmo_arguments['new_tau_var'] = math.pow(10.0, self.cosmo_arguments[elem])
                del self.cosmo_arguments[elem]
            elif elem == 'Omega0_dcdmdr':
                self.cosmo_arguments['Omega0_dcdmdr'] = self.cosmo_arguments[elem]
                del self.cosmo_arguments[elem]

I have also added class_read_double("frac_rm_energy", pba->frac_rm_energy) in input.c so that it reads my variable.

2) The new variable added in the new version of CLASS, tau_dcdm, is exactly what I introduced months ago, and it's great that it is now built in. But when passed through MontePython, CLASS again fails to read it. I tried adding a new branch in data.py like the ones above, but that did not solve the problem.

It seems like something needed for tau_dcdm's definition is missing, but I cannot tell what :( Since introducing Omega0_dcdmdr I have not had any seg fault issues, but I do get shooting errors which I didn't get previously.
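
One possible reason the equations are never triggered in 1-a) can be read off the quoted `elif elem == 'Omega0_dcdmdr'` branch: it assigns the value back to the same key and then deletes that key, so the parameter would be dropped before it reaches CLASS. The sketch below reproduces that behaviour with a plain dict standing in for `self.cosmo_arguments`. The function name `apply_elif_chain` is hypothetical, and this is an illustration of the quoted branches only, not MontePython's actual code.

```python
import math


def apply_elif_chain(cosmo_arguments):
    """Mimic the quoted branches on a plain dict (hypothetical stand-in)."""
    for elem in list(cosmo_arguments):  # copy the keys, since the dict is modified
        if elem == 'log10_frac_rm_energy':
            # New key, so deleting the old one is harmless.
            cosmo_arguments['frac_rm_energy'] = math.pow(10.0, cosmo_arguments[elem])
            del cosmo_arguments[elem]
        elif elem == 'Omega0_dcdmdr':
            # Same key on both sides: the assignment is a no-op ...
            cosmo_arguments['Omega0_dcdmdr'] = cosmo_arguments[elem]
            # ... and this then removes the parameter altogether.
            del cosmo_arguments[elem]
    return cosmo_arguments


args = apply_elif_chain({'log10_frac_rm_energy': -2.0, 'Omega0_dcdmdr': 0.33})
print(sorted(args))
```

If this reading is right, the branch would silence the "cannot read the parameters" error simply because CLASS no longer receives the parameter at all.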

I might have to go back to the old version :( I appreciate any help I can get!

Thank you, Meera

Here are the parameters I am passing through MP (new_tau_var is essentially tau_dcdm, from before tau_dcdm was added to CLASS):

data.parameters['omega_b']  = [2.249,  None,None, 0, 0.01,'cosmo']
data.parameters['Omega_cdm']    = [0.22, 0.1, 0.5, 0.0016,1,   'cosmo']
data.parameters['n_s']          = [0.963, None,None, 0, 1,   'cosmo']
data.parameters['A_s']          = [2.42,   None,None, 0, 1e-9,'cosmo']
data.parameters['h']            = [0.72,  0.64,0.87, 0.0036,1,   'cosmo']
#data.parameters['tau_reio']        = [0.085,  None,None, 0,1,   'cosmo']
data.parameters['log10_frac_rm_energy'] = [-2,-4,-0.3,0.55,1,'cosmo']
data.parameters['log10_new_tau_var'] = [2,-3,4,0.25,1,'cosmo']
data.parameters['Omega0_dcdmdr'] =  [0.33,None,None, 0,1,   'cosmo']
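
To see which values this block would hand to CLASS, the entries can be unpacked by hand. The sketch below assumes the MontePython convention `[mean, min, max, 1-sigma, scale, role]`, with the value passed on being `mean * scale` and a 1-sigma of 0 meaning the parameter is held fixed; the helper `starting_values` is hypothetical and only covers a subset of the entries above.

```python
# Entries copied from the block above; the field order is an assumption:
# [mean, min, max, 1-sigma, scale, role]
parameters = {
    'omega_b': [2.249, None, None, 0, 0.01, 'cosmo'],
    'A_s': [2.42, None, None, 0, 1e-9, 'cosmo'],
    'h': [0.72, 0.64, 0.87, 0.0036, 1, 'cosmo'],
    'Omega0_dcdmdr': [0.33, None, None, 0, 1, 'cosmo'],
}


def starting_values(parameters):
    """Return the scaled starting value and whether each parameter is varied."""
    out = {}
    for name, (mean, _lo, _hi, sigma, scale, _role) in parameters.items():
        out[name] = {'value': mean * scale, 'varied': sigma != 0}
    return out


for name, info in starting_values(parameters).items():
    print(name, info['value'], 'varied' if info['varied'] else 'fixed')
```

Under these assumptions, Omega0_dcdmdr has a 1-sigma of 0 and would stay fixed at 0.33 rather than vary.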

astrogirl1 commented 3 years ago

Hi,

I think I have resolved the seg fault issue. I think you were right, Patrick, when you said: 'So it seems that class attempts to throw an error, that is interpreted by MP as invalid point and it would proceed with the next point, but in doing so it also triggers the segfault which results in the crash of your MontePython run'

So I might open a new issue for the shooting error.

Thank you, Meera

alessiospuriomancini commented 3 years ago

Hi @astrogirl1, could you elaborate on how you fixed the seg fault issue? I am having the same problem with another modified version of CLASS. Thanks!

astrogirl1 commented 3 years ago

Hi @alessiospuriomancini, looking back at my notes, I think there was an issue with how I was defining my parameters in CLASS (input.c). There was also an issue with providing Omega0_dcdmdr in my input file. I don't think that is the right way to do it: CLASS reads either Omega_dcdmdr or Omega_ini_dcdm and then assigns it to Omega0_dcdmdr in the code itself. So if you are making modifications, try checking your definitions. Is there a specific change causing the issue? Meera