lesgourg / class_public

Public repository of the Cosmic Linear Anisotropy Solving System (master for the most recent version of the standard code; GW_CLASS to include Cosmic Gravitational Wave Background anisotropies; classnet branch for acceleration with neutral networks; ExoCLASS branch for exotic energy injection; class_matter branch for FFTlog)
230 stars 285 forks source link

parameter set causing "Segmentation fault" in python wrapper #419

Open lukashergt opened 3 years ago

lukashergt commented 3 years ago

The following parameter set causes a Segmentation fault in the 'background' module (both python wrapper and command line call to CLASS):

Example python script to reproduce the error:

from classy import Class

paramdict = {'A_s': 1.9277426921445484e-09, 
             'n_s': 0.9451570080189712, 
             '100*theta_s': 1.045577375117725, 
             'omega_b': 0.017612175306787197, 
             'omega_cdm': 0.30057249803499375, 
             'm_ncdm': 0.06, 
             'tau_reio': 0.3055598544108618, 
             'N_ncdm': 1, 
             'N_ur': 2.0328, 
             'output': 'tCl'}

cosmo = Class()
cosmo.set(paramdict)
cosmo.compute(['background'])

I've chased it down to line 1355 in tools/evolver_ndf15.c:

  1339    else{
  1340      /*Normal case:*/
  1341      for(j=1;j<=neq;j++){
  1342        Fdiff_new = 0.0;
  1343        Fdiff_absrm = 0.0;
  1344        for(i=1;i<=neq;i++){
  1345          Fdiff_absrm = MAX(fabs(Fdiff_new),Fdiff_absrm);
  1346          Fdiff_new = nj_ws->ydel_Fdel[i][j] - fval[i];
  1347          dFdy[i][j] = Fdiff_new/nj_ws->del[j];
  1348          /*Find row maximums:*/
  1349          if(fabs(Fdiff_new)>=Fdiff_absrm){
  1350            /* Found new max location in column */
  1351            nj_ws->Rowmax[j] = i;
  1352            nj_ws->Difmax[j] = fabs(Fdiff_new);
  1353          }
  1354        }
  1355        nj_ws->absFdelRm[j] = fabs(nj_ws->ydel_Fdel[nj_ws->Rowmax[j]][j]);
  1356      }
  1357    }

nj_ws->Rowmax[j] returns some random number, so I guess j overshoots the Rowmax array somehow. I'm a bit lost with all these indices. Can anyone else reproduce this? Anybody an idea what might be going wrong here?

Versions:

Python 3.8.2
classy v3.0.1
pstoecker commented 3 years ago

Hi @lukashergt ,

this is interesting. I have seen a similar issue in the context of energy injection with class v3.0.0 and I know exactly what goes wrong here. For your parameter combination, some, if not all, entries of the DE system that will be evolved by ndf15 contain a NaN such that the calculation that determines the value of nj_ws->Rowmax[j] fails and it leaves nj_ws->Rowmax[j] uninitialised such that you go out of bounds of the nj_ws->ydel_Fdel array and you access random data.

This can be fixed by explicitly checking the derivatives and throwing an error if they are ill-defined.

Best, Patrick

fruzsinaagocs commented 3 years ago

Hi @pstoecker and @lukashergt,

I tried reproducing this issue to see if my fix over at https://github.com/GambitBSM/gambit/issues/278 worked here. I couldn't reproduce the issue with either of these:

In a discussion with Lukas, we pinned down the following:

I'm going to look into

But this discussion makes me more certain that the patch will not rule out valid points in parameter space - we definitely want to avoid that happening.

pstoecker commented 3 years ago

OK, I see.

I think there are two issues to fix here. The first thing is that ndf15 has to be fixed such that it throws an error whenever the derivatives are ill-defined. This should be independent of the question whether the parameter is actually fine and just due to some weird (system-dependent) computational issue the problem arises. Therefore it would be good if we could try your fix from GambitBSM/gambit#278 here

The other thing is the particular issue that is happening here. Also on my system (ubuntu 18.04, gcc 7.5) this particular parameter point does not crash. I haven't checked the results though but I guess that I will see a similar doubtful value of H. Given your explanation, I agree that this needs some investigation.

fruzsinaagocs commented 3 years ago

Ah, I should've been more clear - we tried the fix from https://github.com/GambitBSM/gambit/issues/278 and it successfully avoids the seg fault, so CLASS fails in a more elegant way and doesn't disrupt the MCMC chains.

lukashergt commented 3 years ago

Hi @fruzsinaagocs and @pstoecker, thanks for both your inputs that was very helpful. The ndf15-fix definitely proved useful in some cases.

I found another issue, though, that relates to this particular case and to curvature runs in general. The following lines in source/background.c have been commented out:

  /** - control that cosmological parameter values make sense, otherwise inform user */

  /* H0 in Mpc^{-1} */
  /* Many users asked for this test to be supressed. It is commented out. */
  /*class_test((pba->H0 < _H0_SMALL_)||(pba->H0 > _H0_BIG_),
             pba->error_message,
             "H0=%g out of bounds (%g<H0<%g) \n",pba->H0,_H0_SMALL_,_H0_BIG_);*/

This test is important to filter out cases such as collapsing universes. I don't know why "many users asked for this test to be suppressed", but there are definitely situations where these are relevant. Maybe an approach similar to CAMB would be useful, where you are allowed to set a viable H0 range, maybe with the additional option of leaving it unspecified?