lesgourg / class_public

Public repository of the Cosmic Linear Anisotropy Solving System (master for the most recent version of the standard code; GW_CLASS to include Cosmic Gravitational Wave Background anisotropies; classnet branch for acceleration with neutral networks; ExoCLASS branch for exotic energy injection; class_matter branch for FFTlog)
230 stars 285 forks source link

Infinite loop in thermodynamics_reionization_sample due to double cancellation #338

Open Stefan-Heimersheim opened 4 years ago

Stefan-Heimersheim commented 4 years ago

Summary: When one calls CLASS using reio_interp with some points [z1, z2, z3], [xe1, xe2, xe3] the code can get stuck when interpolating, e.g. when xe2=xe3=0. The check that should prevent that is not strict enough.

Details: thermodynamics_reionization_sample samples the ionization history at a close-enough z point so the relevant quantities don't change too much. This happens in a while (z > 0.) loop trying to find a point z_next, closer and closer to z (z_next=z-dz;). Because of another bug* it always tries to decrease dz but at some point dz is so small that z_next = z. Then the algorithm thinks the step worked while if fact it didn't step at all and gets stuck in a loop at this redshift. The check dz < ppr->smallest_allowed_variation (the latter defaults to machine precision) is supposed to catch this but does not because even if dz > ppr->smallest_allowed_variation, z_next = z can occur (since z >> 1, e.g. 50). This could be fixed by either just making this check stricter by a factor (100 or so) or by doing the relative comparison (dz/z < machine precision).

Proposed fix: Change this line in thermodynamics.c, around line 2880

class_test(dz < ppr->smallest_allowed_variation,

to

class_test(dz < z*ppr->smallest_allowed_variation,

or maybe a more strict criterion.

Minimal example: This in .ini file triggers the bug:

reio_parametrization = reio_inter
reio_inter_num = 3
reio_inter_z = 0,10,50
reio_inter_xe = 1,0,0

Note that changing z2 to e.g. 1 avoids the bug as 1-dz=1 requires dz<machine precision (which is caught), but 10-dz=10 already happens before.

* other bug: The reason the sampler get suck when x2=x3=0 is related to it comparing the relative variation here, which probably stays finite because of some division by 0. I didn't investigate this in detail though.

relative_variation = fabs((dkappadz_next-dkappadz)/dkappadz) +
  fabs((dkappadtau_next-dkappadtau)/dkappadtau);