litebird / litebird_sim

Simulation tools for LiteBIRD
GNU General Public License v3.0

Issue in the Conjugate Gradient algorithm (destriper) #322

Open nraffuzz opened 2 months ago

nraffuzz commented 2 months ago

Using the litebird_sim internal destriper, I noticed that the CG algorithm behaves unexpectedly when the threshold (i.e. the minimum discrepancy between the estimated baselines and the baselines deduced from the destriped map) is set lower than the default of 1e-7.

threshold = 1e-8 # default is 1e-7
destriper_params = lbs.DestriperParameters(
    output_coordinate_system=lbs.coordinates.CoordinateSystem.Galactic,
    samples_per_baseline=samples_per_baseline,
    iter_max=100,
    threshold=threshold,
    use_preconditioner=True,
)

Simulation features:

I tested CMB + noise, serially and using various MPI tasks, with different noise realizations (different seeds), and threshold values. I observed:

However, the issue is "contained" in the sense that the baselines are updated only when the stopping factor drops below the best value seen so far; therefore the baselines are updated up to the 5th iteration but never later. The relevant code in destriper.py:

cur_stopping_factor = _get_stopping_factor(new_r)
if (not best_stopping_factor) or cur_stopping_factor < best_stopping_factor:
    best_stopping_factor = cur_stopping_factor
    for cur_best_x_k, x_k in zip(best_x, x):
        cur_best_x_k[:] = x_k

This means that setting the threshold below the default of 1e-7 produces exactly the same result as threshold=1e-7, but possibly takes more time (depending on the maximum number of iterations, here iter_max=100).
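
The "keep the best iterate" behaviour described above can be illustrated with a generic conjugate gradient loop. This is only a minimal sketch and not the litebird_sim code: the function cg_keep_best, the small dense test system, and the use of the maximum absolute residual as stopping factor are all assumptions made for illustration (the real _get_stopping_factor may be defined differently).

```python
import numpy as np


def cg_keep_best(A, b, threshold=1e-7, iter_max=100):
    """Plain CG on A x = b that returns the iterate with the smallest residual.

    Hypothetical helper for illustration; the stopping factor used here
    (max |r|) is an assumption, not the definition used by litebird_sim.
    """
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()

    best_x = x.copy()
    best_stop = float(np.max(np.abs(r)))
    history = [best_stop]

    for _ in range(iter_max):
        if best_stop < threshold:
            break
        Ap = A @ p
        rr = r @ r
        alpha = rr / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap

        stop = float(np.max(np.abs(r)))
        history.append(stop)
        # Only overwrite the stored solution when the residual improves,
        # so later, worse iterates never reach the output.
        if stop < best_stop:
            best_stop = stop
            best_x[:] = x

        beta = (r @ r) / rr
        p = r + beta * p

    return best_x, best_stop, history


# Small, well-conditioned symmetric positive-definite test system
rng = np.random.default_rng(42)
M = rng.standard_normal((50, 50))
A = M.T @ M + 50 * np.eye(50)
b = rng.standard_normal(50)

best_x, best_stop, history = cg_keep_best(A, b, threshold=1e-10, iter_max=100)
print(f"iterations run: {len(history) - 1}, best stopping factor: {best_stop:.3e}")
```

With this structure, if the residual stops decreasing after some iteration, every later iteration costs time but leaves best_x unchanged, which would match the observation that a lower threshold gives the same output as the default.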

mreineck commented 2 months ago

Just a guess, but if the TOD in question are stored as single precision, you probably cannot expect anything better than a tolerance around 1e-7. If the data are indeed single precision, I'd suggest switching to double precision for a comparison and see if the tolerance can be lowered.
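
For reference, the 1e-7 figure is close to the machine epsilon of single precision. A quick check with NumPy (a sketch, independent of litebird_sim):

```python
import numpy as np

# Machine epsilon: the gap between 1.0 and the next representable number
eps32 = float(np.finfo(np.float32).eps)
eps64 = float(np.finfo(np.float64).eps)

print(f"float32 eps: {eps32:.3e}")  # 1.192e-07
print(f"float64 eps: {eps64:.3e}")  # 2.220e-16
```

A relative tolerance below roughly 1e-7 therefore cannot be resolved by quantities stored in float32, while float64 leaves many more digits of headroom.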

I also saw that the CG code contains a few instances of np.dot, which (at least in the past) was infamous for its bad accuracy, since it used (for single-precision inputs) a single-precision variable for accumulating the results. Not sure whether this has been improved in the meantime.
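
One way to check this on a given installation is to compare np.dot on float32 and float64 copies of the same data against an exactly rounded reference. This is a rough sketch with made-up random data (the array size and seed are arbitrary); the actual errors depend on the NumPy build and the BLAS library behind it.

```python
import math

import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Positive values avoid cancellation, so relative errors are meaningful
a32 = rng.random(n, dtype=np.float32)
b32 = rng.random(n, dtype=np.float32)
a64 = a32.astype(np.float64)
b64 = b32.astype(np.float64)

# Products of two float32 values are exact in float64, and math.fsum
# returns a correctly rounded sum, so this serves as the reference.
reference = math.fsum(a64 * b64)

dot32 = float(np.dot(a32, b32))  # float32 inputs, float32 result
dot64 = float(np.dot(a64, b64))  # same values, double precision

err32 = abs(dot32 - reference) / reference
err64 = abs(dot64 - reference) / reference

print(f"relative error, float32 dot: {err32:.3e}")
print(f"relative error, float64 dot: {err64:.3e}")
```

If the float32 error is well above 1e-7, the accumulation is a plausible limit on the achievable threshold for single-precision data; for double-precision inputs the error should be many orders of magnitude smaller.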

nraffuzz commented 2 months ago

I forgot to mention that all the tests above were done with both pointings and TOD in double precision. Could it be related to np.dot? I'll look into it. Thanks!

mreineck commented 2 months ago

Sorry, I was distracted by something else yesterday and forgot to answer...

If the input arrays are double precision, np.dot should do a good job, I think. So I don't really know where the problem arises :-/