sgiardie closed this issue 10 months ago.
OK ... the huge jumps in phi and polarization angle are wraparounds between 0 and 2pi, so the actual jitter will most likely be much, much smaller. So the errors in theta are probably the best indication of how much the pointings jitter.
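One way to confirm that interpretation is to wrap the raw angle differences back into (−π, π], which removes the 2π wraparound jumps and leaves only the true jitter. A minimal numpy sketch with synthetic angles (not the actual pointing data):

```python
import numpy as np

# raw difference of two nearly equal angles can show spurious ~2*pi jumps
# (synthetic values near the 0 / 2*pi boundary, just for illustration)
phi_a = np.array([6.283185, 0.000001, 3.141592])
phi_b = np.array([0.000002, 6.283184, 3.141593])
raw = phi_a - phi_b                        # contains jumps of about +/- 2*pi

# wrap the difference into (-pi, pi] to recover the true (tiny) jitter
wrapped = (raw + np.pi) % (2 * np.pi) - np.pi
print(wrapped)                             # all entries are of order 1e-6
```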
The interpretation that this is caused by inaccuracies when computing times sounds reasonable. I don't know how times are added, subtracted etc. in the code, but it's quite possible that there are some conversions going on inside astropy
which degrade the accuracy of the results.
Could you please show a plot of the colatitude error when using double precision pointings? That might give an indication how bad the problem really is.
It might also be helpful to look at the statistics of the deviations: e.g. which fraction of the pointings is actually different between serial and parallel cases. Despite the quite busy plots above, it could still be a very small fraction.
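Such a statistic is easy to compute once the serial and parallel pointings are available as arrays. A minimal sketch, with synthetic data standing in for the real colatitudes:

```python
import numpy as np

def deviation_stats(a, b, atol=0.0):
    """Return the fraction of samples that differ and the largest deviation."""
    diff = np.abs(a - b)
    return (diff > atol).mean(), diff.max()

# synthetic demo: 1e6 colatitude samples, 1% of them perturbed at the 1e-12 level
rng = np.random.default_rng(42)
theta_serial = rng.uniform(0.0, np.pi, 1_000_000)
theta_parallel = theta_serial.copy()
idx = rng.choice(theta_parallel.size, 10_000, replace=False)
theta_parallel[idx] += 1e-12

frac, worst = deviation_stats(theta_serial, theta_parallel)
print(frac, worst)   # ~0.01, ~1e-12
```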
Hi. I confirm what @sgiardie is saying, and here is the code that I ran with 1 (serial), 2, 4 and 8 MPI tasks:
```python
import litebird_sim as lbs
import numpy as np
import healpy as hp
import matplotlib.pylab as plt
from astropy.time import Time
import time
import pathlib
import scipy
import pickle
import os
import sys

# for parallelization
comm = lbs.MPI_COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# initializing the IMO
imo = lbs.Imo()

# initializing the simulation
sim = lbs.Simulation(
    base_path='/marconi_scratch/userexternal/mbortola/litebird/test_lbs/test_pointing/',
    start_time=0.,  # Time("2029-01-01T00:00:00"),
    duration_s=1e6,
    mpi_comm=comm,
    random_seed=12345,
)

imo_version = 'v1.3'
telescope = 'LFT'
base_path = sim.base_path
start_time = sim.start_time
duration_s = sim.duration_s
channels = ['L2-050']
detnames = ['000_000_000_QA_050_T']
ndet = np.size(detnames)
comm.barrier()

# loading the instrument metadata
inst_info = sim.imo.query("/releases/"+imo_version+"/satellite/"+telescope+"/instrument_info")

# generating the quaternions of the instrument
sim.generate_spin2ecl_quaternions(
    imo_url="/releases/"+imo_version+"/satellite/scanning_parameters/",
    delta_time_s=60.,
)

# loading instrument info
inst = lbs.InstrumentInfo(
    name=telescope,
    boresight_rotangle_rad=np.deg2rad(inst_info.metadata["boresight_rotangle_deg"]),
    spin_boresight_angle_rad=np.deg2rad(inst_info.metadata["spin_boresight_angle_deg"]),
    spin_rotangle_rad=np.deg2rad(inst_info.metadata["spin_rotangle_deg"]),
)

# loading channel info  # marco: should ch_info have the same length as dets?
ch_info = lbs.FreqChannelInfo.from_imo(
    url="/releases/"+imo_version+"/satellite/"+telescope+"/"+channels[0]+"/channel_info",
    imo=imo,
)
comm.barrier()

dets = []
for i_det in range(ndet):
    # get info from the IMO
    det = lbs.DetectorInfo.from_imo(
        url="/releases/"+imo_version+"/satellite/"+telescope+"/"+channels[i_det]+"/"+detnames[i_det]+"/detector_info",
        imo=imo,
    )
    dets.append(det)
comm.barrier()

# create the Observation object
(obs,) = sim.create_observations(
    detectors=dets,
    n_blocks_det=1,
    n_blocks_time=size,
    split_list_over_processes=False,
)

# HWP specification
hwp_radpsec = inst_info.metadata["hwp_rpm"] * 2 * np.pi / 60
comm.barrier()

# get pointings and store them in obs
quaternion_buffer = np.zeros((obs.n_samples, 1, 4))
pointings = lbs.pointings.get_pointings(
    obs,
    spin2ecliptic_quats=sim.spin2ecliptic_quats,
    bore2spin_quat=inst.bore2spin_quat,
    hwp=lbs.IdealHWP(hwp_radpsec),  # applies the HWP rotation angle to the polarization angle
    quaternion_buffer=quaternion_buffer,
    store_pointings_in_obs=False,   # if True, stores colatitude and longitude in
                                    # obs.pointings and the polarization angle in obs.psi
    dtype_pointing=np.float64,
)
del quaternion_buffer

np.save(
    base_path.absolute().as_posix()+'/pointings_size'+str(size)+'_rank'+str(rank)+'.npy',
    pointings,
)
```
Below are some plots for the colatitude $\theta$, with pointings in double precision and without astropy:

- Colatitude for different numbers of tasks (blue 1, orange 2, green 4, red 8) vs. sample number. Only the red curve is visible because they all agree.
- First half of the samples with 1 task minus first half with 2 tasks (all zero).
- Second half of the samples with 1 task minus second half with 2 tasks (now you can see the difference).
- First 1/8 of the samples with 1 task minus first 1/8 with 8 tasks (all zero).
- Second 1/8 of the samples with 1 task minus second 1/8 with 8 tasks (now you can see the difference).
- Last 1/8 of the samples with 1 task minus last 1/8 with 8 tasks (the difference is larger).
With astropy instead, and again in double precision (for the first chunk of data the difference is always zero):

- Second half of the samples with 1 task minus second half with 2 tasks.
- Second 1/8 of the samples with 1 task minus second 1/8 with 8 tasks.
- Last 1/8 of the samples with 1 task minus last 1/8 with 8 tasks (bigger difference).
> It might also be helpful to look at the statistics of the deviations: e.g. which fraction of the pointings is actually different between serial and parallel cases. Despite the quite busy plots above, it could still be a very small fraction.
This is the plot of (colatitude(parallel) − colatitude(serial)) / colatitude(serial) in single precision.
Basically the same plot, for double precision.
The same plot as in my first comment, just in double precision; it shows a smaller absolute difference (very crowded plot again).
Judging from the plots, this looks very much like an issue with the dynamic range of double-precision numbers...
Let's assume double-precision pointings for the moment, and let's also assume that time is stored in seconds.
If time starts at 0 and the mission duration is 1e6 seconds, then the smallest time interval that can be represented by a double-precision number is about 1e-10 seconds near the end of the simulated time span. So absolute times may vary (due to numerical precision) by about that much if they are computed in different (but mathematically equivalent) ways. That makes pointing errors around 1e-12 look reasonable.
Now, astropy probably stores times as something like "seconds since Jan 1, 1970", so the simulation time starts at something in the 1e9 range. In this case the smallest representable time interval is around 1e-5 seconds, pushing the pointing errors up to 1e-7.
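These order-of-magnitude estimates can be checked directly with `np.spacing`, which returns the gap to the next representable double. The Julian-day case below is an assumption about the internal representation, included only to bracket the estimate:

```python
import numpy as np

# time in seconds starting from 0: resolution near the end of a 1e6 s mission
print(np.spacing(1.0e6))             # ~1.2e-10 s

# absolute time as a single double in "seconds since 1970" (~1.9e9 s)
print(np.spacing(1.9e9))             # ~2.4e-7 s

# absolute time as a single double in Julian days (JD ~ 2.46e6 days)
print(np.spacing(2.46e6) * 86400)    # ~4e-5 s
```

So, depending on the representation, the granularity of a single-double absolute time lands between roughly 2e-7 s and 4e-5 s, consistent with the 1e-5 estimate above.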
So the errors come from the dilemma of having to represent large absolute times and tiny time increments with the same fixed number of significant digits.
I think we understood where the difference comes from, thanks @mreineck. Should we close this, or are more discussions/tests needed?
It depends on whether the accuracy you get for the actual mission times is "good enough". If not, perhaps we can convince astropy to use something like `np.float128` for times internally. But I'm not sure whether that can be done.
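For reference, one alternative to `np.float128` is to keep times as an (epoch, offset) pair of ordinary doubles, which is, as far as I know, similar in spirit to what `astropy.time` does internally with its `jd1`/`jd2` attributes. A small illustration with hypothetical numbers:

```python
import numpy as np

# two-double ("epoch + offset") representation of an absolute time
epoch = 2.46e6 * 86400.0      # hypothetical absolute start, in seconds
offset = 12345.000000123      # seconds since that epoch

# collapsing to a single double loses the sub-ulp fractional part
single = epoch + offset
print(single - epoch)         # 12345.0: the 1.23e-7 s fraction was lost

# keeping (epoch, offset) as a pair preserves the full offset resolution
print(np.spacing(offset))     # ~1.8e-12 s, vs ~3e-5 s near `single`
```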
I computed a map for 3 yr of observation with nside=256 (the maps in the first comment have nside=64), one in serial mode and one using 2 tasks. I took the difference of the maps (in muK), and it is pretty similar to the difference shown in the maps at the beginning (those were computed for 1 yr of observation and are shown in K). I guess a difference of 1e-5 muK on the galactic plane is not a huge concern if we are going to mask anyway; with a 70% mask it seems to go down to 1e-6 muK. What do you think?
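For completeness, the masked comparison can be sketched like this (placeholder maps and a random 70% mask, just so the snippet runs; the real maps and galactic mask would be loaded instead):

```python
import numpy as np

# hypothetical stand-ins for the serial and parallel maps, in muK, one value
# per HEALPix pixel at nside = 256
rng = np.random.default_rng(0)
npix = 12 * 256**2
map_serial = rng.normal(size=npix)
map_parallel = map_serial + 1e-5 * rng.normal(size=npix)

# hypothetical mask keeping ~70% of the sky: True = pixel survives the mask
mask = rng.random(npix) < 0.7

diff = map_parallel - map_serial
print(np.abs(diff).max())        # full-sky worst-case difference
print(np.abs(diff[mask]).max())  # worst-case difference outside the mask
```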
Thanks all for the detailed analysis and the nice plots! My opinion is that pointing inaccuracies of 10⁻⁶ should not be too worrisome, as the actual instrument will have far larger pointing errors…
Of course, these differences might bite us back once we compare results of different simulations made with different MPI distributions, but unless we find a trivial fix for this, I am not sure it's worth spending too much time to find a solution.
Understood. How do we proceed? We close this Issue as "not planned" or "completed"?
Mmm, I believe the best tag for this would be “WONTFIX” if it's available: we acknowledge that the discrepancy is real, but we accept to live with it.
I was testing if there was any difference in running a code using litebird_sim serially or in a parallel way. Using exactly the same code (which was producing a map for 1 year of observation), I noticed a difference in the maps produced at the end. This is the difference between the temperature map in Kelvin produced with the serial run and the same map produced with a parallel run with 8 ranks (on the left) and 4 ranks (on the right). You can notice a ~10^(-12) difference especially on the galactic plane, which corresponds to a ~10^(-6) difference in muK. The difference increases when increasing the number of ranks.
I tried to see if this difference in the maps was due to a difference in the pointings produced in the parallel and serial cases. I ran a simpler code with a shorter observation time (10 days) in the serial case and in the parallel one (with just two ranks) and compared the corresponding pointings. These are the differences between the colatitude, longitude and polarization angle (the three pointing fields) for the same detector:
You can see that the result of the rank 0 is always the same as the serial result for the corresponding time chunk, while there are differences for the following rank.
I asked @marcobortolami to test this too. We noticed that the difference in the pointings is bigger if one initializes the simulation using `start_time=astropy.time.Time(date)` instead of `start_time=0` (I was using `astropy.time` in my tests). We also saw that the difference decreases when computing the pointings in double precision instead of single precision (the default), and that the difference in the pointings increases for the ranks corresponding to successive time chunks. @marcobortolami can put an example code and plots showing this. We guess this is due to an (inevitable?) error in the start time of each time chunk. Is there a way to reduce this error?
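A minimal sketch (not litebird_sim's actual code, and with hypothetical numbers) of how recomputing each chunk's start time can shift the sample times by a few ulps of the absolute time:

```python
import numpy as np

# hypothetical numbers: absolute start in seconds since an epoch, 19 Hz sampling
t0 = 1.86e9
dt = 1.0 / 19.0
n = 1_000_000

# serial run: every sample time computed from the global start
serial = t0 + np.arange(n) * dt

# "rank 1" of a 2-task run: recompute the chunk start, then offsets within it
half = n // 2
chunk_start = t0 + half * dt
rank1 = chunk_start + np.arange(half) * dt

# the two mathematically equivalent computations round differently
diff = np.abs(serial[half:] - rank1)
print(diff.max())   # a few 1e-7 s: a few ulps of a double near 1.86e9
```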