simonsobs / map_based_simulations

Map based simulations for the Simons Observatory

Implement noise splits. #30

Closed msyriac closed 4 years ago

msyriac commented 4 years ago

We should allow the mapsims API to accept a number-of-splits argument and provide independent noise realizations after scaling the input noise curves by the number of splits.
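A minimal sketch of the scaling described above, assuming each split contains 1/nsplits of the data so that its noise power is nsplits times the full-depth curve. The function names and the white-noise toy model are hypothetical and are not the mapsims API:

```python
import numpy as np


def scale_noise_curve(n_ell, nsplits):
    """Noise power of a single split (hypothetical helper).

    Each split is assumed to hold 1/nsplits of the data, so its noise
    power is nsplits times the full-depth noise curve.
    """
    return np.asarray(n_ell, dtype=float) * nsplits


def simulate_white_noise_splits(sigma_full, npix, nsplits=4, seed=0):
    """Toy stand-in for a real noise simulation: white noise per pixel.

    Returns an array of shape (nsplits, npix) holding independent
    realizations, each with standard deviation sigma_full * sqrt(nsplits).
    """
    rng = np.random.default_rng(seed)
    sigma_split = sigma_full * np.sqrt(nsplits)
    return rng.normal(0.0, sigma_split, size=(nsplits, npix))


splits = simulate_white_noise_splits(sigma_full=1.0, npix=100_000, nsplits=4, seed=1234)
print(splits.std(axis=1))         # per-split scatter, close to 2 for nsplits=4
print(splits.mean(axis=0).std())  # scatter of the averaged splits, close to 1
```

Averaging the four splits recovers roughly the full-depth noise level, which is the consistency the scaling is meant to preserve.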

Some things to decide:

  1. default number of splits per tube (proposal, nsplits=4)
  2. whether we should save split noise sims to disk

For (2), it depends on how PS (@thibautlouis @xzackli @stevekchoi ), component separation (@jcolinhill @dpole ) and lensing (@ajvanengelen and me) plan to use the SO noise sims. For lensing, we plan to directly call the mapsims code and generate them on the fly, so we don't need them saved to disk.

aiolasimone commented 4 years ago

I think TAC (@jodunkley) may want to chime in on this, but in general assuming/requiring that map-based noise sims will only be generated on-the-fly will save us a lot of disk space. Random-generator seeds can be saved and shared between AWGs for reproducibility. As long as we save the seed and the version of the code used, I think we can re-generate the noise set no problem.
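One possible way to make on-the-fly noise sims reproducible from a shared seed, sketched with NumPy's `SeedSequence`. The function name and the (simulation index, split) keying scheme are assumptions for illustration, not something mapsims is known to do:

```python
import numpy as np


def noise_rng(base_seed, sim_index, split):
    """Random generator for one (simulation, split) pair (hypothetical helper).

    The shared base seed plus the (sim_index, split) key fully determines
    the random stream, so only the base seed and the code version need to
    be recorded to regenerate a given noise map.
    """
    ss = np.random.SeedSequence(entropy=base_seed, spawn_key=(sim_index, split))
    return np.random.default_rng(ss)


a = noise_rng(2019, sim_index=7, split=2).standard_normal(5)
b = noise_rng(2019, sim_index=7, split=2).standard_normal(5)
c = noise_rng(2019, sim_index=7, split=3).standard_normal(5)
print(np.array_equal(a, b))  # identical inputs reproduce the same draws
print(np.array_equal(a, c))  # a different split gets an independent stream
```

Keying the stream on the simulation and split indices means any working group can regenerate one particular map without generating the ones before it.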

msyriac commented 4 years ago

I completely agree, and as you know this is the approach we've taken in ACT lensing. I think it's fine to save a few noise sims to disk for people who are just starting out and want to get an idea of what the "data" looks like, but saving more than a few is unfeasible.

stevekchoi commented 4 years ago

Current ACT power spectrum analysis saves these sims to disk because of all the cross correlations that need to be computed. For SO, if we have maps separated into multiple arrays and years and would like to keep them separated, then one will need to generate the same year/array simulation maps on the fly multiple times (e.g. for doing map_yr_i_array_j x map_yr_k_array_l, for i == k, i != k, etc.), and hence it should be very fast to generate these half-arcmin resolution maps for LAT. (Planning ahead to generate all of them on the fly at once doesn't seem feasible.) So either super fast sims or we work with combined maps for sims, which are OK only in the cosmic variance limited regime.
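To make the bookkeeping concrete, the sketch below enumerates the year/array cross-spectra and counts how many of them involve each simulated map. The 2 years and 3 arrays are made-up numbers chosen only for illustration:

```python
from collections import Counter
from itertools import combinations_with_replacement, product


def cross_spectrum_pairs(n_years, n_arrays):
    """All unordered (map, map) pairs, including auto-spectra.

    Each map is labelled by a (year, array) tuple.
    """
    maps = list(product(range(n_years), range(n_arrays)))
    return list(combinations_with_replacement(maps, 2))


pairs = cross_spectrum_pairs(n_years=2, n_arrays=3)
# Count the spectra each map takes part in (an auto-spectrum counts once).
uses = Counter(m for pair in pairs for m in set(pair))
print(len(pairs))          # number of spectra to compute
print(max(uses.values()))  # number of spectra that need any given map
```

With N maps there are N(N+1)/2 spectra and every map enters N of them, so unless maps are cached or shared, each one is regenerated about N times per realization.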

aiolasimone commented 4 years ago

I think that the problem could be easily solved with MPI communication. You really only need to generate the simulation once and then broadcast it to the nodes that use that particular single-array, single-year noise map. I understand that integrating this requires help, as not everyone is familiar with HPC.

stevekchoi commented 4 years ago

Yes, that sounds awesome. I'm far from it but perhaps Thibaut is working on something close to it.

zonca commented 4 years ago

> I think that the problem could be easily solved with MPI communication.

"easily with MPI", nice oxymoron!

> You really need to generate the simulation once and broadcasted it to the nodes that use that particular single-array, single-year noise map.

Communicating high-resolution maps over MPI is very time-consuming; it might be quicker to just create them on the fly multiple times, but it is hard to say without benchmarking. I think we should get some more figures on the number of maps, the disk space, and the time to produce them. I will update the noise model to the new version and then run some benchmarks; once I have those numbers, I'll ask people how many simulations we want to run (and what we can run on the fly).
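For a rough sense of the data volume involved, here is a back-of-the-envelope estimate of the size of one map. It assumes HEALPix maps with three components (T, Q, U) stored as 32-bit floats; these storage choices are assumptions, not figures from this thread:

```python
def healpix_map_bytes(nside, n_components=3, bytes_per_value=4):
    """Size in bytes of one HEALPix map set.

    A HEALPix map has 12 * nside**2 pixels; the defaults assume
    T, Q, U components stored as float32 (an assumption).
    """
    npix = 12 * nside**2
    return npix * n_components * bytes_per_value


for nside in (2048, 4096, 8192):
    print(nside, round(healpix_map_bytes(nside) / 2**30, 2), "GiB")
```

The size grows by a factor of four with each doubling of NSIDE, which applies equally to disk storage and to anything sent over MPI.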

zonca commented 4 years ago

@msyriac are the 4 splits independent of the full map, or should we somehow combine the 4 splits to get the full map?

msyriac commented 4 years ago

If we have k splits s_i, each with its own inverse-variance map h_i, then the full map is: coadd = \sum_i s_i h_i / \sum_i h_i

In our case, the hit maps h_i are just scaled versions h_i = h/k of the original hit map h.

I don't think it is possible (or straightforward) to generate split simulations that are constrained such that they satisfy the above given some coadd map regardless of the number of splits requested. (Of course, that would be possible if we were generating simulations from TODs). What that means is that the "full map" (above) will be a different realization for every value of k used. I don't think this is a problem since k (baseline k=4) is something we don't have to change often. So once we decide e.g. k=4, the full map can be derived using the above given simulations of the splits.
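The coadd expression above can be sketched in a few lines of NumPy. The function name and the toy inputs are hypothetical; with h_i = h/k the weights are identical for all splits, so the coadd reduces to the plain mean of the splits:

```python
import numpy as np


def coadd_splits(splits, weights):
    """Inverse-variance weighted coadd: sum_i(s_i * h_i) / sum_i(h_i).

    splits and weights both have shape (k, npix). Pixels whose total
    weight is zero are left at 0 in the output.
    """
    splits = np.asarray(splits, dtype=float)
    weights = np.asarray(weights, dtype=float)
    total = weights.sum(axis=0)
    out = np.zeros_like(total)
    np.divide((splits * weights).sum(axis=0), total, out=out, where=total > 0)
    return out


k, npix = 4, 8
rng = np.random.default_rng(0)
h = rng.uniform(1.0, 10.0, size=npix)        # toy full hit map
splits = rng.normal(size=(k, npix))          # toy split noise maps
weights = np.broadcast_to(h / k, (k, npix))  # h_i = h / k for every split
coadd = coadd_splits(splits, weights)
print(np.allclose(coadd, splits.mean(axis=0)))  # equal weights give the mean
```

Because the coadd is built from the split realizations, changing k changes the full map, which matches the point made above.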

zonca commented 4 years ago

About the disk space required to store the simulations with variable NSIDE (#18), simulating the different tubes separately as the TOD simulations do, I get:

zonca commented 4 years ago

For now I only have some benchmarks of map2alm and alm2map; in the future I want to benchmark the full noise simulation pipeline: https://gist.github.com/zonca/22e83694306f225c80eb9c9a104a8167

zonca commented 4 years ago

I also got simulations of the noise pipeline; see the last 2 columns at https://gist.github.com/zonca/22e83694306f225c80eb9c9a104a8167. I still need to add cross-correlation between the channels, but I think this is a reasonable estimate.

dpole commented 4 years ago

Thanks for starting this thread. I agree that it is unfeasible to store MCs of simulations with many splits. For FG studies, I don't think we'll need many in the short term. My preference would be to have split maps for a handful of seeds on disk. I think that 4 for both splits and seeds is a good starting point. I'd discuss larger numbers if we realize we really need more simulations/splits, but in that case I agree that interfacing directly with the API is a better solution.

zonca commented 4 years ago

From today's TOD2Maps call: