radiocosmology / draco

A pipeline for the analysis and simulation of drift scan radio data
MIT License
9 stars 7 forks source link

Generate a Gaussian noise dataset #52

Closed jrs65 closed 4 years ago

jrs65 commented 5 years ago

This task should:

@tristpinsm feel free to add items to this one.

anjakefala commented 5 years ago

Edited based on Tristan's comment My understanding:

Is the above correct?

tristpinsm commented 5 years ago

that's right, except that vis_weight is actually the inverse variance

anjakefala commented 5 years ago

@tristpinsm Thanks!

anjakefala commented 5 years ago

Notes from meeting:

For a random generator, the following key things are needed:

anjakefala commented 5 years ago

Do we want to update the other tasks to use RandomGen instead of NumPy.random?

tristpinsm commented 5 years ago

Would the reason to do so be performance? Do we think using numpy is currently a limitation?

anjakefala commented 5 years ago

I will look into the difference between them! I just noticed that @jrs65 recommended using RandomGen and elsewhere in synthesis/noise.py uses NumPy.random.

tristpinsm commented 5 years ago

yeah I had never heard of RandomGen until now and I'm still not sure what its advantages are. From what I read in their docs it is supposed to be quite a bit faster in certain situations. They also note that compatibility (presumably between versions) is not guaranteed.

tristpinsm commented 5 years ago

I think this has been integrated in numpy already: https://docs.scipy.org/doc/numpy/reference/random/generator.html#numpy.random.Generator. As long as we are running a recent version this functionality should already be available (?)

jrs65 commented 5 years ago

I don't think there's any reason to go back and change things, but as RandomGen is already a requirement we should use it when we can.

As @tristpinsm says the advantage is performance, and there are certain tasks (of which this will be one) where the speed of the RNG is the bottleneck. I introduced it for the delay power spectrum estimator (which in some sense internally does what your doing here hundreds of times), and it took it down from 40 mins per power spectrum to more like 10 mins.

jrs65 commented 5 years ago

I agree that it looks like it has been merged into numpy in 1.17 (or at least the core). I'd like to check that it still has the OpenMP parallel mode though before dropping the dependency.

anjakefala commented 5 years ago

Understood!

The reason behind changing the original ones was to simplify the seed state setting (to avoid having one for setting the NumPy seed and one for setting the RandomGen seed). But if it is integrated into NumPy (I will check if it still has the OpenMP parallel mode), perhaps they share a state.

anjakefala commented 5 years ago

So, seeding seems to work in different ways for the "legacy random" and the "new generators".

RandomState provides access to legacy random https://numpy.org/devdocs/reference/random/legacy.html. get_state/set_state/seed specifically work with the legacy randoms https://numpy.org/devdocs/reference/random/legacy.html?highlight=seed.

The new RandomGenerator works by initialising a generator with a seed https://numpy.org/devdocs/reference/random/generator.html#numpy.random.Generator. SeedSequence https://numpy.org/devdocs/reference/random/bit_generators/generated/numpy.random.SeedSequence.html#numpy.random.SeedSequence is the main class that determines the sequence of seeds.

So if we bump to NumPy 1.17, it will be a bit of a refactor, and the two random generators do not intersect with their seed states.

anjakefala commented 5 years ago

I cannot confirm whether it still has OpenMP parallel mode, but I do not see anything in here that would indicate support being removed: https://github.com/numpy/numpy/pull/13163.

tristpinsm commented 5 years ago

It seems like the changes are pretty significant between 1.16 and 1.17. Should we create an issue to migrate whatever uses the old RandomState methods to Generators?

anjakefala commented 5 years ago

@tristpinsm Agreed, they are big and out of scope for this issue/pr, and I think it would be good to do the migration.