jrs65 closed this issue 4 years ago.
Edited based on Tristan's comment.

My understanding: the `weight` property is the estimate of the variance of the noise in each data sample, and `vis_weight` is the inverse variance. Is the above correct?
That's right, except that `vis_weight` is actually the inverse variance.

@tristpinsm Thanks!
Notes from meeting:
For a random generator, the following key things are needed:
- you want it to be reproducible
- you want it to generate different arrays for each node and for each realisation
- [ ] for `mpi_random_seed`, the extra param takes care of generating a different array for each node. We may also want to have multiple reproducible realisations; the proposal is to either add a counter to the tasks that employ it, or have the `mpi_random_seed` context manager keep track of how often it was run (see the sketch below).
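A minimal sketch of the counter idea (the class layout and names like `base_seed` are hypothetical, not the actual `mpi_random_seed` implementation), assuming NumPy >= 1.17:

```python
import numpy as np


class mpi_random_seed:
    """Sketch only: per-node, per-realisation reproducible seeding.

    Combines a base seed with the MPI rank and an internal call counter,
    so each node gets a different stream and each use within a run gives
    a different (but reproducible) realisation.
    """

    _counter = 0  # incremented on every use, giving a new realisation

    def __init__(self, base_seed, rank=0):
        self.base_seed = base_seed
        self.rank = rank

    def __enter__(self):
        # Derive a child seed from (base_seed, rank, counter): different per
        # node and per realisation, but fully determined by the base seed.
        seq = np.random.SeedSequence(
            [self.base_seed, self.rank, mpi_random_seed._counter]
        )
        mpi_random_seed._counter += 1
        return np.random.Generator(np.random.PCG64(seq))

    def __exit__(self, exc_type, exc_value, traceback):
        return False


# Usage: each `with` block (and each rank) gets its own reproducible stream.
with mpi_random_seed(42, rank=0) as rng:
    noise = rng.standard_normal(1000)
```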
Do we want to update the other tasks to use `RandomGen` instead of `numpy.random`? Would the reason to do so be performance? Do we think using `numpy` is currently a limitation?
I will look into the difference between them! I just noticed that @jrs65 recommended using `RandomGen`, while elsewhere `synthesis/noise.py` uses `numpy.random`.
Yeah, I had never heard of `RandomGen` until now and I'm still not sure what its advantages are. From what I read in their docs, it is supposed to be quite a bit faster in certain situations. They also note that compatibility (presumably between versions) is not guaranteed.
I think this has been integrated into `numpy` already: https://docs.scipy.org/doc/numpy/reference/random/generator.html#numpy.random.Generator. As long as we are running a recent version, this functionality should already be available(?)
I don't think there's any reason to go back and change things, but as `RandomGen` is already a requirement we should use it when we can.

As @tristpinsm says, the advantage is performance, and there are certain tasks (of which this will be one) where the speed of the RNG is the bottleneck. I introduced it for the delay power spectrum estimator (which in some sense internally does what you're doing here hundreds of times), and it took it down from 40 minutes per power spectrum to more like 10 minutes.

I agree that it looks like it has been merged into `numpy` in 1.17 (or at least the core). I'd like to check that it still has the OpenMP parallel mode, though, before dropping the dependency.
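For reference, a rough way to see the performance difference between the legacy module-level functions and the new `Generator` interface (a sketch assuming NumPy >= 1.17; actual timings depend on the distribution and platform):

```python
import timeit

import numpy as np

n = 10_000_000
rng = np.random.Generator(np.random.PCG64(0))

# Legacy module-level API (RandomState / Mersenne Twister underneath).
t_legacy = timeit.timeit(lambda: np.random.standard_normal(n), number=5)

# New Generator API; it also supports single precision, which can help
# further when the downstream arrays are float32 anyway.
t_new = timeit.timeit(lambda: rng.standard_normal(n), number=5)
t_new32 = timeit.timeit(lambda: rng.standard_normal(n, dtype=np.float32), number=5)

print(f"legacy: {t_legacy:.2f}s  generator: {t_new:.2f}s  generator(float32): {t_new32:.2f}s")
```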
Understood!
The reason behind changing the original ones was to simplify the seed state setting (to avoid having one for setting the NumPy seed and one for setting the RandomGen seed). But if it is integrated into NumPy (I will check if it still has the OpenMP parallel mode), perhaps they share a state.
So, seeding seems to work in different ways for the "legacy random" and the "new generators":

- `RandomState` provides access to the legacy random API: https://numpy.org/devdocs/reference/random/legacy.html. `get_state`/`set_state`/`seed` specifically work with the legacy randoms: https://numpy.org/devdocs/reference/random/legacy.html?highlight=seed.
- The new `Generator` works by initialising a generator with a seed: https://numpy.org/devdocs/reference/random/generator.html#numpy.random.Generator. `SeedSequence` (https://numpy.org/devdocs/reference/random/bit_generators/generated/numpy.random.SeedSequence.html#numpy.random.SeedSequence) is the main class that determines the sequence of seeds.

So if we bump to NumPy 1.17 it will be a bit of a refactor, and the two random generators do not share seed states.
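To illustrate the separate states (a minimal check, assuming NumPy >= 1.17):

```python
import numpy as np

# Legacy path: seed/get_state/set_state act on the global RandomState.
np.random.seed(42)
legacy_draw = np.random.standard_normal(3)

# New path: a Generator is built from a SeedSequence / BitGenerator and
# carries its own state, untouched by np.random.seed above.
seq = np.random.SeedSequence(42)
rng = np.random.Generator(np.random.PCG64(seq))
new_draw = rng.standard_normal(3)

# The two streams are unrelated even though both were "seeded with 42".
print(legacy_draw)
print(new_draw)
```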
I cannot confirm whether it still has OpenMP parallel mode, but I do not see anything in here that would indicate support being removed: https://github.com/numpy/numpy/pull/13163.
It seems like the changes are pretty significant between 1.16 and 1.17. Should we create an issue to migrate whatever uses the old `RandomState` methods to `Generator`s?
@tristpinsm Agreed, they are big and out of scope for this issue/PR, and I think it would be good to do the migration.
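For reference, a rough sketch of what that migration would look like call-by-call (assuming NumPy >= 1.17; note that the two APIs produce different streams even for the same seed value):

```python
import numpy as np

# Legacy (RandomState-backed) module-level API:
np.random.seed(1234)
x_old = np.random.normal(size=10)
u_old = np.random.random_sample(10)
i_old = np.random.randint(0, 10, size=5)

# Generator-based equivalent:
rng = np.random.default_rng(1234)
x_new = rng.normal(size=10)
u_new = rng.random(10)              # random_sample -> random
i_new = rng.integers(0, 10, size=5)  # randint -> integers

# Even with the same seed value the streams differ, since the seeding
# schemes and bit generators are not compatible between the two APIs.
```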
This task should:

- [ ] use the noise variance estimate (the `weight` property).
- [ ] use `RandomGen` (which is already a dependency).

@tristpinsm feel free to add items to this one.