scikit-hep / pyhf

pure-Python HistFactory implementation with tensors and autodiff
https://pyhf.readthedocs.io/
Apache License 2.0

Update all uses of np.random.seed to use np.random.default_rng #1795

Open matthewfeickert opened 2 years ago

matthewfeickert commented 2 years ago

While generating random data so that I could make these examples public, I learned that

from numpy import random
random.seed(0)

is considered bad practice, and we should instead be doing things like

from numpy import random
from numpy.random import PCG64, SeedSequence
rng = random.default_rng(PCG64(SeedSequence(0)))  # Generator(PCG64) at 0x7F00A8E519E0
rng.random()  # 0.6369616873214543

This seems to be related to NEP 19, and there's a blog post with more on this (Stop using numpy.random.seed()).
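For readers comparing the two patterns, here is a minimal sketch using only NumPy. It assumes that `default_rng(0)` and the explicit `PCG64(SeedSequence(0))` form seed the same bit generator, which the assertion checks. The variable names are illustrative and not taken from pyhf.

```python
import numpy as np
from numpy.random import PCG64, SeedSequence

# Legacy pattern: seeds the hidden global RandomState shared by every caller.
np.random.seed(0)
legacy_draw = np.random.random()

# Generator pattern: the state lives in an object that is passed around explicitly.
rng_explicit = np.random.default_rng(PCG64(SeedSequence(0)))
rng_short = np.random.default_rng(0)  # shorthand for the line above

explicit_draws = rng_explicit.random(3)
short_draws = rng_short.random(3)

# Both generators were seeded identically, so their streams agree.
assert np.array_equal(explicit_draws, short_draws)

# Statistically independent streams (e.g. one per worker) can be spawned
# from a single SeedSequence instead of reseeding global state.
child_seeds = SeedSequence(0).spawn(2)
worker_draws = [np.random.default_rng(seed).random(3) for seed in child_seeds]
```

The practical difference is that nothing outside the code holding `rng_explicit` can change its state, whereas any library call to `np.random.seed` resets the global stream for everyone.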

Originally posted by @matthewfeickert in https://github.com/scikit-hep/mplhep/issues/362#issuecomment-1044853242

matthewfeickert commented 2 years ago

At 9fd99be886349a90e927672e950cc233fad0916c the following uses need to be replaced:

$ git grep "seed(0)"
docs/examples/notebooks/learn/TestStatistics.ipynb:    "np.random.seed(0)\n",
docs/examples/notebooks/learn/UsingCalculators.ipynb:    "np.random.seed(0)"
docs/examples/notebooks/toys.ipynb:    "np.random.seed(0)\n",
src/pyhf/infer/calculators.py:            >>> random.seed(0)
src/pyhf/infer/calculators.py:            >>> random.seed(0)
src/pyhf/infer/calculators.py:            >>> random.seed(0)
src/pyhf/infer/calculators.py:            >>> random.seed(0)
src/pyhf/infer/calculators.py:            >>> random.seed(0)
src/pyhf/infer/calculators.py:            >>> random.seed(0)
src/pyhf/infer/calculators.py:            >>> random.seed(0)
src/pyhf/infer/calculators.py:            >>> random.seed(0)
src/pyhf/infer/utils.py:        >>> random.seed(0)
src/pyhf/probability.py:        >>> random.seed(0)
src/pyhf/probability.py:            >>> random.seed(0)
src/pyhf/probability.py:        >>> random.seed(0)
tests/benchmarks/test_benchmark.py:    np.random.seed(0)  # Fix seed for reproducibility
tests/test_backend_consistency.py:    np.random.seed(0)  # Fix seed for reproducibility
tests/test_infer.py:    np.random.seed(0)
tests/test_infer.py:    np.random.seed(0)
tests/test_public_api.py:    np.random.seed(0)
tests/test_validation.py:    np.random.seed(0)

matthewfeickert commented 2 years ago

The

https://github.com/scikit-hep/pyhf/blob/47ae6ad565920cc34a853b8be6f268b04e1f900c/src/pyhf/infer/utils.py#L36

docstring example

https://github.com/scikit-hep/pyhf/blob/47ae6ad565920cc34a853b8be6f268b04e1f900c/src/pyhf/infer/utils.py#L46-L60

shows that to adopt this, though, we'd need to provide a way to pass an rng object into anything toy-based, as (expectedly)

# issue_1795.py
from numpy import random
from numpy.random import PCG64, SeedSequence

import pyhf

# Created but never used below: there is currently no way to hand it to the calculator
rng = random.default_rng(PCG64(SeedSequence(0)))
model = pyhf.simplemodels.uncorrelated_background(
    signal=[12.0, 11.0],
    bkg=[50.0, 52.0],
    bkg_uncertainty=[3.0, 7.0],
)
observations = [51, 48]
data = observations + model.config.auxdata
mu_test = 1.0
toy_calculator = pyhf.infer.utils.create_calculator(
    "toybased", data, model, ntoys=100, test_stat="qtilde", track_progress=False
)
qmu_sig, qmu_bkg = toy_calculator.distributions(mu_test)
print(qmu_sig.pvalue(mu_test), qmu_bkg.pvalue(mu_test))

gives different results on each run

$ python issue_1795.py
0.1 0.82
$ python issue_1795.py
0.13 0.77
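One way the "pass in an rng object" idea could look is sketched below with plain NumPy. `sample_toys` and its `rng` parameter are hypothetical and are not part of the pyhf API. The sketch only illustrates the common convention of accepting a seed or a `Generator` and normalizing it with `np.random.default_rng`.

```python
import numpy as np


def sample_toys(expected_rates, ntoys, rng=None):
    """Hypothetical helper (not pyhf API): draw Poisson toys for each bin.

    ``rng`` may be None, an integer seed, or an existing numpy Generator.
    """
    # default_rng returns an existing Generator unaltered and builds a new
    # one from None or a seed, so callers can pass either.
    rng = np.random.default_rng(rng)
    expected_rates = np.asarray(expected_rates, dtype=float)
    return rng.poisson(lam=expected_rates, size=(ntoys, expected_rates.size))


# The same seed gives the same toys on every call and on every run.
toys_a = sample_toys([50.0, 52.0], ntoys=100, rng=0)
toys_b = sample_toys([50.0, 52.0], ntoys=100, rng=0)
reproducible = np.array_equal(toys_a, toys_b)

# A shared Generator advances between calls, so successive calls differ
# while the overall sequence stays reproducible.
shared_rng = np.random.default_rng(0)
toys_c = sample_toys([50.0, 52.0], ntoys=100, rng=shared_rng)
toys_d = sample_toys([50.0, 52.0], ntoys=100, rng=shared_rng)
```

With an argument like this, a script such as `issue_1795.py` could get identical p-values on every run without touching NumPy's global state. How the generator would be threaded through the actual calculators is left open here.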