mmaelicke / scikit-gstat

Geostatistical variogram estimation expansion in the scipy style
https://mmaelicke.github.io/scikit-gstat/
MIT License

Use standard NumPy random number generators in metric spaces to limit RAM usage #179

Open rhugonnet opened 6 months ago

rhugonnet commented 6 months ago

Right now, NumPy's standard random number generators (`rng = np.random.default_rng(seed=...)`) do not work when passed to `ProbabilisticMetricSpace` or `RasterMetricSpace`. Only the legacy ones do (the `np.random.RandomState` interface that `np.random.seed()` seeds), but they are probably not that useful in our case: we don't need to exactly reproduce the random sampling of old scripts. Moreover, the legacy versions allocate a lot of memory when drawing a random choice without replacement, which is exactly what we use: https://github.com/numpy/numpy/issues/14169.
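A minimal sketch of how a metric space could accept both kinds of generator, assuming it only needs to draw point indices without replacement. `normalize_rng` and `sample_indices` are hypothetical helpers, not the current scikit-gstat API. A legacy `RandomState` is converted by drawing a seed from it, which keeps seeded legacy runs reproducible but does not reproduce their old random streams (which, as said above, we don't need).

```python
import numpy as np


def normalize_rng(rng=None):
    """Hypothetical helper: turn None, an int seed, a legacy RandomState
    or a Generator into a numpy.random.Generator."""
    if isinstance(rng, np.random.Generator):
        # Already a modern generator: reuse it so the caller's stream is shared.
        return rng
    if isinstance(rng, np.random.RandomState):
        # Legacy generator: draw a seed from it, so a seeded RandomState still
        # gives reproducible results, then build a modern Generator.
        return np.random.default_rng(rng.randint(0, 2**31 - 1))
    # None, an int seed or a SeedSequence: default_rng handles these directly.
    return np.random.default_rng(rng)


def sample_indices(n_points, samples, rng=None):
    """Hypothetical sampling step: pick `samples` distinct point indices."""
    gen = normalize_rng(rng)
    return gen.choice(n_points, size=samples, replace=False)


# Usage: an int seed and an equivalently seeded Generator give the same indices.
idx_seed = sample_indices(1_000, 50, rng=42)
idx_gen = sample_indices(1_000, 50, rng=np.random.default_rng(42))
print(np.array_equal(idx_seed, idx_gen))  # True
```

Normalizing everything to a `Generator` up front means the rest of the metric space code only has to deal with one RNG type.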

For instance, if we only want to use 10,000 samples out of 1 billion points for the variogram estimation, the legacy version still creates and permutes an index array covering all 1 billion points in the background, using tons of RAM :sweat_smile:.
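Back-of-the-envelope numbers for the example above, assuming one 8-byte integer index per point for the legacy permutation (NumPy's default integer on most 64-bit platforms). The second half is a sketch showing that `Generator.choice` can draw the 10,000 indices directly: in current NumPy releases it switches to Floyd's algorithm when the sample is small relative to the population, so its memory scales with the sample size rather than with the 1 billion points.

```python
import numpy as np

n_points = 1_000_000_000   # population size from the example above
n_samples = 10_000         # points actually used for the variogram

# Legacy RandomState.choice(..., replace=False) permutes the whole index range
# before slicing; assuming 8-byte integer indices, that array alone needs:
legacy_bytes = n_points * np.dtype(np.int64).itemsize
print(f"legacy index array: {legacy_bytes / 1e9:.1f} GB")  # legacy index array: 8.0 GB

# Generator.choice draws the sample without materializing all n_points indices
# when n_samples is small compared to n_points.
rng = np.random.default_rng(42)
idx = rng.choice(n_points, size=n_samples, replace=False)
print(f"sampled indices: {idx.nbytes / 1e3:.0f} kB")
```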

Will try to fix this at the same time as #178!