sahilm89 / lhsmdu

This is an implementation of Deutsch and Deutsch, "Latin hypercube sampling with multidimensional uniformity", Journal of Statistical Planning and Inference 142 (2012) , 763-772
MIT License
79 stars 16 forks source link

Resampling of realization matrix is incorrect #13

Open wim-cpp opened 1 year ago

wim-cpp commented 1 year ago

https://github.com/sahilm89/lhsmdu/blob/925efa88e1dd87d02b6dbe0d27c64b53e8dd98e5/lhsmdu/__init__.py#L81

Resampling should be done by determining (per dimension) the index of a realization after sorting all realizations in that dimension. However, this is not what argsort does. For example [0.62, 0.19, 0.92, 0.22, 0.07] should result in [3, 1, 4, 2, 0], but argsort gives [4, 1, 3, 0, 2] instead. Apply argsort again on sortedIndicesOfStrata to get the correct vector of indices. In this example argsort([4, 1, 3, 0, 2]]) gives the correct result [3, 1, 4, 2, 0].

The effect of this incorrect resampling is that the resulting Latin hypercube samples are far from uniformly distributed, as intended by LHSMDU.

sahilm89 commented 1 year ago

I'm not sure if I follow the issue, could you elaborate? argsort gives us the sorted indices of the realizations for each dimension. The benchmarks don't show samples being far from uniformly distributed. Could you show me an MWE of where the samples are not uniform?

wim-c commented 1 year ago

The paper by Deutsch & Deutsch is not very explicit and just states “realizations are ranked”. See my example above of how a list of numbers should be ranked, or see this wikipedia entry about ranking. Just generate (e.g. two-dimensional) examples using both argsort and proper ranking and you’ll see te difference.