noislabs / nois

The Nois standard library
10 stars 4 forks source link

pick: Implement a more optimised algo for large sets to prevent copying the data #41

Open kaisbaccour opened 1 year ago

kaisbaccour commented 1 year ago

https://dev.to/babak/an-algorithm-for-picking-random-numbers-in-a-range-without-repetition-4cp6

kaisbaccour commented 1 year ago

https://github.com/noislabs/nois/tree/draft/41

Running against the test where n = data.len() and data = vec![0,1,2,3,4,5] we get duplicate picks The algo is not rigourously described and the way it is coded does not guarantee unicity of the picks


------------
pointer_index: 5
random_index: 0
swap_map: {}
chosen_index: 0
Picked Values: [0]
Insert - swap_map: {0: 5}
Remove - swap_map: {0: 5}
------------
pointer_index: 4
random_index: 1
swap_map: {0: 5}
chosen_index: 1
Picked Values: [0, 1]
Insert - swap_map: {1: 4, 0: 5}
Remove - swap_map: {1: 4, 0: 5}
------------
pointer_index: 3
random_index: 1
swap_map: {1: 4, 0: 5}
chosen_index: 4
Picked Values: [0, 1, 4]
Insert - swap_map: {1: 4, 0: 5}
Remove - swap_map: {1: 4, 0: 5}
------------
pointer_index: 2
random_index: 0
swap_map: {1: 4, 0: 5}
chosen_index: 5
Picked Values: [0, 1, 4, 5]
Insert - swap_map: {1: 4, 0: 5}
Remove - swap_map: {1: 4, 0: 5}
------------
pointer_index: 1
random_index: 1
swap_map: {1: 4, 0: 5}
chosen_index: 4
Picked Values: [0, 1, 4, 5, 4]
Insert - swap_map: {1: 4, 0: 5}
Remove - swap_map: {0: 5}
------------
pointer_index: 0
random_index: 0
swap_map: {0: 5}
chosen_index: 5
Picked Values: [0, 1, 4, 5, 4, 5]
Insert - swap_map: {0: 5}
Remove - swap_map: {}
webmaster128 commented 1 year ago

Here we have a solution:

The high level idea is to create a permutation of M values from 0 to M-1 which is lazily evaluated and then pick the first N elements of it.

Filecoin does the same here: https://github.com/filecoin-project/rust-fil-proofs/blob/a216a9754e8ae48f61f93b7a5611c84418d251f1/storage-proofs-core/src/crypto/feistel.rs