mratsim / weave

A state-of-the-art multithreading runtime: message-passing based, fast, scalable, ultra-low overhead
Other
532 stars 22 forks source link

Use Lnear Congruential Generator to generate work-stealing permutation #193

Open mratsim opened 1 year ago

mratsim commented 1 year ago

A tricky issue in message-passing based work-requesting/work-stealing is how to efficiently select victims, especially with a large number of cores.

Instead we can use a Linear Congruential Generator: https://en.wikipedia.org/wiki/Linear_congruential_generator

The following recurrence generates all random numbers mod m without repetition:

Xₙ₊₁ = aXₙ+c (mod m)

if and only if (Hull-Dobell theorem):

  1. c and m are coprime
  2. a-1 is divisible by all prime factors of m
  3. a-1 is divisible by 4 if m is divisible by 4

The only state to send in steal request would be a and c. Xₙ is the victim ID and m is either the number of threads or for ease of implementation the next power of 2.

Nim code: https://github.com/mratsim/constantine/blob/1dfbb8bd4f2fefc4059f4aefb040f0b3ba2918cb/constantine/platforms/threadpool/threadpool.nim#L84-L128

iterator pseudoRandomPermutation(randomSeed: uint32, maxExclusive: int32): int32 =
  ## Create (low-quality) pseudo-random permutations for [0, max)
  # Design considerations and randomness constraint for work-stealing, see docs/random_permutations.md
  #
  # Linear Congruential Generator: https://en.wikipedia.org/wiki/Linear_congruential_generator
  #
  # Xₙ₊₁ = aXₙ+c (mod m) generates all random number mod m without repetition
  # if and only if (Hull-Dobell theorem):
  # 1. c and m are coprime
  # 2. a-1 is divisible by all prime factors of m
  # 3. a-1 is divisible by 4 if m is divisible by 4
  #
  # Alternative 1. By choosing a=1, all conditions are easy to reach.
  #
  # The randomness quality is not important besides distributing potential contention,
  # i.e. randomly trying thread i, then i+1, then i+n-1 (mod n) is good enough.
  #
  # Assuming 6 threads, co-primes are [1, 5], which means the following permutations
  # assuming we start with victim 0:
  # - [0, 1, 2, 3, 4, 5]
  # - [0, 5, 4, 3, 2, 1]
  # While we don't care much about randoness quality, it's a bit disappointing.
  #
  # Alternative 2. We can choose m to be the next power of 2, meaning all odd integers are co-primes,
  # consequently:
  # - we don't need a GCD to find the coprimes
  # - we don't need to cache coprimes, removing a cache-miss potential
  # - a != 1, so we now have a multiplicative factor, which makes output more "random looking".

  # n and (m-1) <=> n mod m, if m is a power of 2
  let maxExclusive = cast[uint32](maxExclusive)
  let M = maxExclusive.nextPowerOfTwo_vartime()
  let c = (randomSeed and ((M shr 1) - 1)) * 2 + 1 # c odd and c ∈ [0, M)
  let a = (randomSeed and ((M shr 2) - 1)) * 4 + 1 # a-1 divisible by 2 (all prime factors of m) and by 4 if m divisible by 4

  let mask = M-1                                   # for mod M
  let start = randomSeed and mask

  var x = start
  while true:
    if x < maxExclusive:
      yield cast[int32](x)
    x = (a*x + c) and mask                         # ax + c (mod M), with M power of 2
    if x == start:
      break
dumblob commented 1 year ago

You are genius! I use congruential generators for many different use cases but this use case I never thought of. So simple yet so efficient. Congratulations!