kyonifer / koma

A scientific computing library for Kotlin. https://kyonifer.github.io/koma

RNG should have thread-local seeding #87

Open kyonifer opened 5 years ago

kyonifer commented 5 years ago

Right now, all threads share the same global seed. This makes it difficult to reproduce results in multi-threaded code: if two threads grab random values at the same time, which part of the RNG stream each one receives depends on thread scheduling. A contrived example:

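    import koma.*                      // setSeed, randn
    import kotlin.concurrent.thread

    fun main() {
        var out1 = 0.0
        var out2 = 0.0
        val N = 100
        setSeed(1234)
        // Both threads draw from the same global RNG stream.
        val t1 = thread {
            for (i in 0..N) {
                out1 += randn(2, 2).elementSum()
            }
        }
        val t2 = thread {
            for (i in 0..N) {
                out2 += randn(2, 2).elementSum()
            }
        }
        t1.join()
        t2.join()
        println(out1 > out2)
    }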

Because the two threads race for values from the RNG, they get different values from run to run, so the output is sometimes true and sometimes false. Ideally, the program above would need no changes and would always print the same result.

Probably the easiest fix is to have the global setSeed call generate thread-local seeds, which thread-local RNGs then use. On the JVM this could be done lock-free, using an AtomicInteger for change detection. Since changing the seed is normally a rare event, the goal should be to accept slowness when the seed changes and avoid synchronization during generation whenever possible. This probably isn't necessary on JS (which is single-threaded), and on native it requires #86 (and probably workers).
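A minimal JVM sketch of that scheme, assuming `java.util.Random` as the per-thread generator. `ThreadLocalRng`, `setSeed`, and `nextGaussian` here are hypothetical names, not koma's actual API. The fast path only does an atomic read of the generation counter; a lock is taken only when a thread notices that the seed changed. Note that which thread receives which seed from the master generator still depends on the order in which threads first draw.

```kotlin
import java.util.Random
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical sketch, not koma's implementation.
object ThreadLocalRng {
    // Bumped on every setSeed call; each thread compares it with the value it last saw.
    private val generation = AtomicInteger(0)

    // Master generator that hands out per-thread seeds; only touched under the lock.
    private var master = Random(0L)
    private val lock = Any()

    private class Local(var generation: Int, var rng: Random)

    // Starts at generation -1 so a thread's first draw always takes the slow path.
    private val local = ThreadLocal.withInitial { Local(-1, Random(0L)) }

    fun setSeed(seed: Long) {
        synchronized(lock) {
            master = Random(seed)
            generation.incrementAndGet()
        }
    }

    private fun current(): Random {
        val state = local.get()
        if (state.generation != generation.get()) {
            // Slow path: the global seed changed (or this thread is new), so draw a fresh
            // per-thread seed from the master generator.
            synchronized(lock) {
                state.rng = Random(master.nextLong())
                state.generation = generation.get()
            }
        }
        // Fast path: no synchronization, just this thread's own generator.
        return state.rng
    }

    fun nextGaussian(): Double = current().nextGaussian()
}
```

Reseeding with the same value reproduces a single thread's sequence, because the master generator restarts and the generation bump forces the thread to pick up a new per-thread seed.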

peastman commented 5 years ago

Alternatively, it might be cleaner to use the same seed for every thread but give each one a different stream ID.
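One generator family with built-in stream IDs is PCG. The seed sets the starting state and a separate stream selector sets the LCG increment, so all streams share one seed but produce distinct sequences. Below is a small PCG32 (XSH RR) sketch. The `Pcg32` class is illustrative and not part of koma, and deciding which thread gets which stream ID is a separate problem.

```kotlin
// Illustrative PCG32 (XSH RR 64/32) with a selectable stream; not koma's API.
class Pcg32(seed: Long, streamId: Long) {
    private var state = 0L

    // The increment must be odd; different stream IDs give different increments.
    private val inc = (streamId shl 1) or 1L

    init {
        // Same seeding procedure as the PCG reference implementation.
        step()
        state += seed
        step()
    }

    // 64-bit LCG step; Long multiplication wraps modulo 2^64 like unsigned arithmetic.
    private fun step() {
        state = state * 6364136223846793005L + inc
    }

    fun nextInt(): Int {
        val old = state
        step()
        // Output permutation: xorshift the high bits, then apply a data-dependent rotation.
        val xorshifted = (((old ushr 18) xor old) ushr 27).toInt()
        val rot = (old ushr 59).toInt()
        return Integer.rotateRight(xorshifted, rot)
    }

    // Uniform double in [0, 1) built from one 32-bit output.
    fun nextDouble(): Double = (nextInt().toLong() and 0xffffffffL) / 4294967296.0
}
```

With this design, `setSeed` would store one seed and each thread would construct `Pcg32(seed, itsStreamId)`.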

Even with thread local generators, your example code probably wouldn't be fully deterministic. Depending on which order the two threads started running in, their seeds/IDs might get swapped from one run to the next.

kyonifer commented 5 years ago

I was thinking the thread-local seed would be derived from the thread name plus the setSeed argument. That would make it fully deterministic (at least on the JVM) as long as the user sets a name for each thread they use.
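A sketch of the name-based derivation, assuming a SplitMix64-style mixing step to turn (global seed, thread name) into a per-thread seed. `threadSeed`, `rngForCurrentThread`, and `sumInNamedThread` are hypothetical helpers, not koma functions. `String.hashCode()` is specified by the Java API, so a given name always maps to the same seed.

```kotlin
import java.util.Random
import kotlin.concurrent.thread

// SplitMix64 finalizer: scrambles the combined value so that similar inputs
// (e.g. "worker-1" vs "worker-2") yield unrelated seeds.
fun mix64(input: Long): Long {
    var z = input
    z = (z xor (z ushr 30)) * -0x40a7b892e31b1a47L // 0xbf58476d1ce4e5b9
    z = (z xor (z ushr 27)) * -0x6b2fb644ecceee15L // 0x94d049bb133111eb
    return z xor (z ushr 31)
}

// Hypothetical: per-thread seed from the global seed plus the thread name.
fun threadSeed(globalSeed: Long, threadName: String): Long =
    mix64(globalSeed * 31 + threadName.hashCode())

fun rngForCurrentThread(globalSeed: Long): Random =
    Random(threadSeed(globalSeed, Thread.currentThread().name))

// Mirrors the issue's example: sums n Gaussian draws inside a thread with a fixed name.
fun sumInNamedThread(name: String, globalSeed: Long, n: Int): Double {
    var out = 0.0
    val t = thread(name = name) {
        val rng = rngForCurrentThread(globalSeed)
        repeat(n) { out += rng.nextGaussian() }
    }
    t.join()
    return out
}
```

Running `sumInNamedThread("worker-1", 1234L, 100)` twice gives the same sum regardless of what other threads are doing. The caveat is naming: two threads with the same name share a seed, and default JVM names such as `Thread-0` are assigned in creation order.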