haifengl / smile

Statistical Machine Intelligence & Learning Engine
https://haifengl.github.io

Sporadic ArrayIndexOutOfBoundsExceptions when calling MathEx.setSeed() from multiple threads #660

Closed: adippold closed this issue 3 years ago

adippold commented 3 years ago

I have a data processing pipeline implemented in Java that uses a separate thread for each input file. It typically runs 30-40 threads in parallel. Each thread executes a sequence of data analysis and transformation steps, some of which use components of the Smile library (various types of clustering, SVM). To make different runs comparable, I reset the random seed before each processing step by calling MathEx.setSeed( 42 ). This approach worked fine until I upgraded Smile from 1.5.2 to 2.6.0; since then I sporadically get exceptions when calling setSeed:

java.lang.ArrayIndexOutOfBoundsException: Index 624 out of bounds for length 624
        at smile.math.random.MersenneTwister.setSeed(MersenneTwister.java:93)
        at smile.math.random.MersenneTwister.setSeed(MersenneTwister.java:87)
        at smile.math.Random.setSeed(Random.java:56)
        at smile.math.MathEx.setSeed(MathEx.java:499)

Looking at the sources, I see that the immediate problem is that setSeed in MersenneTwister is not thread-safe: if multiple threads call the method at the same time, the value of 'mti' is no longer well defined and can run past the end of the state array, which matches the exception above. I also noticed that the static setSeed() method in MathEx uses the static variable MathEx.seeds in a way that is not thread-safe, which could cause unexpected behaviour.
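To illustrate the kind of race I mean, here is a minimal sketch. It is a simplified, hypothetical stand-in and not Smile's actual source: the class names, the `hammer` helper, and the seeding formula (the standard Mersenne Twister initialization) are my own assumptions. `SharedIndexState` uses a field as the loop counter, so a second thread can push it to 624 between the bounds check and the array write. `GuardedState` uses a local counter and a synchronized method instead.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration only; not taken from the Smile code base. */
class SeedRaceSketch {
    static final int N = 624;

    /** Unsafe pattern: the loop counter is a field shared by every caller. */
    static class SharedIndexState {
        final int[] mt = new int[N];
        int mti;

        void setSeed(int seed) {
            mt[0] = seed;
            for (mti = 1; mti < N; mti++) {
                // Another thread may increment mti after the bounds check above,
                // so this write can land on index 624.
                mt[mti] = 1812433253 * (mt[mti - 1] ^ (mt[mti - 1] >>> 30)) + mti;
            }
        }
    }

    /** Safer pattern: local loop counter, and the whole update holds the lock. */
    static class GuardedState {
        private final int[] mt = new int[N];
        private int mti;

        synchronized void setSeed(int seed) {
            mt[0] = seed;
            for (int i = 1; i < N; i++) {
                mt[i] = 1812433253 * (mt[i - 1] ^ (mt[i - 1] >>> 30)) + i;
            }
            mti = N; // publish the position once, after the array is fully written
        }

        synchronized int[] snapshot() {
            return mt.clone();
        }
    }

    /** Calls setSeed(42) from many threads; returns how many calls threw. */
    static int hammer(GuardedState state, int threads, int callsPerThread) {
        AtomicInteger failures = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int c = 0; c < callsPerThread; c++) {
                    try {
                        state.setSeed(42);
                    } catch (RuntimeException e) {
                        failures.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return failures.get();
    }
}
```

With the guarded variant, concurrent callers always leave the array in the same state that a single-threaded call would produce. Synchronizing only removes the exception, though; threads that share one generator still interleave their draws, so it does not by itself make results reproducible.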

I consider these issues to be bugs, but it may be that Smile does not currently support separate random initialization contexts, or that I am not using the API as intended. Please advise.
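By "separate random initialization contexts" I mean something like the following sketch, which uses only the JDK and says nothing about how Smile works internally. The class `PerThreadRandomSketch` and its methods are hypothetical names. Each thread owns its own `java.util.Random`, so reseeding in one thread cannot disturb another.

```java
import java.util.Random;

/** Hypothetical illustration of per-thread random contexts; JDK only. */
class PerThreadRandomSketch {
    private static final long SEED = 42L;

    // One generator per thread, created lazily on first use in that thread.
    private static final ThreadLocal<Random> RNG =
            ThreadLocal.withInitial(() -> new Random(SEED));

    /** Resets only the calling thread's generator. */
    static void reseed() {
        RNG.get().setSeed(SEED);
    }

    static double nextDouble() {
        return RNG.get().nextDouble();
    }

    /** Runs one "processing step" on a fresh thread and returns the values it drew. */
    static double[] drawOnNewThread(int count) {
        double[] out = new double[count];
        Thread worker = new Thread(() -> {
            reseed();
            for (int i = 0; i < count; i++) {
                out[i] = nextDouble();
            }
        });
        worker.start();
        try {
            worker.join(); // join makes the worker's writes to out visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return out;
    }
}
```

In this arrangement every thread that reseeds and then draws sees the same sequence, regardless of what the other threads are doing, which is the behaviour I would need for reproducible runs.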

haifengl commented 3 years ago

I made some changes. Please try the master branch.

adippold commented 3 years ago

Thank you for the prompt fix. All seems to be working fine: I'm not getting ArrayIndexOutOfBoundsExceptions anymore and my results are reproducible across runs.

Your help is hugely appreciated.