stan-dev / stan

Stan development repository. The master branch contains the current release. The develop branch contains the latest stable development. See the Developer Process Wiki for details.
https://mc-stan.org
BSD 3-Clause "New" or "Revised" License

Pathfinder run not reproducible from seed #3237

Closed: WardBrian closed this issue 8 months ago.

WardBrian commented 9 months ago

Summary:

Multi-path Pathfinder produces different output for the same seed when run with threading enabled.

Reproducible Steps:

Using cmdstan:

./bernoulli pathfinder data file=bernoulli.data.json random seed=124232 num_threads=6 output file=out1.csv
./bernoulli pathfinder data file=bernoulli.data.json random seed=124232 num_threads=6 output file=out2.csv
diff out1.csv out2.csv

Current Output:

The diff output shows that the files differ in the draws themselves, not just in the comment metadata.
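To separate real differences in the draws from differences in metadata (such as timing), one option is to compare the two CSV files while skipping comment lines, which Stan CSV files prefix with `#`. The following is a minimal C++ sketch under that assumption; `same_draws` is a hypothetical helper, not part of CmdStan.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Returns true if two Stan CSV streams contain the same non-comment lines
// (the column header and the draws). Lines starting with '#' hold metadata
// such as configuration and elapsed time, so they are skipped.
bool same_draws(std::istream& a, std::istream& b) {
  // Reads the next line that is not a '#' comment; returns false at end of input.
  auto next_data_line = [](std::istream& in, std::string& line) {
    while (std::getline(in, line)) {
      if (line.empty() || line[0] != '#') return true;
    }
    return false;
  };
  std::string la, lb;
  while (true) {
    bool ha = next_data_line(a, la);
    bool hb = next_data_line(b, lb);
    if (ha != hb) return false;  // one file has more data lines than the other
    if (!ha) return true;        // both exhausted without a mismatch
    if (la != lb) return false;  // a header or draw line differs
  }
}

// Convenience overload for CSV text already held in memory.
bool same_draws(const std::string& a, const std::string& b) {
  std::istringstream sa(a), sb(b);
  return same_draws(sa, sb);
}
```

To compare `out1.csv` and `out2.csv` directly, open each with `std::ifstream` and pass the streams to the first overload; with the bug described here, it would return `false`.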

Expected Output:

I would expect two runs with the same seed to produce identical output.

Additional Information:

I believe this is caused by these lines:

https://github.com/stan-dev/stan/blob/b6a309e2345d1eb86106e49774289d52e005a4c0/src/stan/services/pathfinder/multi.hpp#L128-L131

Because these lines call emplace_back on a concurrent_vector, the order of the elements in that vector depends on thread scheduling and can differ from run to run.
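The ordering problem can be illustrated without TBB. This minimal C++ sketch uses a mutex-guarded `std::vector` as a stand-in for the `concurrent_vector` in `multi.hpp`; `append_results` and the `iter * 10` placeholder values are hypothetical and do not reproduce Stan's code. Every path's result ends up in the vector, but each one's position depends on which thread acquires the lock first.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Each worker computes a result for path `iter` and appends it to a shared
// vector, mimicking emplace_back on a concurrent container.
std::vector<int> append_results(int num_paths) {
  assert(num_paths >= 0);
  std::vector<int> results;
  std::mutex m;
  std::vector<std::thread> workers;
  for (int iter = 0; iter < num_paths; ++iter) {
    workers.emplace_back([&, iter] {
      int result = iter * 10;  // placeholder for one path's output
      std::lock_guard<std::mutex> lock(m);
      // The slot this result lands in depends on thread scheduling.
      results.emplace_back(result);
    });
  }
  for (auto& t : workers) t.join();
  return results;
}
```

Calling `append_results(6)` repeatedly can return the same six values in different orders, which mirrors the non-reproducible draws reported above.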

Changing the code to assign each result to the element at index [iter], rather than appending it, seems to resolve the issue.
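A minimal sketch of the index-based approach, again using hypothetical standard-C++ stand-ins (`std::thread` workers, `assign_results`, and placeholder values) rather than Stan's actual code: the vector is sized before the workers start, and each worker writes only to `results[iter]`.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Sizes the result vector up front; each worker then writes only to its own
// slot, so the final order is determined by `iter`, not by scheduling.
std::vector<int> assign_results(int num_paths) {
  assert(num_paths >= 0);
  std::vector<int> results(num_paths);
  std::vector<std::thread> workers;
  for (int iter = 0; iter < num_paths; ++iter) {
    workers.emplace_back([&results, iter] {
      // Distinct elements are written by distinct threads, so no lock is needed.
      results[iter] = iter * 10;  // placeholder for one path's output
    });
  }
  for (auto& t : workers) t.join();
  return results;
}
```

Because the vector is never resized while the workers run and each thread owns one element, repeated calls return the same sequence every time.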

Current Version:

v2.33.0