Open daniel-ferguson opened 6 years ago
Sorry I didn't see this earlier!
I think you are quite right, both in that this is a bug and in terms of what causes it.
FWIW, the synchronized mode is primarily for unit tests. I don't think you would ever want to use it for real model fitting: the performance penalty is simply too large.
Hi, I ran into an issue around multithreading support when running goodbooks-recommender (https://maciejkula.github.io/2018/07/27/recommending-books-with-rust/).
When instantiating hyperparameters with `num_threads` greater than the number of cores on my CPU, the task never completes, and goodbooks-recommender's CPU usage drops to 0 almost immediately.
After some `println!()`-driven debugging I narrowed down the issue to this section of sbr: https://github.com/maciejkula/sbr-rs/blob/master/src/models/sequence_model.rs#L101-L168
I think the issue is caused by a combination of the following factors:
- if `num_threads` is set higher than the number of logical cores, the threads will not all be synchronized, as rayon will only run n (number of cores) pieces of work at a time
Inserting a `println!()` before and after this line in wyrm (https://github.com/maciejkula/wyrm/blob/12715ae99ca531db6557dca786e4a480ec608101/src/optim/mod.rs#L81) illustrates this issue.
With thread count set to 4 all is fine and I see repeated "pre barrier sync", "post barrier sync" messages. If I set thread count to 5 I see 4 "pre barrier sync" messages followed by nothing, and the program hangs.
I'm not sure how to solve this, but hopefully this report is helpful nonetheless.
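
For anyone who wants to reproduce the pattern without sbr or wyrm, here is a minimal sketch using only the standard library. It is not the actual sbr/wyrm code: the fixed pool of worker threads is an assumed stand-in for rayon's pool, and `all_tasks_pass_barrier` is a hypothetical helper name. Each job waits on a `std::sync::Barrier` sized for the number of jobs, so if there are fewer workers than jobs the barrier can never fill up.

```rust
use std::sync::{mpsc, Arc, Barrier, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical helper: runs `num_tasks` jobs on a fixed pool of
/// `num_workers` threads (a stand-in for rayon's pool, not rayon itself).
/// Every job waits on a barrier that expects `num_tasks` participants.
/// Returns true if all jobs got past the barrier before `timeout` elapsed.
fn all_tasks_pass_barrier(num_tasks: usize, num_workers: usize, timeout: Duration) -> bool {
    let barrier = Arc::new(Barrier::new(num_tasks));
    let queue = Arc::new(Mutex::new((0..num_tasks).collect::<Vec<usize>>()));
    let (done_tx, done_rx) = mpsc::channel();

    for _ in 0..num_workers {
        let barrier = Arc::clone(&barrier);
        let queue = Arc::clone(&queue);
        let done_tx = done_tx.clone();
        thread::spawn(move || loop {
            // Take the next job; the queue lock is released at the end of
            // this statement, before the worker blocks on the barrier.
            let job = queue.lock().unwrap().pop();
            match job {
                Some(_) => {
                    // "pre barrier sync"
                    barrier.wait();
                    // "post barrier sync"; the receiver may already be gone
                    // if the caller timed out, so ignore send errors.
                    let _ = done_tx.send(());
                }
                None => break,
            }
        });
    }
    drop(done_tx);

    // Wait for one completion message per job, giving up on timeout.
    for _ in 0..num_tasks {
        if done_rx.recv_timeout(timeout).is_err() {
            return false;
        }
    }
    true
}
```

Calling `all_tasks_pass_barrier(4, 4, ...)` should return `true`, while `all_tasks_pass_barrier(5, 4, ...)` should return `false`: four workers sit in `barrier.wait()` and the fifth job is never picked up, which matches the 4 "pre barrier sync" messages followed by a hang. Note that in the failing case the blocked worker threads are simply leaked, which is acceptable for a demonstration but not for real code.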