Open 0xalexbel opened 7 months ago
Hello, thanks for the detailed report. We are going to investigate and see what we can do about it.
Sadly, for now I don't have any workaround other than not using par_iter when encrypting Compact ciphertexts.
There is potentially a way to fix this with a Mutex instead of a RefCell. I'm not sure it will work well or that it won't be a source of deadlocks, so we will have to test it and make sure we understand what rayon does with tasks.
But a Mutex essentially defeats the purpose of the thread-local storage, so not great.
And it deadlocks, of course: it's the well-known rayon bug from https://github.com/rayon-rs/rayon/issues/592
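To illustrate why swapping the RefCell for a Mutex trades the panic for a deadlock, here is a minimal sketch using only the standard library. `Engine` and `second_lock_would_block` are hypothetical names for illustration, not tfhe-rs code. It uses `try_lock` so the program reports the problem instead of hanging.

```rust
use std::sync::{Mutex, TryLockError};

// Hypothetical stand-in for the engine state (not the tfhe-rs type).
struct Engine {
    counter: u64,
}

// Takes the lock, then checks whether a second lock attempt from the same
// thread would block while the first guard is still alive.
fn second_lock_would_block(engine: &Mutex<Engine>) -> bool {
    let mut guard = engine.lock().unwrap();
    guard.counter += 1;
    // A task executed on this same thread before `guard` is released would
    // reach this point. A plain `lock()` here would never return.
    let blocked = matches!(engine.try_lock(), Err(TryLockError::WouldBlock));
    blocked
}

fn main() {
    let engine = Mutex::new(Engine { counter: 0 });
    println!("second lock would block: {}", second_lock_would_block(&engine));
    println!("counter: {}", engine.lock().unwrap().counter);
}
```

With a RefCell the second access panics with BorrowMutError; with a Mutex the same re-entrant access simply waits on a lock that its own thread holds.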
Using this issue as a bit of a notepad on that problem.
The problem arises when there are nested rayon calls, IIRC, as the examples recently proposed for addition in https://github.com/rayon-rs/rayon/issues/592 (e.g. https://github.com/rayon-rs/rayon/issues/592#issuecomment-2177270078) for a fully blocking thread pool seem to indicate.
In our case, some threads steal tasks from other threads while the engine has already been borrowed; I'm still unclear on the exact sequence of events.
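The kind of re-entrancy described here can be shown without rayon. The sketch below is an illustration only, not the actual sequence of events in tfhe-rs: `Engine`, `try_with_engine`, `stolen_task` and `nested_borrow_succeeds` are hypothetical names, and the "stolen" task is simulated by a direct call made while the outer borrow is still active. It uses `try_borrow_mut` so the conflict is reported rather than panicking.

```rust
use std::cell::RefCell;

// Hypothetical stand-in for the engine kept in thread-local storage.
struct Engine {
    calls: u32,
}

thread_local! {
    static ENGINE: RefCell<Engine> = RefCell::new(Engine { calls: 0 });
}

// Runs `f` with exclusive access to this thread's engine, or returns None
// if the engine is already borrowed further up the call stack.
fn try_with_engine<R>(f: impl FnOnce(&mut Engine) -> R) -> Option<R> {
    ENGINE.with(|cell| match cell.try_borrow_mut() {
        Ok(mut engine) => Some(f(&mut *engine)),
        Err(_) => None,
    })
}

// Simulates a task that ends up running on the current thread.
fn stolen_task() -> bool {
    try_with_engine(|engine| engine.calls += 1).is_some()
}

// Borrows the engine, then runs another task on the same thread before the
// borrow is released, as could happen if a worker picks up other work while
// waiting inside a nested parallel call.
fn nested_borrow_succeeds() -> bool {
    try_with_engine(|engine| {
        engine.calls += 1;
        stolen_task()
    })
    .unwrap_or(false)
}

fn main() {
    println!("nested borrow succeeded: {}", nested_borrow_succeeds());
    println!("sequential borrow succeeded: {}", stolen_task());
    println!("engine calls: {:?}", try_with_engine(|engine| engine.calls));
}
```

The nested attempt fails because both accesses hit the same thread's RefCell, while the same task run after the outer borrow ends succeeds. With `borrow_mut` instead of `try_borrow_mut`, the nested attempt would panic with "already borrowed: BorrowMutError".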
could be
could be
example log
looks to be the first case 🤔
Thread #ThreadId(1), borrow cell: 0x7f40d3d6dc40
Thread #ThreadId(1), stops borrow cell
Thread #ThreadId(1), borrow cell: 0x7f40d3d6dc40
Thread #ThreadId(1), stops borrow cell
Thread #ThreadId(1), borrow cell: 0x7f40d3d6dc40
Thread #ThreadId(1), stops borrow cell
Thread #ThreadId(8), borrow cell: 0x7f40d02fc1c0
Thread #ThreadId(13), borrow cell: 0x7f40bb7fb1c0
Thread #ThreadId(6), borrow cell: 0x7f40d07041c0
Thread #ThreadId(11), borrow cell: 0x7f40bbbfd1c0
Thread #ThreadId(5), borrow cell: 0x7f40d09081c0
Thread #ThreadId(7), borrow cell: 0x7f40d05001c0
Thread #ThreadId(4), borrow cell: 0x7f40d0b091c0
Thread #ThreadId(3), borrow cell: 0x7f40d0d0a1c0
Thread #ThreadId(2), borrow cell: 0x7f40d0f0b1c0
Thread #ThreadId(10), borrow cell: 0x7f40bbdfe1c0
Thread #ThreadId(12), borrow cell: 0x7f40bb9fc1c0
Thread #ThreadId(9), borrow cell: 0x7f40bbfff1c0
Thread #ThreadId(4), stops borrow cell
Thread #ThreadId(4), borrow cell: 0x7f40d0b091c0
Thread #ThreadId(5), stops borrow cell
Thread #ThreadId(5), borrow cell: 0x7f40d09081c0
Thread #ThreadId(12), stops borrow cell
Thread #ThreadId(12), borrow cell: 0x7f40bb9fc1c0
Thread #ThreadId(11), stops borrow cell
Thread #ThreadId(11), borrow cell: 0x7f40bbbfd1c0
Thread #ThreadId(3), stops borrow cell
Thread #ThreadId(3), borrow cell: 0x7f40d0d0a1c0
Thread #ThreadId(8), stops borrow cell
Thread #ThreadId(8), borrow cell: 0x7f40d02fc1c0
Thread #ThreadId(10), stops borrow cell
Thread #ThreadId(10), borrow cell: 0x7f40bbdfe1c0
Thread #ThreadId(13), stops borrow cell
Thread #ThreadId(9), stops borrow cell
Thread #ThreadId(9), borrow cell: 0x7f40bbfff1c0
Thread #ThreadId(13), borrow cell: 0x7f40bb7fb1c0
Thread #ThreadId(2), stops borrow cell
Thread #ThreadId(2), borrow cell: 0x7f40d0f0b1c0
Thread #ThreadId(6), stops borrow cell
Thread #ThreadId(6), borrow cell: 0x7f40d07041c0
Thread #ThreadId(4), stops borrow cell
Thread #ThreadId(4), borrow cell: 0x7f40d0b091c0
Thread #ThreadId(5), stops borrow cell
Thread #ThreadId(5), borrow cell: 0x7f40d09081c0
Thread #ThreadId(12), stops borrow cell
Thread #ThreadId(12), borrow cell: 0x7f40bb9fc1c0
Thread #ThreadId(11), stops borrow cell
Thread #ThreadId(11), borrow cell: 0x7f40bbbfd1c0
Thread #ThreadId(8), stops borrow cell
Thread #ThreadId(8), borrow cell: 0x7f40d02fc1c0
Thread #ThreadId(3), stops borrow cell
Thread #ThreadId(3), borrow cell: 0x7f40d0d0a1c0
Thread #ThreadId(10), stops borrow cell
Thread #ThreadId(10), borrow cell: 0x7f40bbdfe1c0
Thread #ThreadId(13), stops borrow cell
Thread #ThreadId(13), borrow cell: 0x7f40bb7fb1c0
Thread #ThreadId(9), stops borrow cell
Thread #ThreadId(9), borrow cell: 0x7f40bbfff1c0
Thread #ThreadId(2), stops borrow cell
Thread #ThreadId(2), borrow cell: 0x7f40d0f0b1c0
Thread #ThreadId(6), stops borrow cell
Thread #ThreadId(6), borrow cell: 0x7f40d07041c0
Thread #ThreadId(4), stops borrow cell
Thread #ThreadId(4), borrow cell: 0x7f40d0b091c0
Thread #ThreadId(5), stops borrow cell
Thread #ThreadId(5), borrow cell: 0x7f40d09081c0
Thread #ThreadId(12), stops borrow cell
Thread #ThreadId(12), borrow cell: 0x7f40bb9fc1c0
Thread #ThreadId(11), stops borrow cell
Thread #ThreadId(11), borrow cell: 0x7f40bbbfd1c0
Thread #ThreadId(8), stops borrow cell
Thread #ThreadId(8), borrow cell: 0x7f40d02fc1c0
Thread #ThreadId(3), stops borrow cell
Thread #ThreadId(3), borrow cell: 0x7f40d0d0a1c0
Thread #ThreadId(10), stops borrow cell
Thread #ThreadId(10), borrow cell: 0x7f40bbdfe1c0
Thread #ThreadId(13), stops borrow cell
Thread #ThreadId(13), borrow cell: 0x7f40bb7fb1c0
Thread #ThreadId(9), stops borrow cell
Thread #ThreadId(9), borrow cell: 0x7f40bbfff1c0
Thread #ThreadId(2), stops borrow cell
Thread #ThreadId(2), borrow cell: 0x7f40d0f0b1c0
Thread #ThreadId(7), stops borrow cell
Thread #ThreadId(7), borrow cell: 0x7f40d05001c0
Thread #ThreadId(6), stops borrow cell
Thread #ThreadId(6), borrow cell: 0x7f40d07041c0
Thread #ThreadId(4), stops borrow cell
Thread #ThreadId(4), borrow cell: 0x7f40d0b091c0
Thread #ThreadId(5), stops borrow cell
Thread #ThreadId(5), borrow cell: 0x7f40d09081c0
Thread #ThreadId(12), stops borrow cell
Thread #ThreadId(12), borrow cell: 0x7f40bb9fc1c0
Thread #ThreadId(11), stops borrow cell
Thread #ThreadId(11), borrow cell: 0x7f40bbbfd1c0
Thread #ThreadId(8), stops borrow cell
Thread #ThreadId(8), borrow cell: 0x7f40d02fc1c0
Thread #ThreadId(3), stops borrow cell
Thread #ThreadId(3), borrow cell: 0x7f40d0d0a1c0
Thread #ThreadId(10), stops borrow cell
Thread #ThreadId(10), borrow cell: 0x7f40bbdfe1c0
Thread #ThreadId(13), stops borrow cell
Thread #ThreadId(13), borrow cell: 0x7f40bb7fb1c0
Thread #ThreadId(9), stops borrow cell
Thread #ThreadId(9), borrow cell: 0x7f40bbfff1c0
Thread #ThreadId(2), stops borrow cell
Thread #ThreadId(2), borrow cell: 0x7f40d0f0b1c0
Thread #ThreadId(7), stops borrow cell
Thread #ThreadId(7), borrow cell: 0x7f40d05001c0
Thread #ThreadId(6), stops borrow cell
Thread #ThreadId(6), borrow cell: 0x7f40d07041c0
Thread #ThreadId(4), stops borrow cell
Thread #ThreadId(4), borrow cell: 0x7f40d0b091c0
Thread #ThreadId(5), stops borrow cell
Thread #ThreadId(5), borrow cell: 0x7f40d09081c0
Thread #ThreadId(12), stops borrow cell
Thread #ThreadId(12), borrow cell: 0x7f40bb9fc1c0
Thread #ThreadId(11), stops borrow cell
Thread #ThreadId(11), borrow cell: 0x7f40bbbfd1c0
Thread #ThreadId(8), stops borrow cell
Thread #ThreadId(8), borrow cell: 0x7f40d02fc1c0
Thread #ThreadId(3), stops borrow cell
Thread #ThreadId(3), borrow cell: 0x7f40d0d0a1c0
Thread #ThreadId(10), stops borrow cell
Thread #ThreadId(10), borrow cell: 0x7f40bbdfe1c0
Thread #ThreadId(13), stops borrow cell
Thread #ThreadId(9), stops borrow cell
Thread #ThreadId(9), borrow cell: 0x7f40bbfff1c0
Thread #ThreadId(2), stops borrow cell
Thread #ThreadId(2), borrow cell: 0x7f40d0f0b1c0
Thread #ThreadId(7), stops borrow cell
Thread #ThreadId(7), borrow cell: 0x7f40d05001c0
Thread #ThreadId(6), stops borrow cell
Thread #ThreadId(6), borrow cell: 0x7f40d07041c0
Thread #ThreadId(13), borrow cell: 0x7f40bb7fb1c0
Thread #ThreadId(4), stops borrow cell
Thread #ThreadId(4), borrow cell: 0x7f40d0b091c0
Thread #ThreadId(2), borrow cell: 0x7f40d0f0b1c0
thread '<unnamed>' panicked at tfhe/src/shortint/engine/mod.rs:219:45:
already borrowed: BorrowMutError
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Thread #ThreadId(2), borrow cell: 0x7f40d0f0b1c0
thread '<unnamed>' panicked at tfhe/src/shortint/engine/mod.rs:219:45:
already borrowed: BorrowMutError
Thread #ThreadId(2), borrow cell: 0x7f40d0f0b1c0
thread '<unnamed>' panicked at tfhe/src/shortint/engine/mod.rs:219:45:
already borrowed: BorrowMutError
Thread #ThreadId(12), stops borrow cell
Thread #ThreadId(12), borrow cell: 0x7f40bb9fc1c0
Thread #ThreadId(8), stops borrow cell
Thread #ThreadId(8), borrow cell: 0x7f40d02fc1c0
Thread #ThreadId(11), stops borrow cell
Thread #ThreadId(11), borrow cell: 0x7f40bbbfd1c0
Thread #ThreadId(3), stops borrow cell
Thread #ThreadId(3), borrow cell: 0x7f40d0d0a1c0
Thread #ThreadId(10), stops borrow cell
Thread #ThreadId(5), stops borrow cell
Thread #ThreadId(5), borrow cell: 0x7f40d09081c0
Thread #ThreadId(9), stops borrow cell
Thread #ThreadId(9), borrow cell: 0x7f40bbfff1c0
Thread #ThreadId(7), stops borrow cell
Thread #ThreadId(7), borrow cell: 0x7f40d05001c0
Thread #ThreadId(2), stops borrow cell
Thread #ThreadId(2), borrow cell: 0x7f40d0f0b1c0
Thread #ThreadId(3), borrow cell: 0x7f40d0d0a1c0
thread '<unnamed>' panicked at tfhe/src/shortint/engine/mod.rs:219:45:
already borrowed: BorrowMutError
Thread #ThreadId(3), borrow cell: 0x7f40d0d0a1c0
thread '<unnamed>' panicked at tfhe/src/shortint/engine/mod.rs:219:45:
already borrowed: BorrowMutError
Thread #ThreadId(13), stops borrow cell
Thread #ThreadId(13), borrow cell: 0x7f40bb7fb1c0
Thread #ThreadId(9), borrow cell: 0x7f40bbfff1c0
thread '<unnamed>' panicked at tfhe/src/shortint/engine/mod.rs:219:45:
already borrowed: BorrowMutError
Thread #ThreadId(9), borrow cell: 0x7f40bbfff1c0
thread '<unnamed>' panicked at tfhe/src/shortint/engine/mod.rs:219:45:
already borrowed: BorrowMutError
Thread #ThreadId(4), stops borrow cell
Thread #ThreadId(4), borrow cell: 0x7f40d0b091c0
Thread #ThreadId(3), stops borrow cell
Thread #ThreadId(3), borrow cell: 0x7f40d0d0a1c0
Thread #ThreadId(7), borrow cell: 0x7f40d05001c0
thread '<unnamed>' panicked at tfhe/src/shortint/engine/mod.rs:219:45:
already borrowed: BorrowMutError
Thread #ThreadId(11), stops borrow cell
Thread #ThreadId(11), borrow cell: 0x7f40bbbfd1c0
Thread #ThreadId(3), borrow cell: 0x7f40d0d0a1c0
thread '<unnamed>' panicked at tfhe/src/shortint/engine/mod.rs:219:45:
already borrowed: BorrowMutError
Thread #ThreadId(3), borrow cell: 0x7f40d0d0a1c0
thread '<unnamed>' panicked at tfhe/src/shortint/engine/mod.rs:219:45:
already borrowed: BorrowMutError
Thread #ThreadId(3), borrow cell: 0x7f40d0d0a1c0
thread '<unnamed>' panicked at tfhe/src/shortint/engine/mod.rs:219:45:
already borrowed: BorrowMutError
Thread #ThreadId(3), borrow cell: 0x7f40d0d0a1c0
thread '<unnamed>' panicked at tfhe/src/shortint/engine/mod.rs:219:45:
already borrowed: BorrowMutError
Thread #ThreadId(12), stops borrow cell
Thread #ThreadId(12), borrow cell: 0x7f40bb9fc1c0
Thread #ThreadId(3), borrow cell: 0x7f40d0d0a1c0
thread '<unnamed>' panicked at tfhe/src/shortint/engine/mod.rs:219:45:
already borrowed: BorrowMutError
Thread #ThreadId(8), stops borrow cell
Thread #ThreadId(8), borrow cell: 0x7f40d02fc1c0
Thread #ThreadId(7), stops borrow cell
Thread #ThreadId(7), borrow cell: 0x7f40d05001c0
Thread #ThreadId(11), borrow cell: 0x7f40bbbfd1c0
thread '<unnamed>' panicked at tfhe/src/shortint/engine/mod.rs:219:45:
already borrowed: BorrowMutError
Thread #ThreadId(13), stops borrow cell
Thread #ThreadId(6), stops borrow cell
Thread #ThreadId(4), stops borrow cell
Thread #ThreadId(5), stops borrow cell
Thread #ThreadId(2), stops borrow cell
Thread #ThreadId(11), stops borrow cell
Thread #ThreadId(8), stops borrow cell
Thread #ThreadId(9), stops borrow cell
Thread #ThreadId(7), stops borrow cell
Thread #ThreadId(12), stops borrow cell
Thread #ThreadId(3), stops borrow cell
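A log like the one above can be checked mechanically for threads that log a second "borrow cell" line before their matching "stops borrow cell" line. The following is a rough sketch of such a check; `reentrant_threads` is a hypothetical helper, and it assumes the line format shown in the log. Since output from several threads can interleave, treat its result as a hint rather than proof.

```rust
use std::collections::HashSet;

// Returns the ids of threads that log a borrow while their previous borrow
// is still outstanding, in order of first occurrence.
fn reentrant_threads(log: &str) -> Vec<u64> {
    let mut holding: HashSet<u64> = HashSet::new();
    let mut flagged: Vec<u64> = Vec::new();
    for line in log.lines() {
        // Skip anything that is not a "Thread #ThreadId(N), ..." line,
        // such as the panic messages.
        let Some(rest) = line.trim().strip_prefix("Thread #ThreadId(") else { continue };
        let Some((id, action)) = rest.split_once("), ") else { continue };
        let Ok(id) = id.parse::<u64>() else { continue };
        if action.starts_with("borrow cell") {
            // `insert` returns false when the thread was already holding.
            if !holding.insert(id) && !flagged.contains(&id) {
                flagged.push(id);
            }
        } else if action.starts_with("stops borrow cell") {
            holding.remove(&id);
        }
    }
    flagged
}

fn main() {
    // Excerpt taken from the log above.
    let excerpt = "\
Thread #ThreadId(2), stops borrow cell
Thread #ThreadId(2), borrow cell: 0x7f40d0f0b1c0
Thread #ThreadId(13), borrow cell: 0x7f40bb7fb1c0
Thread #ThreadId(4), stops borrow cell
Thread #ThreadId(4), borrow cell: 0x7f40d0b091c0
Thread #ThreadId(2), borrow cell: 0x7f40d0f0b1c0
thread '<unnamed>' panicked at tfhe/src/shortint/engine/mod.rs:219:45:
already borrowed: BorrowMutError";
    println!("{:?}", reentrant_threads(excerpt));
}
```

On this excerpt the helper flags thread 2, which borrows the same cell twice without a "stops borrow cell" line in between, right before the first panic.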
Unable to execute CompactFheBool operations within a rayon iterator
A BorrowMutError is randomly raised when executing a CompactFheBool operation inside a rayon parallel iterator. The bug arises when trying to access the thread-local RefCell that encapsulates the ShortEngine. It occurs in both Release and Debug builds. The problem does not occur with FheBool or CompressedFheBool.
To Reproduce
The main.rs file
The Cargo.toml debug file
The Cargo.toml release file
Logs
Configuration: