Open hawkinsp opened 11 months ago
This looks like issue #10394 (while loop without collective-permute is crashing collective-permute-motion), so perhaps it has already been fixed by #10395?
I think this can be closed; see https://github.com/google/jax/issues/18384.
https://github.com/google/jax/issues/18384 describes a JAX segfault on two or more GPUs, which turns out to be an XLA crash:
Stack trace:
The problematic HLO appears to be: