google / brax

Massively parallel rigidbody physics simulation on accelerator hardware.
Apache License 2.0

CUDA OOM with jax/pytorch notebook #489

Closed P-Schumacher closed 1 day ago

P-Schumacher commented 4 weeks ago

Hi, the notebook in the JAX + PyTorch tutorial is very nice and useful for me, but it sets a particular flag: `os.environ["XLA_PYTHON_CLIENT_ALLOCATOR"] = "platform"`

I understand this flag prevents a CUDA OOM issue, but the JAX team has mentioned that it also significantly slows down computation: https://jax.readthedocs.io/en/latest/gpu_memory_allocation.html

I tried removing it and solving the memory issues in other ways, but I haven't been successful so far. Is there any update from your team on this, or at least a current guess as to where this memory leak originates? Any kind of info would be very helpful.
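For reference, the allocator options described on that JAX documentation page are all plain environment variables that must be set before JAX is first imported. A minimal sketch of the alternatives (the variable names come from the linked JAX docs, not from the Brax notebook):

```python
import os

# All of these must be set BEFORE `import jax`, because the JAX backend
# reads them once at initialization.

# The setting used by the tutorial notebook: allocate and free GPU memory
# on demand. Plays nicely with PyTorch sharing the GPU, but can be slow.
os.environ["XLA_PYTHON_CLIENT_ALLOCATOR"] = "platform"

# Alternatives from the JAX GPU memory docs: keep the default (fast)
# allocator but disable up-front preallocation, or cap the fraction of
# GPU memory it may grab.
# os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"
# os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = ".50"

# import jax  # only import after the variables above are set
```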

erikfrey commented 1 day ago

Hi @P-Schumacher - back when we wrote this colab, we did not see a significant change in performance with or without this particular flag. Our training workloads spend >99% of their time on device after initial setup, and they don't release their DeviceArray buffers in a way that would cause deallocations / thrashing during training.

But I could be wrong! I'm going to close this for now, but if you find evidence that this flag is significantly impacting performance, please let us know.
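For anyone who wants to gather that evidence: since the allocator variable is read once at JAX startup, the comparison has to be two separate processes, one run with `XLA_PYTHON_CLIENT_ALLOCATOR=platform` and one without. A small hedged timing helper (the step function is a placeholder for your actual jitted training step, which should block on its result, e.g. via `.block_until_ready()`, so device time is really measured):

```python
import statistics
import time


def time_steps(step_fn, n_steps=100, warmup=10):
    """Return the mean wall-clock seconds per call to step_fn.

    Runs `warmup` untimed calls first, so JIT compilation and the first
    allocations don't pollute the measurement. For JAX, step_fn should
    block on its output (e.g. .block_until_ready()), otherwise only
    dispatch time is measured.
    """
    for _ in range(warmup):
        step_fn()
    samples = []
    for _ in range(n_steps):
        t0 = time.perf_counter()
        step_fn()
        samples.append(time.perf_counter() - t0)
    return statistics.mean(samples)
```

Run the same training script twice, e.g. `XLA_PYTHON_CLIENT_ALLOCATOR=platform python train.py` versus plain `python train.py`, and compare the reported means; a large gap would be the evidence asked for above.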