erdmann opened this issue 2 years ago
Thanks for the report, this looks to be an XLA issue. I filed an internal bug for the XLA folks (Google bug b/219941181).
I ran into this issue as well!
I am able to "fix" this issue by adding the following to the top of the file (these variables must be set before `jax` is imported, since they are read when the GPU backend initializes):

```python
import os

# Don't preallocate a large fraction of GPU memory at startup.
os.environ['XLA_PYTHON_CLIENT_PREALLOCATE'] = 'false'
# Allocate and free on demand instead of using XLA's caching allocator.
# This minimizes the memory footprint but is slow, so treat it as a
# workaround rather than a permanent setting.
os.environ['XLA_PYTHON_CLIENT_ALLOCATOR'] = 'platform'
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'
```
@hawkinsp Do you have any updates on this issue?
Hi,
I am blocked by this exact issue; the error is reproduced below.
I am not familiar with Google's ticketing system and could not find ticket b/219941181. Has this issue been resolved?
With thanks!
```
2022-07-06 22:48:01.859405: F external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_fft.cc:439] failed to initialize batched cufft plan with customized allocator: Allocating 5945425920 bytes exceeds the memory limit of 4294967296 bytes.
```

NB: 4294967296 bytes == 4 GiB.
There has been some progress on this issue, but it isn't completely fixed. I think the main remaining restriction is that for large FFTs the FFT length must be factorizable into primes smaller than 127, a limitation we inherit from cuFFT's 64-bit plan API: https://docs.nvidia.com/cuda/cufft/index.html#function-cufftmakeplanmany64

Other than that, it's possible things just work now with a jaxlib built from head.
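For anyone hitting the prime-factor restriction above, the check is easy to script. The helpers below are a sketch (the names `largest_prime_factor` and `next_fft_friendly` are hypothetical, not from JAX or this thread):

```python
def largest_prime_factor(n: int) -> int:
    """Largest prime factor of n, by trial division (fast enough for image-sized n)."""
    largest = 1
    p = 2
    while p * p <= n:
        while n % p == 0:
            largest = p
            n //= p
        p += 1
    # If anything remains, it is itself a prime factor larger than sqrt(original n).
    return n if n > 1 else largest


def next_fft_friendly(n: int) -> int:
    """Smallest length >= n whose prime factors are all smaller than 127."""
    while largest_prime_factor(n) >= 127:
        n += 1
    return n
```

Padding each transform axis to a `next_fft_friendly` length before the FFT keeps the plan within the sizes cuFFT's 64-bit API supports.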
Attempting to perform a somewhat large FFT results in a core dump rather than an OOM `RuntimeError`.
On my 32 GB V100 GPU, the following code illustrates the problem:
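(The exact snippet is not preserved here; the following is a representative sketch, and the array size is an assumption chosen to be large enough to trigger the crash on a 32 GB card.)

```python
import jax.numpy as jnp

# Assumed size: ~2.5e9 elements (about 10 GB as float32), which fits on a
# 32 GB V100 but is large enough to hit the failure described above.
x = jnp.ones((50000, 50000), dtype=jnp.float32)
y = jnp.fft.rfft2(x)
y.block_until_ready()  # crashes the process instead of raising a Python error
```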
For me this produces a cuFFT failure message followed by a core dump, rather than a catchable Python exception.
Also: even if this is fixed to produce a `RuntimeError` rather than a core dump, is there some way around this relatively small size limitation? I am using `jnp.fft.rfft2` to perform large image convolutions and I often bump up against it. Thanks in advance!
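If the blocker is the prime-factor restriction rather than raw size, one possible mitigation (a sketch, not from the thread, reusing the hypothetical `next_fft_friendly` helper above) is to pad each transform axis to a cuFFT-friendly length before calling `jnp.fft.rfft2`:

```python
import jax.numpy as jnp

def fft_convolve2d(image, kernel):
    """Full 2-D linear convolution via rfft2/irfft2 with friendly padding."""
    out_shape = tuple(i + k - 1 for i, k in zip(image.shape, kernel.shape))
    # Pad each axis to the next length whose prime factors are all < 127.
    fft_shape = tuple(next_fft_friendly(n) for n in out_shape)
    spec = jnp.fft.rfft2(image, s=fft_shape) * jnp.fft.rfft2(kernel, s=fft_shape)
    full = jnp.fft.irfft2(spec, s=fft_shape)
    return full[:out_shape[0], :out_shape[1]]
```

The `s=` argument zero-pads the inputs to the chosen shape, and the final slice crops the result back to the true linear-convolution size.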