xamyzhao / brainstorm

Implementation of "Data augmentation using learned transforms for one-shot medical image segmentation"
MIT License

OOM when allocating tensor with shape[1,16,160,192,224] and type float #10

Closed · andmax closed this issue 5 years ago

andmax commented 5 years ago

Running on a "GeForce GTX 1080 Ti" GPU, I got an out-of-memory (OOM) error with:

$ python3 main.py trans --gpu 0 --data mri-100unlabeled --model flow-fwd

Error:

OOM when allocating tensor with shape[1,16,160,192,224] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

xamyzhao commented 5 years ago

It seems that you're trying to use a batch size of 16 -- did you modify the data loading code to make your volume shape [1, 16, 160, 192, 224]? It should be [1, 160, 192, 224, 1] for each batch.
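
For reference, a minimal sketch of putting a single volume into that layout (the variable names are illustrative, not from the repo's data loading code):

import numpy as np

# illustrative stand-in for one loaded MRI volume, spatial shape (160, 192, 224)
volume = np.zeros((160, 192, 224), dtype=np.float32)

# add a leading batch axis and a trailing channel axis:
# (160, 192, 224) -> (1, 160, 192, 224, 1)
batch = volume[np.newaxis, ..., np.newaxis]
print(batch.shape)  # (1, 160, 192, 224, 1)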

xamyzhao commented 5 years ago

Or, is this an error from an intermediate tensorflow computation? If so, even a batch size of 1 might be too large to fit on a 1080 Ti -- it barely fits on a Titan X, which has 12 GB of memory. I would recommend using a larger GPU if possible. If not, you might need to work with smaller (e.g. downsampled) volumes.
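
If downsampling turns out to be necessary, one possible sketch (scipy.ndimage.zoom is just one option, not something this repo uses):

import numpy as np
from scipy.ndimage import zoom

# illustrative volume at the original resolution
volume = np.zeros((160, 192, 224), dtype=np.float32)

# halve each spatial dimension; order=1 gives trilinear interpolation
small = zoom(volume, zoom=0.5, order=1)
print(small.shape)  # (80, 96, 112)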

andmax commented 5 years ago

Hi Amy, thank you for your answer. I am running your main.py script with default parameters, which I believe sets the batch size to 1. Also, the network architectures were printed out, so I guess the error comes from an intermediate tensorflow computation. I was wondering which GPU you use, but it seems it was a Titan X with 12GB. That is good to know. :) My GPU (1080 Ti) also has 12GB, so there must be something in the code to change in order for it to run.

xamyzhao commented 5 years ago

My understanding is that the 1080 Ti has slightly less memory available than the Titan X -- what does nvidia-smi say? Mine has me using 11557MiB of 12196MiB.
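
For anyone comparing, the totals can be read directly with standard nvidia-smi query flags (nothing specific to this repo):

$ nvidia-smi --query-gpu=name,memory.total,memory.used --format=csv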

andmax commented 5 years ago

Mine has a total of 11178MiB and goes out of memory. That ~400MiB shortfall may be the problem. Thanks Amy!

xamyzhao commented 5 years ago

Glad we sorted it out! If you don't have another GPU available, it might be worth trying to change your default float precision in your keras.json to float16: https://keras.io/backend/#kerasjson-details.
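
For reference, that would look roughly like the following in ~/.keras/keras.json (standard Keras config fields; whether float16 is numerically stable for this model is untested):

{
    "floatx": "float16",
    "epsilon": 1e-07,
    "backend": "tensorflow",
    "image_data_format": "channels_last"
}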

Thanks for pointing this issue out! I'll add a note about GPU memory to the readme.