bhosalems closed this issue 2 years ago
Hi @lsbmsb,
The TransMorph training script takes around 18 GB of memory for 160x192x224 images, thus 11 GB will not be sufficient to accommodate the training. If you are only interested in reproducing the results, 11 GB should be plenty for inference. You may find the pre-trained model here: https://github.com/junyuchen245/TransMorph_Transformer_for_Medical_Image_Registration/blob/main/TransMorph_on_IXI.md#transmorph-variants
You may also try TransMorph-bspl and TransMorph-diff, as they require less memory.
One last thing you could try is changing the Transformer configurations: https://github.com/junyuchen245/TransMorph_Transformer_for_Medical_Image_Registration/blob/495e7c9fa76ff885a358c291367d1e41cb5f9052/IXI/TransMorph/models/configs_TransMorph.py#L28-L53
Try using a smaller embed_dim, num_heads, or reg_head_chan. You can also turn on use_checkpoint (gradient checkpointing), which saves roughly 3 GB of memory at some cost in training speed.
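A minimal sketch of the kind of edit suggested above, assuming the ConfigDict layout in the linked configs_TransMorph.py (the function name and default values are taken from that file and may differ in your checkout):

```python
from models.configs_TransMorph import get_3DTransMorph_config  # repo-local import

config = get_3DTransMorph_config()
config.embed_dim = 48             # smaller embedding shrinks every stage of the encoder
config.num_heads = (2, 2, 4, 4)   # fewer attention heads per stage
config.reg_head_chan = 64         # registration head channels (96 -> 64 worked in this thread)
config.use_checkpoint = True      # gradient checkpointing: recompute activations in backward
```

The first three knobs change the model, so a checkpoint trained with the defaults will not load into the modified architecture; use_checkpoint is the only one that leaves the weights compatible.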
Junyu
I updated the reg_head_chan from 96 to 64. It is working now, thanks for your help.
glad to be of help :)
RuntimeError: CUDA out of memory. Tried to allocate 420.00 MiB (GPU 1; 11.91 GiB total capacity; 10.87 GiB already allocated; 316.25 MiB free; 11.07 GiB reserved in total by PyTorch)
I keep getting the above error. I tried freeing the cache and printing the memory usage summary, but I don't understand what each entry means.
I was specifically running train_TransMorph.py. One suggestion was to reduce the batch size, but it is already set to 1. It might be possible to delete and collect the memory of unused variables, along with a few other things suggested on the PyTorch forum, but I am not yet confident enough to change the training loop.
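For reference, a minimal sketch of the two diagnostics mentioned above, plus the "delete unused variables" pattern from the PyTorch forum (the training-loop names in the comments are placeholders, not code from this repo):

```python
import gc
import torch

def mib(n_bytes):
    """Convert bytes to MiB for readable logging."""
    return n_bytes / 2**20

def log_cuda_memory(tag=""):
    """Print the two figures that appear in the OOM message:
    'allocated' = memory held by live tensors,
    'reserved'  = memory cached by PyTorch's allocator (always >= allocated)."""
    if torch.cuda.is_available():
        print(f"{tag}: allocated={mib(torch.cuda.memory_allocated()):.0f} MiB, "
              f"reserved={mib(torch.cuda.memory_reserved()):.0f} MiB")

# Inside the training loop (model/optimizer/criterion are placeholders):
#   loss = criterion(model(x), y)
#   loss.backward()
#   optimizer.step()
#   optimizer.zero_grad(set_to_none=True)
#   del loss, x, y             # drop references so the autograd graph can be freed
#   gc.collect()
#   torch.cuda.empty_cache()   # return cached blocks to the GPU driver
```

Note that empty_cache() only lowers the "reserved" figure; if "allocated" is already near the GPU's capacity, only a smaller model or gradient checkpointing will help.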
There is also a known issue with the allocator being unable to use fragmented blocks, addressed in https://github.com/pytorch/pytorch/pull/44742. I am not sure which PyTorch version includes that fix; I am following up on it.
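Separately from that PR, newer PyTorch releases expose an allocator setting that can mitigate fragmentation. A sketch, assuming your PyTorch version supports PYTORCH_CUDA_ALLOC_CONF (check the memory-management section of the docs for your release):

```shell
# Ask the caching allocator not to split blocks larger than 128 MiB,
# which reduces fragmentation at the cost of some reuse efficiency:
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
# ...then launch training as usual, e.g.:
# python train_TransMorph.py
```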
In the meantime, any thoughts on how to resolve this, or on a workaround? Also, is it possible to know how much peak memory is needed during training?
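To answer the peak-memory question empirically, PyTorch keeps a per-device high-water mark that can be reset and read back. A sketch, where step_fn stands for one full training iteration (a hypothetical callable, not something in this repo):

```python
import torch

def peak_gib_for(step_fn, device=0):
    """Reset the high-water mark, run one training step, and return the
    peak GPU memory (in GiB) that step required. Returns None on CPU-only."""
    if not torch.cuda.is_available():
        return None
    torch.cuda.reset_peak_memory_stats(device)
    step_fn()                        # e.g. forward + backward + optimizer.step()
    torch.cuda.synchronize(device)
    return torch.cuda.max_memory_allocated(device) / 2**30
```

One iteration is usually representative, since activation memory peaks during the first backward pass; running it on a smaller crop first lets you estimate how usage grows toward the full 160x192x224 volume.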
Thanks