ljwztc / CLIP-Driven-Universal-Model

[ICCV 2023] CLIP-Driven Universal Model; Rank first in MSD Competition.

GPU memory fluctuations #29

Closed SNEAndy closed 11 months ago

SNEAndy commented 11 months ago

Hi, thank you for this great work! I ran into a problem while using SwinUNETR as the backbone: GPU memory usage suddenly increases and exceeds the VRAM of my GPU. During inference there are also significant fluctuations in GPU memory. Could you give me a clue so I can solve this problem?

ljwztc commented 11 months ago

Could you please provide your GPU information? Generally speaking, some CT volumes are large, and GPU memory consumption will be high for those cases. We use at least 24 GB of memory for inference.

SNEAndy commented 11 months ago

> Could you please provide your GPU information? Generally speaking, some CT volumes are large, and GPU memory consumption will be high for those cases. We use at least 24 GB of memory for inference.

Thank you for your prompt reply. I am using an NVIDIA RTX 3090. I wrote the inference script myself and tested with torch.rand, without applying any transforms. The input shape is (1, 1, 96, 96, 96), and I added breakpoints before the GPU memory is reclaimed. I observed the GPU memory change as 2.6 GB -> 5.6 GB -> 2.6 GB, and I am a bit confused about the memory fluctuation in the middle of the process.
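For reference, here is a minimal sketch of how such a measurement could be set up. It assumes MONAI's SwinUNETR as the backbone and 14 output classes, neither of which is stated above, so treat the constructor arguments as placeholders:

```python
# Hedged sketch: measure allocated/peak GPU memory around one forward pass
# with a random (1, 1, 96, 96, 96) input under torch.no_grad().
import torch
from monai.networks.nets import SwinUNETR

device = torch.device("cuda:0")
# img_size / out_channels are assumptions, not values from this repository.
model = SwinUNETR(img_size=(96, 96, 96), in_channels=1, out_channels=14).to(device)
model.eval()

x = torch.rand(1, 1, 96, 96, 96, device=device)

def report(tag):
    alloc = torch.cuda.memory_allocated(device) / 1024 ** 3
    peak = torch.cuda.max_memory_allocated(device) / 1024 ** 3
    print(f"{tag}: allocated {alloc:.2f} GiB, peak {peak:.2f} GiB")

torch.cuda.reset_peak_memory_stats(device)
report("before forward")
with torch.no_grad():
    _ = model(x)
torch.cuda.synchronize(device)
# The peak value captures the transient spike seen as 2.6 GB -> 5.6 GB -> 2.6 GB.
report("after forward")
```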

ljwztc commented 11 months ago

Have you added torch.no_grad()? I didn't observe this fluctuation.

SNEAndy commented 11 months ago

> Have you added torch.no_grad()? I didn't observe this fluctuation.

Thank you for the reply. Yes, I did. Maybe I should run more tests to find out what goes wrong in my code. Thank you all the same, have a great day :D

huangsusan commented 10 months ago

Hi: I have the same problem. Did you find the answer?

SNEAndy commented 10 months ago

> Hi: I have the same problem. Did you find the answer?

According to my preliminary investigation, if you use torch.cuda.empty_cache() and torch.backends.cudnn.benchmark = True at the same time, the memory fluctuation will happen.
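A rough way to check this hypothesis is sketched below. The loop, input shapes, and flag values are illustrative assumptions rather than code from this repository; it simply combines the two settings mentioned above and logs reserved memory per step:

```python
# Hedged sketch: with cudnn.benchmark = True, cuDNN may benchmark algorithms
# and temporarily allocate extra workspace for a new input shape; calling
# torch.cuda.empty_cache() inside the loop then releases cached blocks, so the
# reported memory can swing between steps instead of staying flat.
import torch

torch.backends.cudnn.benchmark = True  # autotune conv algorithms per input shape

with torch.no_grad():
    for step in range(5):
        x = torch.rand(1, 1, 96, 96, 96, device="cuda")
        _ = model(x)              # `model` as in the earlier sketch (assumption)
        torch.cuda.empty_cache()  # suspected trigger for the fluctuation
        print(step, torch.cuda.memory_reserved() / 1024 ** 3, "GiB reserved")
```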

sharonlee12 commented 7 months ago

When testing the second image, I got an error:

```
 33%|████████████████████████ | 1/3 [01:33<03:06, 93.34s/it]
Traceback (most recent call last):
  File "mytest.py", line 223, in <module>
    main()
  File "mytest.py", line 219, in main
    validation(model, test_loader, val_transforms, args)
  File "mytest.py", line 55, in validation
    pred = sliding_window_inference(image, (args.roi_x, args.roi_y, args.roi_z), 1, model, overlap=0.5, mode='gaussian')
  File "/python3.8/site-packages/monai/inferers/utils.py", line 215, in sliding_window_inference
    output_image_list.append(torch.zeros(output_shape, dtype=compute_dtype, device=device))
RuntimeError: CUDA out of memory. Tried to allocate 2.70 GiB (GPU 0; 11.91 GiB total capacity; 8.87 GiB already allocated; 2.31 GiB free; 8.90 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

Did you find the answer?
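One possible workaround, sketched below under the assumption that `image`, `model`, and `args` come from the user's mytest.py: the OOM happens when MONAI allocates the stitched full-volume output on the GPU, so the stitching can be moved to the CPU via the `device` argument of `sliding_window_inference`, optionally with a lower overlap. The exact parameter values are suggestions, not from this repository:

```python
# Hedged sketch: run the sliding windows on the GPU but assemble the
# full-volume prediction on the CPU to avoid allocating the large output
# tensor in GPU memory.
import torch
from monai.inferers import sliding_window_inference

with torch.no_grad():
    pred = sliding_window_inference(
        image,
        roi_size=(args.roi_x, args.roi_y, args.roi_z),
        sw_batch_size=1,
        predictor=model,
        overlap=0.25,                    # 0.5 in the failing call; fewer windows
        mode="gaussian",
        sw_device=torch.device("cuda"),  # windows still run on the GPU
        device=torch.device("cpu"),      # stitched output lives on the CPU
    )
```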
