Closed: thundercult closed this issue 1 year ago.
Hi @thundercult, could you provide the full traceback for the error so I can look into it?
Of course. Do you mind if I add you on WeChat, or contact you some other way? That would be more convenient.
I think it would be better if we discuss here so that maybe other people will benefit from what we have discussed.
From the traceback, the error happens in the sliding_window_inference function of the MONAI package. This function takes an argument sw_batch_size, which is the batch size of the patches during inference. Setting this value too high can also raise a CUDA OOM error; maybe you can look into it.
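For reference, here is a minimal sketch of calling sliding_window_inference with a small sw_batch_size to lower peak GPU memory. The network, input shape, and roi_size are stand-ins for illustration, not the RCPS setup:

import torch
from monai.inferers import sliding_window_inference

model = torch.nn.Conv3d(1, 2, kernel_size=3, padding=1)   # stand-in for the trained network
image = torch.randn(1, 1, 128, 128, 128)                  # dummy (B, C, D, H, W) volume

with torch.no_grad():
    output = sliding_window_inference(
        inputs=image,
        roi_size=(96, 96, 96),   # patch size fed to the network; hypothetical value
        sw_batch_size=1,         # fewer patches per forward pass -> lower peak memory
        predictor=model,
    )
print(output.shape)              # same spatial size as the input: (1, 2, 128, 128, 128)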
Thank you very much. I accidentally deleted the traceback; I'll post it below.
Traceback (most recent call last):
File "/home3/@/RCPS-main/train.py", line 186, in
Sorry to bother you again. I have changed sw_batch_size from 4 to 1, but the error still arises (CUDA out of memory). Could this error be related to PyTorch DDP?
Perhaps? My experiments were carried out on 2 RTX 3090 GPUs and it works fine in my case. Maybe you can try running with fewer GPUs and see if the problem still exists.
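One way to narrow this down (my suggestion, not something from the repo) is to log the peak GPU memory around the evaluation step and confirm it runs under no_grad, so no autograd graph is kept alive:

import torch

model = torch.nn.Conv3d(1, 2, kernel_size=3, padding=1).cuda()   # stand-in network
volume = torch.randn(1, 1, 96, 96, 96, device="cuda")            # dummy input

torch.cuda.reset_peak_memory_stats()
with torch.no_grad():            # evaluation should not build an autograd graph
    out = model(volume)
peak = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak GPU memory during eval step: {peak:.2f} GiB")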
Hello, thanks very much for sharing. Where is the sharpening operation in the code? I've been searching for it for a long time but haven't found it.
pseudo_label = F.softmax(targets / self.cfg['TRAIN']['TEMP'], dim=1).detach()
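That line is the sharpening: dividing the logits by the temperature self.cfg['TRAIN']['TEMP'] before the softmax makes the pseudo-label distribution more peaked when the temperature is below 1. A minimal, self-contained sketch of the effect, with illustrative values only:

import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.5]])
print(F.softmax(logits, dim=1))        # ~[[0.63, 0.23, 0.14]]
print(F.softmax(logits / 0.5, dim=1))  # sharper: ~[[0.84, 0.11, 0.04]]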
Hello, thanks very much for sharing. When I tried to run train.py, the error always happened no matter how many GPUs I used. It is strange that different GPUs require different amounts of memory. The best setup I used was 4 NVIDIA A100s. Training goes well, but when the evaluation loop starts at the fifth iteration the error always arises. Do you know how to fix it?