shoxa-mir closed this issue 2 years ago.
Hi, I haven't encountered this issue. Can you trace which line of the code triggers the out-of-memory error? Also note that the memory usage reported in the training log is not the actual memory usage. Please use "nvidia-smi" to check how much memory the program actually uses.
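Here is a minimal sketch of that nvidia-smi check. It assumes `nvidia-smi` is on `PATH`. The `--query-gpu=memory.used,memory.total` and `--format=csv,noheader,nounits` options are standard nvidia-smi flags, but the helper names `parse_memory_csv` and `query_gpu_memory` are hypothetical. nvidia-smi counts everything the process holds on the device, including memory PyTorch has cached but not given to any tensor and the CUDA context. Its number is therefore usually higher than the one in the training log.

```python
import subprocess
from typing import List, Tuple

def parse_memory_csv(output: str) -> List[Tuple[int, int]]:
    """Parse nvidia-smi CSV output (no header, no units) into (used_mib, total_mib) per GPU."""
    rows = []
    for line in output.strip().splitlines():
        # int() tolerates the space nvidia-smi puts after each comma
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total))
    return rows

def query_gpu_memory() -> List[Tuple[int, int]]:
    """Run nvidia-smi once and return (used_mib, total_mib) for every visible GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_memory_csv(result.stdout)
```

Call `query_gpu_memory()` while training is running (for example, from a second terminal). Then compare the used MiB against the `max_mem` value in the log.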
Got it, thank you.
@Shoxa-Mir Could you help me solve this issue?
Traceback (most recent call last):
File "train_net.py", line 236, in
This is where I am getting the error.
@GunjanPatel10 Can you also include the error message itself?
I would also like to know your environment information. Did you install detectron2 from facebookresearch and add the CenterNet2 folder afterwards? I had some issues when I installed detectron2 before putting the CenterNet2 folder into the "detectron2/projects/" folder. Maybe you can try reinstalling.
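To check the install layout described above, a hypothetical helper can ask Python which copy of detectron2 it actually imports. It relies only on the standard `importlib.util.find_spec`. The function names and the `projects/CenterNet2` layout check are illustrative assumptions, not part of detectron2 or CenterNet2.

```python
import importlib.util
from pathlib import Path
from typing import Optional

def locate_package(name: str) -> Optional[Path]:
    """Return the directory a top-level package would be imported from, or None if absent."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent

def is_site_packages_install(package_dir: Path) -> bool:
    """True if the package lives under site-packages (e.g. a prebuilt wheel),
    rather than in an editable source checkout."""
    return "site-packages" in package_dir.parts

def has_centernet2_project(detectron2_repo: Path) -> bool:
    """Hypothetical layout check: is CenterNet2 at <repo>/projects/CenterNet2?"""
    return (detectron2_repo / "projects" / "CenterNet2").is_dir()
```

Suppose `locate_package("detectron2")` returns a path under `site-packages`, as in the environment information posted later in this thread. That points to a prebuilt wheel. For an editable source install, `has_centernet2_project(pkg.parent)` checks whether the folder was placed inside the detectron2 repository.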
@Shoxa-Mir Now I am facing a new error:
eta: 0:01:22 iter: 20 total_loss: 1.907 loss_cls_stage0: 0.1733 loss_box_reg_stage0: 0 loss_cls_stage1: 0.1186 loss_box_reg_stage1: 0 loss_cls_stage2: 0.07879 loss_box_reg_stage2: 0 loss_centernet_loc: 0.9385 loss_centernet_agn_pos: 0.4728 loss_centernet_agn_neg: 0.007027 time: 1.0148 data_time: 0.0102 lr: 0.0031102 max_mem: 2128M
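To compare the log against nvidia-smi, you can pull the `max_mem` field out of a log line like the one above. As far as I know, detectron2 computes `max_mem` from `torch.cuda.max_memory_allocated()`. That is the peak memory occupied by tensors, which excludes the allocator's cache and the CUDA context. The regex-based parser below is a hypothetical helper, not a detectron2 API.

```python
import re
from typing import Dict

# Matches "key: value" pairs such as "total_loss: 1.907", "eta: 0:01:22" or "max_mem: 2128M".
_METRIC_RE = re.compile(r"(\w+):\s+(\S+)")

def parse_metrics(line: str) -> Dict[str, str]:
    """Split one detectron2-style training log line into {metric: raw string value}."""
    return dict(_METRIC_RE.findall(line))

def max_mem_mib(line: str) -> int:
    """Return the max_mem field (printed with an 'M' suffix) as an integer number of MiB."""
    return int(parse_metrics(line)["max_mem"].rstrip("M"))
```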
Traceback (most recent call last):
File "train_net.py", line 236, in
import torch
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([2, 160, 24, 32], dtype=torch.float, device='cuda', requires_grad=True)
net = torch.nn.Conv2d(160, 160, kernel_size=[3, 3], padding=[1, 1], stride=[1, 1], dilation=[1, 1], groups=1)
net = net.cuda().float()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()
ConvolutionParams
    data_type = CUDNN_DATA_FLOAT
    padding = [1, 1, 0]
    stride = [1, 1, 0]
    dilation = [1, 1, 0]
    groups = 1
    deterministic = false
    allow_tf32 = true
input: TensorDescriptor 0x564629e7fd10
    type = CUDNN_DATA_FLOAT
    nbDims = 4
    dimA = 2, 160, 24, 32,
    strideA = 122880, 768, 32, 1,
output: TensorDescriptor 0x56462a190240
    type = CUDNN_DATA_FLOAT
    nbDims = 4
    dimA = 2, 160, 24, 32,
    strideA = 122880, 768, 32, 1,
weight: FilterDescriptor 0x564629fb3660
    type = CUDNN_DATA_FLOAT
    tensor_format = CUDNN_TENSOR_NCHW
    nbDims = 4
    dimA = 160, 160, 3, 3,
Pointer addresses:
    input: 0x7f1dbd800000
    output: 0x7f1dbdaf0000
    weight: 0x7f1ee96e1000
Forward algorithm: 5

Could you please tell me where I need to make changes?
My environment info:
sys.platform linux
Python 3.8.12 (default, Oct 12 2021, 13:49:34) [GCC 7.5.0]
numpy 1.21.3
detectron2 0.6 @/home/keb-pg/anaconda3/envs/CenterNet2/lib/python3.8/site-packages/detectron2
Compiler GCC 7.3
CUDA compiler CUDA 11.1
detectron2 arch flags 3.7, 5.0, 5.2, 6.0, 6.1, 7.0, 7.5, 8.0, 8.6
DETECTRON2_ENV_MODULE
I installed detectron2 with the command given in the INSTALL.md file:
python -m pip install detectron2 -f \
  https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.8/index.html
No, I haven't added the CenterNet2 folder later (i.e., after installing detectron2).
@Shoxa-Mir That didn't help me solve my issue. I would be grateful if you could help me.
I get a "CUDA out of memory" error even though PyTorch uses only about 60% of my GPU memory. Can you help me with this issue? My GPU is an RTX 3090 with 24 GB of VRAM, but the model only uses up to 15-16 GB.
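My understanding of detectron2's install instructions is that prebuilt wheels must be used with the matching official PyTorch and CUDA build. The index URL above encodes that target (`cu111`, `torch1.8`). The environment dump posted above does not show the PyTorch version, so a quick consistency check may help. Below is a minimal sketch. `wheel_target` and `matches_installed` are hypothetical helpers, and the URL pattern is assumed from this single example.

```python
import re
from typing import Optional, Tuple

# Assumed pattern, e.g. .../detectron2/wheels/cu111/torch1.8/index.html
_WHEEL_URL_RE = re.compile(r"/wheels/(cu\d+|cpu)/torch(\d+\.\d+)/")

def wheel_target(url: str) -> Optional[Tuple[str, str]]:
    """Return (cuda_tag, torch_major_minor) encoded in a wheel index URL, or None."""
    match = _WHEEL_URL_RE.search(url)
    return (match.group(1), match.group(2)) if match else None

def matches_installed(url: str, torch_version: str, torch_cuda: Optional[str]) -> bool:
    """Compare the wheel's tags with torch.__version__ and torch.version.cuda."""
    target = wheel_target(url)
    if target is None:
        return False
    cuda_tag, torch_mm = target
    # "1.8.1+cu111" -> "1.8"
    installed_mm = ".".join(torch_version.split("+")[0].split(".")[:2])
    # "11.1" -> "cu111"; None means a CPU-only PyTorch build
    installed_cuda = "cpu" if torch_cuda is None else "cu" + torch_cuda.replace(".", "")
    return cuda_tag == installed_cuda and torch_mm == installed_mm
```

With PyTorch installed, run `import torch` and then call `matches_installed(url, torch.__version__, torch.version.cuda)`. A `False` result suggests reinstalling detectron2 from the index that matches your PyTorch build.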
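One way to understand the gap between "about 60% used" and an out-of-memory error is to split device memory into three parts. The first is what tensors hold. The second is what PyTorch's caching allocator has reserved but not handed out. The third is everything outside that reservation. This sketch assumes the byte counts come from `torch.cuda.memory_allocated()`, `torch.cuda.memory_reserved()` and `torch.cuda.get_device_properties(0).total_memory`; `memory_breakdown` itself is a hypothetical helper. One commonly cited cause of OOM with apparently free memory is this: a single large request still needs one block big enough to hold it. It can therefore fail when the cached memory is fragmented into smaller pieces.

```python
from typing import Dict

GIB = 1024 ** 3

def memory_breakdown(allocated: int, reserved: int, total: int) -> Dict[str, float]:
    """Split device memory (given in bytes) into three parts, reported in GiB.

    allocated: bytes currently occupied by tensors
    reserved:  bytes held by PyTorch's caching allocator (allocated + cached free blocks)
    total:     total device memory
    """
    if not 0 <= allocated <= reserved <= total:
        raise ValueError("expected 0 <= allocated <= reserved <= total")
    return {
        "allocated_gib": allocated / GIB,
        # Cached by PyTorch but not used by any tensor; may be fragmented.
        "cached_unused_gib": (reserved - allocated) / GIB,
        # Not reserved by PyTorch: CUDA context, other processes, and truly free memory.
        "not_reserved_gib": (total - reserved) / GIB,
    }
```

Calling this right before the failing step, alongside `torch.cuda.memory_summary()`, shows whether the missing memory sits in PyTorch's cache or outside it.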