researchmm / 2D-TAN

AAAI 2020 - Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language

CUDA error when applying custom features #2

Open TikaToka opened 1 year ago

TikaToka commented 1 year ago

Hi, @Sy-Zhang and team. First of all, thank you for sharing your work!

I have a question about using your code.

I am trying to use your model with my own visual and text features (extracted with CLIP).

For the Charades dataset, it worked well.

However, for TACoS, the CUDA error below occurs:

Traceback (most recent call last):
  File "moment_localization/train.py", line 319, in <module>
    scheduler=scheduler)
  File "/home/jckim/2D-TAN/moment_localization/../lib/core/engine.py", line 42, in train
    state['optimizer'].step(closure)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/optim/optimizer.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/optim/adam.py", line 92, in step
    loss = closure()
  File "/home/jckim/2D-TAN/moment_localization/../lib/core/engine.py", line 31, in closure
    loss, output = state['network'](state['sample'])
  File "moment_localization/train.py", line 151, in network
    prediction, map_mask = model(textual_input, textual_mask, visual_input)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/_utils.py", line 434, in reraise
    raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jckim/2D-TAN/moment_localization/../lib/models/tan.py", line 22, in forward
    fused_h = self.fusion_layer(textual_input, textual_mask, map_h, map_mask)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jckim/2D-TAN/moment_localization/../lib/models/fusion_modules/base_fusion.py", line 22, in forward
    txt_h = self.tex_linear(txt_h)[:,:,None,None]
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/nn/functional.py", line 1848, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

and when I pass CUDA_LAUNCH_BLOCKING=1, I get:

File "moment_localization/train.py", line 319, in <module>
    scheduler=scheduler)
  File "/home/jckim/2D-TAN/moment_localization/../lib/core/engine.py", line 42, in train
    state['optimizer'].step(closure)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/optim/optimizer.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/optim/adam.py", line 92, in step
    loss = closure()
  File "/home/jckim/2D-TAN/moment_localization/../lib/core/engine.py", line 31, in closure
    loss, output = state['network'](state['sample'])
  File "moment_localization/train.py", line 151, in network
    prediction, map_mask = model(textual_input, textual_mask, visual_input)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jckim/2D-TAN/moment_localization/../lib/models/tan.py", line 22, in forward
    fused_h = self.fusion_layer(textual_input, textual_mask, map_h, map_mask)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jckim/2D-TAN/moment_localization/../lib/models/fusion_modules/base_fusion.py", line 23, in forward
    map_h = self.vis_conv(map_h)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 446, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 443, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue.

import torch
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([32, 512, 128, 128], dtype=torch.float, device='cuda', requires_grad=True)
net = torch.nn.Conv2d(512, 512, kernel_size=[1, 1], padding=[0, 0], stride=[1, 1], dilation=[1, 1], groups=1)
net = net.cuda().float()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()

ConvolutionParams
    data_type = CUDNN_DATA_FLOAT
    padding = [0, 0, 0]
    stride = [1, 1, 0]
    dilation = [1, 1, 0]
    groups = 1
    deterministic = false
    allow_tf32 = true
input: TensorDescriptor 0x5610fdbf3970
    type = CUDNN_DATA_FLOAT
    nbDims = 4
    dimA = 32, 512, 128, 128,
    strideA = 8388608, 16384, 128, 1,
output: TensorDescriptor 0x5610fdaa24f0
    type = CUDNN_DATA_FLOAT
    nbDims = 4
    dimA = 32, 512, 128, 128,
    strideA = 8388608, 16384, 128, 1,
weight: FilterDescriptor 0x5610fdbe9210
    type = CUDNN_DATA_FLOAT
    tensor_format = CUDNN_TENSOR_NCHW
    nbDims = 4
    dimA = 512, 512, 1, 1,
Pointer addresses:
    input: 0x7fac74000000
    output: 0x7face6000000
    weight: 0x7fad8db00000

I changed the config file as below:

WORKERS: 16

MODEL_DIR: ./models/conv
RESULT_DIR: ./results/conv
LOG_DIR: ./log
DATA_DIR: ./data/TACoS
FEATURE_DIR: {directory to my visual_features} <- custom added and worked well on charades.

DATASET:
  NAME: TACoS
  VIS_INPUT_TYPE: clip
  NO_VAL: True
  NUM_SAMPLE_CLIPS: 256
  TARGET_STRIDE: 2
  NORMALIZE: True
  RANDOM_SAMPLING: False

TEST:
  BATCH_SIZE: 32
  RECALL: 1,5
  TIOU: 0.1,0.3,0.5,0.7
  EVAL_TRAIN: False
  NMS_THRESH: 0.5

CUDNN:
  DETERMINISTIC: False
  BENCHMARK: True

TRAIN:
  BATCH_SIZE: 32
  LR: 0.0001
  WEIGHT_DECAY: 0.0000
  MAX_EPOCH: 100
  CONTINUE: False

LOSS:
  NAME: bce_rescale_loss
  PARAMS:
    MIN_IOU: 0.3
    MAX_IOU: 0.7
    BIAS: 0.0

TAN:
  FRAME_MODULE:
    NAME: FrameAvgPool
    PARAMS:
      INPUT_SIZE: 512 <<< 
      HIDDEN_SIZE: 512
      KERNEL_SIZE: 2
      STRIDE: 2

  PROP_MODULE:
    NAME: SparsePropConv
    PARAMS:
      HIDDEN_SIZE: 512
      NUM_SCALE_LAYERS: [16, 8, 8, 8]

  FUSION_MODULE:
    NAME: BaseFusion
    PARAMS:
      HIDDEN_SIZE: 512
      TXT_INPUT_SIZE: 512 <<<
      TXT_HIDDEN_SIZE: 512
      LSTM:
        NUM_LAYERS: 3
        BIDIRECTIONAL: False

  MAP_MODULE:
    NAME: MapConv
    PARAMS:
      INPUT_SIZE: 512
      HIDDEN_SIZES: [512, 512, 512, 512, 512, 512, 512, 512]
      KERNEL_SIZES: [5, 5, 5, 5, 5, 5, 5, 5]
      STRIDES: [1, 1, 1, 1, 1, 1, 1, 1]
      PADDINGS: [16, 0, 0, 0, 0, 0, 0, 0]
      DILATIONS: [1, 1, 1, 1, 1, 1, 1, 1]

  PRED_INPUT_SIZE: 512

MODEL:
  NAME: TAN
  CHECKPOINT: ./checkpoints/TACoS/iter016165-0.4644-0.7443.pkl

Changing the config this way also worked well on Charades.
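
For reference, here is a quick sanity-check script I can run over the saved TACoS features to confirm that every .npz matches the 512-d sizes set in the config above. This is only a sketch: the directory path is a placeholder for my FEATURE_DIR, and the 'features' key is the one my loader (shown further below) uses.

import glob
import os

import numpy as np

FEATURE_DIR = "./data/TACoS/clip_features"  # placeholder for my FEATURE_DIR
EXPECTED_DIM = 512  # TAN.FRAME_MODULE.PARAMS.INPUT_SIZE in the config above

problems = []
paths = sorted(glob.glob(os.path.join(FEATURE_DIR, "*.npz")))
for path in paths:
    feats = np.load(path)["features"]
    # every video should give a (num_clips, 512) float array with finite values
    if feats.ndim != 2 or feats.shape[1] != EXPECTED_DIM:
        problems.append((os.path.basename(path), feats.shape))
    elif not np.isfinite(feats).all():
        problems.append((os.path.basename(path), "non-finite values"))

print(f"checked {len(paths)} files, found {len(problems)} problems")
for name, info in problems[:20]:
    print(name, info)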

For loading the features, I am using code like this in ./lib/dataset/tacos.py:

def get_word_embedding(self, sentence):
    # Encode the query with the HuggingFace CLIP text model and return the
    # per-token hidden states as the word embeddings.
    inputs = self.clip_tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        features = self.clip_model(**inputs)
        last_hidden_state_feature = features.last_hidden_state.squeeze()
    return last_hidden_state_feature

def get_video_features(self, vid):
    # Load the pre-extracted CLIP visual features for one video from an .npz file.
    feature_path = os.path.join(self.feature_dir, vid + '.npz')
    features = torch.Tensor(np.load(feature_path)['features'][:]).float()
    if config.DATASET.NORMALIZE:
        features = F.normalize(features, dim=1)
    vis_mask = torch.ones((features.shape[0], 1))
    return features, vis_mask

This code also worked well when applied to Charades.
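
On the text side, here is a minimal check I can run with the same transformers CLIP tokenizer/model. The checkpoint name below is a placeholder for the one I actually use; note that my get_word_embedding above does not pass truncation=True, and CLIP's text encoder has a 77-token context limit.

import torch
from transformers import CLIPTextModel, CLIPTokenizer

# "openai/clip-vit-base-patch32" is a placeholder: use whichever CLIP text
# checkpoint the features were extracted with.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
model = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32").eval()

sentences = ["the person takes a cutting board out of the drawer"]  # sample TACoS queries

for s in sentences:
    ids = tokenizer(s, return_tensors="pt")["input_ids"]
    if ids.shape[1] > tokenizer.model_max_length:  # 77 for CLIP text encoders
        print(f"too long ({ids.shape[1]} tokens): {s}")
        continue
    with torch.no_grad():
        emb = model(input_ids=ids).last_hidden_state.squeeze(0)
    # TXT_INPUT_SIZE in the config is 512, so each token embedding should be 512-d
    assert emb.shape[-1] == 512, s
    assert torch.isfinite(emb).all(), s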

The code snippet given in the error message does not reproduce the problem on my machine.

Do you have any idea what could be causing this?

Since Charades works well, I don't think it is a GPU or system problem.

Thank you in advance!

Sy-Zhang commented 1 year ago


I haven't run into this problem before. Could you try reducing the batch size to see whether it is related to GPU memory? If not, could you try a different number of GPUs to see whether it is related to the GPU count?
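
If neither of those changes anything, you could also try pushing one batch through the model on CPU; illegal-memory-access errors on GPU often turn into readable shape or index errors on CPU. A rough sketch only, not a script from this repo (it assumes model is the TAN module before the nn.DataParallel wrapper, and the three tensors are one batch from your TACoS dataloader, with the same argument names as in the traceback from moment_localization/train.py above):

import torch

def debug_one_batch_on_cpu(model, textual_input, textual_mask, visual_input):
    # Run a single forward pass on CPU, without DataParallel, so that any
    # shape/index bug raises a readable Python error instead of a CUDA one.
    model = model.cpu().eval()
    with torch.no_grad():
        prediction, map_mask = model(textual_input.cpu(),
                                     textual_mask.cpu(),
                                     visual_input.cpu())
    print("prediction:", tuple(prediction.shape), "map_mask:", tuple(map_mask.shape))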

TikaToka commented 1 year ago

Thank you for the quick response!

I tried reducing the batch size to 4, but that also didn't work. (I am using 8 Quadro RTX 8000 GPUs, so I don't think memory is the issue.)

Also, I tried allocating different numbers of GPUs (1, 2, 4, 8, ...); all of them returned the same error.

And assigning different sets of GPUs (CUDA_VISIBLE_DEVICES=0,1,2,3 vs CUDA_VISIBLE_DEVICES=4,5,6,7) also didn't solve the problem :(