ttengwang / PDVC

End-to-End Dense Video Captioning with Parallel Decoding (ICCV 2021)
MIT License
201 stars 23 forks source link

Is there any limit for Batch_size? #4

Closed axxddzh closed 2 years ago

axxddzh commented 2 years ago

I try to train the model with tsn feature.But it only use 2GB GPU memory.So I try to train the model with bitch_size = 8.But there are some error like:

/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [41,0,0], thread: [0,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [41,0,0], thread: [1,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [41,0,0], thread: [2,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
  0%|                                                                                                                                                                                                   | 0/2502 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 317, in <module>
    train(opt)
  File "train.py", line 181, in train
    output, loss = model(dt, criterion, opt.transformer_input_type)
  File "/home/anaconda3/envs/PDVC-main/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/axxddzh/dat/axxddzh/PDVC-main/models/pdvc.py", line 166, in forward
    disable_iterative_refine)
  File "/media/axxddzh/dat/axxddzh/PDVC-main/models/pdvc.py", line 299, in parallel_prediction_matched
    others, self.opt.caption_decoder_type, indices)
  File "/media/axxddzh/dat/axxddzh/PDVC-main/models/pdvc.py", line 387, in caption_prediction
    cap_prob = cap_head(hs[:, feat_bigids], reference[:, feat_bigids], others, seq)
RuntimeError: CUDA error: device-side assert triggered

I have met the same problem when Batch_size not be 1

ttengwang commented 2 years ago

Hi, in this code, the standard captioning module (PDVC) doesn't support batch size > 1, but PDVC_light does. I tried to train PDVC_light with a larger batch size but got a slight performance drop.