[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.
Hi @jayleicn,
I tried to fine-tune on a custom dataset for the video-QA task. Command I ran:

```
python src/tasks/run_video_qa.py --config src/configs/tango_qa_base_resnet50.json --output_dir /home/nadeeshan/output_video_qa/
```
Error I got:
```
src/tasks/run_video_qa.py:557: FutureWarning: Non-finite norm encountered in torch.nn.utils.clip_grad_norm_; continuing anyway. Note that the default behavior will change in a future release to error out if a non-finite total norm is encountered. At that point, setting error_if_nonfinite=false will be required to retain the old behavior.
  grad_norm = clip_grad_norm_(
04/28/2023 18:40:35 - WARNING - root - NaN or Inf found in input tensor.
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0
Traceback (most recent call last):
  File "src/tasks/run_video_qa.py", line 715, in <module>
    start_training(input_cfg)
  File "src/tasks/run_video_qa.py", line 574, in start_training
    restorer.step()
  File "/clipbert/src/utils/load_save.py", line 280, in step
    if self.global_step % self.save_steps == 0:
ZeroDivisionError: integer division or modulo by zero
```
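As far as I can tell, the FutureWarning itself is just PyTorch's `clip_grad_norm_` reporting a non-finite total gradient norm (the small standalone snippet below, which is not ClipBERT code, should reproduce the same warning on my PyTorch version), so I assume the actual failure is the ZeroDivisionError at the end of the traceback.

```python
import torch

# Standalone repro of just the FutureWarning -- this is NOT ClipBERT code.
# A gradient containing Inf makes the total norm non-finite, which is what
# clip_grad_norm_ warns about (the amp loss scaler then skips that step,
# as in the log above).
p = torch.nn.Parameter(torch.ones(3))
p.grad = torch.tensor([1.0, float("inf"), 2.0])

# Passing error_if_nonfinite=False explicitly keeps today's warn-and-continue
# behavior once the default changes to raising an error.
total_norm = torch.nn.utils.clip_grad_norm_([p], max_norm=1.0, error_if_nonfinite=False)
print(total_norm)  # tensor(inf)
```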
Any idea how to fix this?
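For what it's worth, my current guess (the names and numbers below are my assumptions, not the actual ClipBERT code or config values) is that the restorer's `save_steps` works out to 0 for my small custom dataset, e.g. if it is derived from a ratio of the total number of training steps, and `global_step % save_steps` then divides by zero:

```python
# Hypothetical sketch of what I suspect is happening -- not ClipBERT's actual code.
# If save_steps is computed from a ratio of a small total step count,
# int() can round it down to 0, and `global_step % save_steps` raises
# ZeroDivisionError, matching the traceback above.
num_train_steps = 120      # assumed: few steps because my custom dataset is small
save_steps_ratio = 0.005   # assumed config value
save_steps = int(save_steps_ratio * num_train_steps)  # -> 0

global_step = 1
# if global_step % save_steps == 0:   # would raise ZeroDivisionError

# A minimal guard I'm considering:
save_steps = max(1, save_steps)
print(global_step % save_steps == 0)  # now safe
```

Does that sound plausible, or is there a config option I should be setting differently for small datasets?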