sudo-Boris / mr-Blip

Official Implementation of "The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval"
BSD 3-Clause "New" or "Revised" License

OOM for QVH Finetuning 8xA100(80G) #3

Closed: ayiyayi closed this issue 1 month ago

ayiyayi commented 1 month ago

Hi! Thanks for your excellent work and for sharing the code! I tried to use 8xA100 (80G) GPUs for QVH finetuning, but it still reports an OOM error. Could you please share more training details?

The config I use is:

```yaml
model:
  arch: blip2_mr
  model_type: pretrain_flant5xl
  load_finetuned: False # True
  use_grad_checkpoint: False
  freeze_vit: True
  task: qformer_freeze_lora
  input_time_format: seconds_integers
  interleave_data: True
  frame_token_aggregation: False

datasets:
  qvh: 
    vis_processor:
        train:
          name: "blip2_video_train"
          n_frms: 60
          image_size: 224
        eval:
          name: "blip_video_eval"
          n_frms: 60
          image_size: 224
    text_processor:
        train:
          name: "blip_question"
          max_words: 50
        eval:
          name: "blip_question"
          max_words: 50

run:
  task: moment_retrieval
  # optimizer
  lr_sched: "linear_warmup_cosine_lr"
  init_lr: 3e-4
  min_lr: 0
  warmup_lr: 1e-8
  # warmup_steps: 4515 # 903 iters/ epoch * 50 epochs * 0.1 = 4515
  warmup_steps: 2255 # 451 iters/ epoch * 50 epochs * 0.1 = 2255
  weight_decay: 0.05
  max_epoch: 50
  batch_size_train: 2
  batch_size_eval: 2
  num_workers: 8
  accum_grad_iters: 2

  max_len: 200
  min_len: 8
  num_beams: 5

  seed: 42
  output_dir: "result/mr_BLIP/QVH/"

  amp: True
  resume_ckpt_path: null

  evaluate: False
  train_splits: ["train"]
  valid_splits: ["val"]
  test_splits: ["test"]

  device: "cuda"
  world_size: 8
  dist_url: "env://"
  distributed: True
  find_unused_parameters: True
```
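For context, here is a minimal sketch of the standard PyTorch pattern behind `distributed: True` and `dist_url: "env://"`, assuming the LAVIS-style trainer this repo builds on; the actual mr-BLIP code may differ. Each of the `world_size: 8` processes reads its rank from environment variables that a launcher such as `torchrun --nproc_per_node=8` exports:

```python
# Minimal sketch of env:// distributed init (assumption: standard
# PyTorch/LAVIS pattern, not necessarily the actual mr-BLIP code).
import os
import torch
import torch.distributed as dist

def init_distributed():
    # torchrun exports RANK, WORLD_SIZE, and LOCAL_RANK for each process.
    rank = int(os.environ["RANK"])
    world_size = int(os.environ["WORLD_SIZE"])
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)  # pin this process to one of the 8 GPUs
    dist.init_process_group(
        backend="nccl",        # NCCL backend for multi-GPU CUDA training
        init_method="env://",  # matches dist_url: "env://" in the config
        rank=rank,
        world_size=world_size,
    )
```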
sudo-Boris commented 1 month ago

Thank you! :)

Please try setting `batch_size_train=1`, `batch_size_eval=1`, and `accum_grad_iters=2`. Then it should work!
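For anyone hitting the same error: applied to the `run` section above, the suggestion looks like the sketch below. The effective-batch and warmup arithmetic are inferences from the comments in the posted config, not something stated in this thread.

```yaml
run:
  batch_size_train: 1   # was 2; per-GPU batch size
  batch_size_eval: 1    # was 2
  accum_grad_iters: 2   # effective batch = 1 per GPU * 2 accum steps * 8 GPUs = 16
  # Halving the per-GPU batch doubles the iterations per epoch (~451 -> ~903),
  # so the commented-out "warmup_steps: 4515" line (903 iters/epoch * 50
  # epochs * 0.1) corresponds to this setting.
```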