yiren-jian / BLIText

[NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
BSD 3-Clause "New" or "Revised" License

Stage 0 scripts and config #5

Open Sreyan88 opened 7 months ago

Sreyan88 commented 7 months ago

Hi there,

Great work! Could you please share pretrain_stage0.sh or its config file (beyond the log file that is already provided)? We would like to reproduce some of the experiments. Thank you!

yiren-jian commented 6 months ago

I used something similar to this (if you find anything here inconsistent with the log, please feel free to replace it). Stage 0 was trained on another server at Northwestern with 3x RTX A6000 GPUs, of which I only kept the log and the pre-trained weights.

model:
  arch: pformer_opt
  model_type: pformer_opt2.7b
  load_pretrained: False
  # initialize stage 2 pretraining from stage 1 pretrained model
  # pretrained: "https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/blip2_pretrained.pth"
  freeze_vit: True

datasets:
  sentence_dataset:
    text_processor:
        train:
          name: "blip_caption"

run:
  task: image_text_pretrain   ### no need to change
  # runner: runner_iter
  # optimizer
  lr_sched: "linear_warmup_cosine_lr"
  init_lr: 1e-4
  min_lr: 1e-5
  warmup_lr: 1e-6

  weight_decay: 0.05
  max_epoch: 5
  # max_iters: 60000
  # iters_per_inner_epoch: 6000
  batch_size_train: 128
  batch_size_eval: 64
  num_workers: 4
  warmup_steps: 2000

  seed: 42
  output_dir: "output/BLIP-T/Pretrain_stage0"

  amp: True
  resume_ckpt_path: null

  evaluate: False
  train_splits: ["train"]

  device: "cuda"
  world_size: 3
  dist_url: "env://"
  distributed: True
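
For the launch script itself, a minimal sketch of pretrain_stage0.sh, assuming the usual LAVIS entry point (train.py with --cfg-path); the config path below is a placeholder, point it to wherever you save the YAML above. The 3 processes match world_size: 3 in the run config.

  #!/bin/bash
  # pretrain_stage0.sh -- sketch only; adjust --cfg-path to your saved config.
  python -m torch.distributed.run --nproc_per_node=3 train.py \
      --cfg-path lavis/projects/blip2/train/pretrain_stage0.yaml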