Sreyan88 opened 7 months ago

Hi there,

Great work! Could you please provide us with `pretrain_stage0.sh` or the config file (besides the log file)? We would like to reproduce some experiments. Thank you!

---

I used something similar to this (if you find anything here that is inconsistent with the log, please feel free to replace it). Stage 0 was trained on another server at Northwestern with three RTX A6000 GPUs, and I only kept the log and the pretrained weights from that run.
```yaml
model:
  arch: pformer_opt
  model_type: pformer_opt2.7b
  load_pretrained: False
  # initialize stage 2 pretraining from stage 1 pretrained model
  # pretrained: "https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/blip2_pretrained.pth"
  freeze_vit: True

datasets:
  sentence_dataset:
    text_processor:
      train:
        name: "blip_caption"

run:
  task: image_text_pretrain  # no need to change
  # runner: runner_iter

  # optimizer
  lr_sched: "linear_warmup_cosine_lr"
  init_lr: 1e-4
  min_lr: 1e-5
  warmup_lr: 1e-6
  weight_decay: 0.05
  max_epoch: 5
  # max_iters: 60000
  # iters_per_inner_epoch: 6000
  batch_size_train: 128
  batch_size_eval: 64
  num_workers: 4
  warmup_steps: 2000

  seed: 42
  output_dir: "output/BLIP-T/Pretrain_stage0"

  amp: True
  resume_ckpt_path: null

  evaluate: False
  train_splits: ["train"]

  device: "cuda"
  world_size: 3
  dist_url: "env://"
  distributed: True
```
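Since I no longer have the original `pretrain_stage0.sh`, here is a minimal sketch of what the launcher could look like, assuming the repo follows the standard LAVIS entry point (`train.py` with a `--cfg-path` argument). The config path, GPU indices, and port below are placeholders, not values taken from the original script:

```bash
#!/bin/bash
# Minimal sketch of a stage-0 launch script, assuming the standard LAVIS
# entry point (train.py --cfg-path ...). The config path, GPU indices,
# and master port are placeholders; adjust them to your setup.

# world_size: 3 in the config above corresponds to 3 GPUs on one node.
export CUDA_VISIBLE_DEVICES=0,1,2

python -m torch.distributed.run \
    --nproc_per_node=3 \
    --master_port=29500 \
    train.py \
    --cfg-path lavis/projects/pformer/train/pretrain_stage0.yaml
```

With `distributed: True` and `dist_url: "env://"`, the launcher supplies the rank and world-size environment variables that the training script reads at startup, so `--nproc_per_node` should match `world_size: 3` in the config.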