*(Repository logo generated by DALL·E 3.)*

This repository contains the code for the paper "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization" ([arXiv:2403.08730](https://arxiv.org/abs/2403.08730)).
Create a conda environment and install the package:

```bash
conda create -n bpo python=3.10 -y
conda activate bpo
pip install -e .
```
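As a quick sanity check that the editable install worked, you can try importing the package. The module name `llava` is an assumption based on the upstream LLaVA codebase this repository builds on:

```python
# Import check; the module name `llava` is an assumption based on
# the upstream LLaVA codebase this repository is built on.
import llava

# After `pip install -e .`, this should point into the repository checkout.
print(llava.__file__)
```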
Install flash-attn for efficient training:

```bash
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
```
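If the build succeeds, the package should be importable; a quick check:

```python
# Verify that flash-attn built and installed correctly.
import flash_attn

print(flash_attn.__version__)
```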
- Download ShareGPT4V from here
- Download COCO from here
- Download the dataset annotations from here (a quick way to inspect them is sketched below)
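A minimal sketch for peeking at the downloaded annotations, assuming a LLaVA-style JSON list of records; the filename `bpo_annotations.json` is a placeholder, so substitute the file you actually downloaded:

```python
import json

# Placeholder filename; substitute the annotation file you downloaded.
with open("bpo_annotations.json") as f:
    annotations = json.load(f)

print(len(annotations), "records")
print(annotations[0])  # inspect one record to confirm the expected fields
```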
Extract data from ShareGPT4V and organize the images as follows:
```
Image_root
├── coco/
│   └── train2017/
├── llava/
│   └── llava_pretrain/
├── sam/
├── share_textvqa/
│   └── images/
├── web-celebrity/
│   └── images/
├── web-landmark/
│   └── images/
└── wikiart/
    └── images/
```
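A minimal sketch to sanity-check the layout before training; the directory names are taken from the tree above, and `Image_root` stands in for whatever root directory you chose:

```python
from pathlib import Path

# Subdirectories expected under the image root, per the tree above.
EXPECTED = [
    "coco/train2017",
    "llava/llava_pretrain",
    "sam",
    "share_textvqa/images",
    "web-celebrity/images",
    "web-landmark/images",
    "wikiart/images",
]

def check_layout(image_root: str) -> None:
    """Print which of the expected subdirectories exist under image_root."""
    root = Path(image_root)
    for rel in EXPECTED:
        path = root / rel
        status = "ok" if path.is_dir() else "MISSING"
        print(f"{status:>7}  {path}")

check_layout("Image_root")  # replace with your actual image root
```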
Run BPO fine-tuning:

```bash
bash scripts/finetune_bpo.sh
```

or, with flash-attn installed, use the flash attention variant:

```bash
bash scripts/finetune_bpo_flash.sh
```
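For orientation, BPO trains with preference optimization in the DPO family (the repo credits trl and Silkie below). A minimal sketch of the standard DPO loss, not this repository's exact implementation:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective: increase the policy's margin between the
    chosen and rejected responses, relative to a frozen reference model."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```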
The project is built on top of the amazing multimodal large language model LLaVA, the RLHF package trl, the multimodal DPO work Silkie, and the visual contrastive decoding method VCD. Thanks for their great work!
If you find our work useful for your research or applications, please cite using this BibTeX:
```bibtex
@misc{pi2024strengthening,
      title={Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization},
      author={Renjie Pi and Tianyang Han and Wei Xiong and Jipeng Zhang and Runtao Liu and Rui Pan and Tong Zhang},
      year={2024},
      eprint={2403.08730},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```