mlfoundations / model-soups

Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
MIT License

This repository contains code for the paper Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time.

Using this repository you can reproduce the figure below, which shows that model soups (averaging multiple fine-tuned solutions) can outperform the best individual model. As an alternative to this repository, Cade Gordon has made the following colab notebook to explore model soups on CIFAR10.

Code

There are 5 steps to reproduce the figure above: 1) downloading the models, 2) evaluating the individual models, 3) running the uniform soup, 4) running the greedy soup, and 5) making the plot.

Note that any of these steps can be skipped. For example, you can immediately generate the plot above via python main.py --plot, or run the greedy soup without evaluating the individual models. This works because we have already completed all of the steps and saved the results files in this repository (e.g., individual_model_results.jsonl). If you do decide to rerun a step, the corresponding results file or plot is deleted and regenerated.

The exception is step 1, downloading the models. If you wish to run steps 2, 3, or 4 you must first run step 1.

Installing dependencies and downloading datasets

To install the dependencies, either run the following commands or see environment.md for more information.

conda env create -f environment.yml
conda activate model_soups

To download the datasets, see datasets.md. When a step requires data, set --data-location to the $DATA_LOCATION used in datasets.md.

Step 1: Downloading the models

python main.py --download-models --model-location <where models will be stored>

This stores the models in the directory given by --model-location.

Step 2: Evaluate individual models

python main.py --eval-individual-models --data-location <where data is stored> --model-location <where models are stored>

Note that this will first delete then rewrite the file individual_model_results.jsonl.
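The results files use the .jsonl extension, which conventionally means one JSON value per line. As a minimal sketch of how such a file could be inspected, the helper below (read_jsonl, a hypothetical name that is not part of this repository) parses each non-empty line. The field names in the demo records are invented for illustration and may not match what main.py writes.

```python
import json
import os
import tempfile


def read_jsonl(path):
    """Return a list with one parsed JSON value per non-empty line."""
    records = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Demo on a throwaway file. The keys "model_name" and "accuracy" are
# assumptions made for this example, not the repository's actual schema.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_results.jsonl")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write(json.dumps({"model_name": "model_0", "accuracy": 0.5}) + "\n")
        f.write(json.dumps({"model_name": "model_1", "accuracy": 0.75}) + "\n")
    demo_records = read_jsonl(demo_path)

# Pick the record with the highest (hypothetical) accuracy field.
best = max(demo_records, key=lambda r: r["accuracy"])
print(best["model_name"])
```

To look at the real file, call read_jsonl("individual_model_results.jsonl") and print the keys of the first record before relying on any particular field.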

Step 3: Uniform soup

python main.py --uniform-soup --data-location <where data is stored> --model-location <where models are stored>

Note that this will first delete then rewrite the file uniform_soup_results.jsonl.
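For intuition, a uniform soup is the element-wise average of the parameters of all fine-tuned models. The sketch below illustrates the idea on plain dictionaries and is not the code in main.py: uniform_soup is a hypothetical helper, and plain floats stand in for the tensors of a real state dict. The same arithmetic should carry over to array or tensor values that support + and /.

```python
def uniform_soup(state_dicts):
    """Average a list of parameter dictionaries key by key."""
    if not state_dicts:
        raise ValueError("need at least one model to make a soup")
    keys = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        if sd.keys() != keys:
            raise ValueError("all models must share the same parameter names")
    n = len(state_dicts)
    # Sum each parameter over all models, then divide by the model count.
    return {k: sum(sd[k] for sd in state_dicts) / n for k in keys}


# Toy "models": each maps a parameter name to a single float.
models = [
    {"weight": 1.0, "bias": 0.0},
    {"weight": 2.0, "bias": 3.0},
    {"weight": 3.0, "bias": 6.0},
]
soup = uniform_soup(models)
print(soup)  # {'weight': 2.0, 'bias': 3.0}
```

Because the result has the same shape as any single model, it can be loaded into one network, which is why inference cost does not grow with the number of ingredients.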

Step 4: Greedy soup

python main.py --greedy-soup --data-location <where data is stored> --model-location <where models are stored>

Note that this will first delete then rewrite the file greedy_soup_results.jsonl.
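The greedy soup, as described in the paper, considers models in decreasing order of held-out validation accuracy and keeps a model in the soup only if adding it does not lower the validation accuracy of the averaged weights. The following is a minimal sketch of that procedure under simplifying assumptions, not the code in main.py: greedy_soup, average, and toy_score are hypothetical names, models are dictionaries of floats, and the evaluate callback stands in for a real validation pass.

```python
def average(state_dicts):
    """Element-wise mean of a non-empty list of parameter dictionaries."""
    n = len(state_dicts)
    return {k: sum(sd[k] for sd in state_dicts) / n for k in state_dicts[0]}


def greedy_soup(state_dicts, evaluate):
    """Build a soup by adding models one at a time, best-first.

    `evaluate` maps a parameter dictionary to a score (higher is better),
    e.g. accuracy on a held-out validation set.
    """
    ranked = sorted(state_dicts, key=evaluate, reverse=True)
    ingredients = [ranked[0]]
    best_score = evaluate(average(ingredients))
    for candidate in ranked[1:]:
        score = evaluate(average(ingredients + [candidate]))
        if score >= best_score:  # keep the candidate only if it does not hurt
            ingredients.append(candidate)
            best_score = score
    return average(ingredients), ingredients


# Toy setup: the "best" weight is 2.0 and the score is negative squared error.
def toy_score(sd):
    return -((sd["w"] - 2.0) ** 2)


candidates = [{"w": 10.0}, {"w": 1.5}, {"w": 3.0}]
soup, used = greedy_soup(candidates, toy_score)
print(soup, len(used))  # {'w': 2.25} 2
```

In this toy run the models with w = 1.5 and w = 3.0 are averaged, while w = 10.0 is rejected because including it would lower the score.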

Step 5: Plot

python main.py --plot

Note that this will first delete then rewrite the file figure.png.

Note

If you want, you can run all steps with:

python main.py --download-models --eval-individual-models --uniform-soup --greedy-soup --plot --data-location <where data is stored> --model-location <where models are stored>

Also note: if you are interested in running ensemble baselines, check out the ensemble branch.

Also note: if you are interested in running a minimal example of wise-ft, you can run python wise-ft-example.py --download-models.
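For context, wise-ft combines a zero-shot model and its fine-tuned counterpart by linearly interpolating their weights with a mixing coefficient alpha between 0 and 1. The sketch below shows only that interpolation on toy dictionaries of floats. interpolate_weights is a hypothetical helper, and wise-ft-example.py may be organized differently.

```python
def interpolate_weights(theta_zeroshot, theta_finetuned, alpha):
    """Return (1 - alpha) * theta_zeroshot + alpha * theta_finetuned, key by key."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is expected to lie in [0, 1]")
    if theta_zeroshot.keys() != theta_finetuned.keys():
        raise ValueError("both models must share the same parameter names")
    return {
        k: (1.0 - alpha) * theta_zeroshot[k] + alpha * theta_finetuned[k]
        for k in theta_zeroshot
    }


# Toy parameters: alpha = 0 recovers the zero-shot model, alpha = 1 the
# fine-tuned model, and values in between mix the two.
zeroshot = {"w": 0.0, "b": 2.0}
finetuned = {"w": 4.0, "b": 6.0}
halfway = interpolate_weights(zeroshot, finetuned, alpha=0.5)
print(halfway)  # {'w': 2.0, 'b': 4.0}
```

Sweeping alpha over a grid and evaluating each interpolated model is a simple way to trace the trade-off between the two endpoints.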

Also note: if you are interested in running minimal examples of zeroshot/fine-tuning, you can run python zeroshot.py or python finetune.py. See program arguments (i.e., run with --help) for more information. Note that these are minimal examples and do not contain rand-aug, mixup, or LP-FT.

Questions

If you have any questions, please feel free to raise an issue. We will answer frequently asked questions here.

Authors

This project is by the following authors, where * denotes equal contribution (alphabetical ordering):

Citing

If you found this repository useful, please consider citing:

@InProceedings{pmlr-v162-wortsman22a,
  title =    {Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time},
  author =       {Wortsman, Mitchell and Ilharco, Gabriel and Gadre, Samir Ya and Roelofs, Rebecca and Gontijo-Lopes, Raphael and Morcos, Ari S and Namkoong, Hongseok and Farhadi, Ali and Carmon, Yair and Kornblith, Simon and Schmidt, Ludwig},
  booktitle =    {Proceedings of the 39th International Conference on Machine Learning},
  pages =    {23965--23998},
  year =     {2022},
  editor =   {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume =   {162},
  series =   {Proceedings of Machine Learning Research},
  month =    {17--23 Jul},
  publisher =    {PMLR},
  pdf =      {https://proceedings.mlr.press/v162/wortsman22a/wortsman22a.pdf},
  url =      {https://proceedings.mlr.press/v162/wortsman22a.html}
}