ml-jku / EVA

One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation
MIT License

This repository contains the code for EVA (Explained Variance Adaptation).

Authors:

Fabian Paischer1, Lukas Hauzenberger1, Thomas Schmied1, Benedikt Alkin1,3, Marc Peter Deisenroth2, Sepp Hochreiter1,3

* equal contribution
1 ELLIS Unit, LIT AI Lab, Institute for Machine Learning, JKU Linz, Austria

2 University College London

3 NXAI GmbH, Linz, Austria

Method

Explained Variance Adaptation (EVA) is a novel initialization method for LoRA-style adapters. It initializes adapter weights in a data-driven manner and adaptively allocates ranks according to the variance they explain. EVA improves average performance on a multitude of tasks across various domains, such as language generation and understanding, image classification, and decision making.
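The core idea can be illustrated with a minimal NumPy sketch (not the repository's implementation): perform an SVD on a minibatch of a layer's input activations, use the top right-singular vectors to initialize LoRA's down-projection A, keep B at zero so the adapter starts as the identity update, and record how much variance each component explains. The function name and the `d_out` parameter are illustrative assumptions, not part of the repository's API.

```python
import numpy as np

def eva_init_sketch(activations: np.ndarray, rank: int, d_out: int):
    """Illustrative EVA-style initialization (simplified sketch).

    activations: minibatch of layer inputs, shape [n_tokens, d_in].
    Returns A (LoRA down-projection init), B (zeros, so delta-W = B @ A = 0
    at the start of fine-tuning), and the explained-variance ratio of the
    kept components.
    """
    # Economy SVD of the activation minibatch
    _, s, vt = np.linalg.svd(activations, full_matrices=False)
    # Fraction of total variance explained by each singular direction
    explained = s**2 / np.sum(s**2)
    A = vt[:rank]                   # [rank, d_in]: top right-singular vectors
    B = np.zeros((d_out, rank))     # standard LoRA zero-init for B
    return A, B, explained[:rank]
```

Because the rows of A are orthonormal singular vectors, the adapter's input projection is aligned with the directions of highest activation variance from the start, which is the intuition behind the data-driven initialization.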

The code for our image classification experiments can be found here. All remaining experiments will be made available in this repository. Our code supports all models available through the Hugging Face Hub. We also provide an implementation of EVA for the PEFT library in this [pull request]().

Instructions for fine-tuning with EVA

First, create a conda environment from environment.yaml (e.g., `conda env create -f environment.yaml`).

SVD for EVA

Before fine-tuning with EVA, you need to create an SVD checkpoint (the argument svd_filepath in train.sh must point to an existing SVD checkpoint). To do so, set the variables in bash/run_svd_precompute.sh, such as base_path, model_names, and dataset_name, accordingly and execute the script. This will create an eva_state_dict in the specified directory.
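Conceptually, the adaptive rank allocation can be sketched as follows. This is a simplified greedy stand-in, not the repository's exact algorithm: every layer's per-component explained-variance ratios are pooled, and a fixed total rank budget is handed out to the globally largest components, so layers whose activations concentrate more variance receive more ranks. The function name and dictionary layout are illustrative assumptions.

```python
def allocate_ranks(explained: dict, total_budget: int) -> dict:
    """Greedy rank-allocation sketch (illustrative, not the repo's code).

    explained: {layer_name: [explained-variance ratio per SVD component]}.
    Pools all components across layers and assigns `total_budget` ranks
    to the components with the largest explained variance.
    """
    # (layer, variance) pairs for every candidate component
    candidates = [(layer, v) for layer, vs in explained.items() for v in vs]
    candidates.sort(key=lambda t: t[1], reverse=True)
    ranks = {layer: 0 for layer in explained}
    for layer, _ in candidates[:total_budget]:
        ranks[layer] += 1
    return ranks
```

Note that the total number of assigned ranks stays fixed, so the overall parameter budget matches a uniform-rank LoRA baseline while the distribution across layers adapts to the data.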

Fine-tuning

For fine-tuning, simply execute the bash/run_train.sh script. Before execution, make sure to set crucial arguments such as base_path, model_names, and dataset_name.

Evaluation

Reproducing results from the paper

Below we list the hyperparameters for the math fine-tuning, common-sense reasoning, and code fine-tuning tasks needed to reproduce the results from our paper. After setting these parameters, you can execute the fine-tuning pipeline as explained above.

Math fine-tuning: meta-math/MetaMathQA

Common-sense reasoning: qa_datasets

Code fine-tuning: m-a-p/Code-Feedback

Citation

If you find our work useful, please consider citing it.

@article{paischer2024eva,
    title={One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation},
    author={Fabian Paischer and Lukas Hauzenberger and Thomas Schmied and Benedikt Alkin and Marc Peter Deisenroth and Sepp Hochreiter},
    journal={arXiv preprint arXiv:2410.07170},
    year={2024}
}