kiddyboots216 / lottery-ticket-adaptation

Lottery Ticket Adaptation
Apache License 2.0

About mask generation and adaptation #2

Open HeeseongEom opened 4 weeks ago

HeeseongEom commented 4 weeks ago

Hello, Ashwinee Panda

I was very impressed with your work and wanted to thank you for the excellent contribution. I am currently following the tutorial with the openbookqa task, with the eventual goal of testing how well the model adapts to fine-tuning on my medical domain. However, I seem to be running into some difficulties, possibly due to a misunderstanding on my part, so I wanted to leave a comment to ask for clarification.

As I understand it, I should first fine-tune through rlaif and then run save_mask afterwards. However, the process does not seem to work properly unless a mask_path is specified during the fine-tuning step, which has left me with some questions.

I would like to ask for a more detailed guide on how to generate and apply a 0.2-sparsity mask for the openbookqa dataset.

The command I previously ran is as follows:

```shell
python -u train_single_gpu.py do_first_eval=False \
    loss=sft \
    model=llama7b \
    model.archive="null" \
    datasets=[openbookqa] \
    exp_name=test1 \
    eval_batch_size=16 \
    sample_during_eval=false \
    lr=1e-7 \
    trainer=BasicTrainer \
    activation_checkpointing=True \
    data_fraction=1.0 \
    save_every=epoch_2 \
    eval_every=5000 \
    n_epochs=100000 \
    batch_size=8 \
    gradient_accumulation_steps=1 \
    model.fsdp_policy_mp=bfloat16 \
    optimizer=RMSprop \
    grad_norm_strategy=even \
    max_grad_norm=10
```

One thing I am curious about is how the command should be set up to generate the mask. I would greatly appreciate it if you could provide a simple guide on this process.

Thank you, Heeseong Eom

kiddyboots216 commented 4 weeks ago

Hi, to generate the mask you can use this script: https://github.com/kiddyboots216/lottery-ticket-adaptation/blob/main/rlaif/scripts/make_diffs_and_masks.sh. An example is given in https://github.com/kiddyboots216/lottery-ticket-adaptation/blob/main/rlaif/scripts/continual_learning.sh#L66.

Note: you should not set n_epochs=100000, because an epoch is a full pass over your entire dataset. Also, you should always set `trainer=FSDPTrainer`. I would recommend following this script https://github.com/kiddyboots216/lottery-ticket-adaptation/blob/main/rlaif/scripts/train_single_gpu.sh and just editing the paths and arguments at the top of the file. So in your case, just change the model to llama7b and the dataset to openbookqa, and that should be sufficient.
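For reference, here is the earlier command with those two changes applied (n_epochs=2 is just an illustrative small value; pick however many passes over the data you actually want):

```shell
# Original command with two fixes: trainer=FSDPTrainer, and a sane n_epochs
python -u train_single_gpu.py do_first_eval=False \
    loss=sft \
    model=llama7b \
    model.archive="null" \
    datasets=[openbookqa] \
    exp_name=test1 \
    eval_batch_size=16 \
    sample_during_eval=false \
    lr=1e-7 \
    trainer=FSDPTrainer \
    activation_checkpointing=True \
    data_fraction=1.0 \
    save_every=epoch_2 \
    eval_every=5000 \
    n_epochs=2 \
    batch_size=8 \
    gradient_accumulation_steps=1 \
    model.fsdp_policy_mp=bfloat16 \
    optimizer=RMSprop \
    grad_norm_strategy=even \
    max_grad_norm=10
```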

So the workflow is basically:

1. Train on your dataset for however long you want; this gives you a task vector.
2. Run the script that makes a diff (make sure you install mergekit from the appropriate folder in this repo first); it will then make a mask with some level of sparsity.
3. Run the same training command again, but this time pass the mask path that you just created and the level of sparsity.
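Sketched as shell commands, the three steps look roughly like this (the checkpoint/mask paths and the `mask_path`/`sparsity` argument names here are placeholders for illustration, not verified against the repo; check the linked scripts for the exact invocations):

```shell
# 1. Fine-tune on your dataset to obtain a task vector (a trained checkpoint)
python -u train_single_gpu.py loss=sft model=llama7b datasets=[openbookqa] \
    trainer=FSDPTrainer exp_name=openbookqa_sft n_epochs=2

# 2. Make the weight diff and the mask
#    (install mergekit from this repo's mergekit folder first)
bash rlaif/scripts/make_diffs_and_masks.sh

# 3. Re-run training, now passing the mask you just created and its sparsity level
python -u train_single_gpu.py loss=sft model=llama7b datasets=[openbookqa] \
    trainer=FSDPTrainer exp_name=openbookqa_lta \
    mask_path=/path/to/mask.pt sparsity=0.90
```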

Hope this helps!

kiddyboots216 commented 4 weeks ago

Also, the sparsity level is the fraction of coordinates that are not updated. I'd suggest going with 0.90 by default when you actually create the mask.