huggingface / trl

Train transformer language models with reinforcement learning.
http://hf.co/docs/trl

[StackLLaMA] 0 trainable params when loading LLaMA-7B in 8bit #337

Closed · samuelhoglund closed this issue 1 year ago

samuelhoglund commented 1 year ago

Hi! I am attempting to use the LLaMA-7B model from https://huggingface.co/decapoda-research/llama-7b-hf as a base model for the reward model training in the StackLLaMA pipeline.

When the model is plugged into training, trainable_params is reported as 0. I assume this will cause the training to have no effect. I have not yet been able to start a training run, so I cannot investigate this on my own.
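
(For reference, I believe the count comes from PEFT's print_trainable_parameters() helper, which the script calls after wrapping the model:)

model.print_trainable_parameters()
# logs something along the lines of:
# trainable params: 0 || all params: <total> || trainable%: 0.0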

Does anyone know if it is possible to load the model this way, and why trainable_params is set to 0? Mainly, is it possible to load a Hugging Face model with this pipeline without having the model stored on your local machine? Moreover, is it possible to load the model in 8-bit to reduce memory usage?

I want to clarify that I am somewhat of a newbie when it comes to this stuff, so these might be stupid questions.

younesbelkada commented 1 year ago

Hi @samuelhoglund! Thanks for your issue! Can you share the snippet where you load the model and the adapters on top of it?

samuelhoglund commented 1 year ago

Hey @younesbelkada, thank you for your quick answer! After reviewing my code to answer your question, I realized that I first loaded decapoda-research/llama-7b-hf with:

import torch
from peft import get_peft_model
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained(
    script_args.model_name, num_labels=1, device_map="auto", torch_dtype=torch.float16, load_in_8bit=True
)
model = get_peft_model(model, peft_config)

(The PEFT config is left the same as in the original reward_modeling.py file.)
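
For reference, the LoRA config in that file looks roughly like this (reproduced from memory, so treat it as approximate):

from peft import LoraConfig, TaskType

peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,  # the reward model is a sequence classifier
    inference_mode=False,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
)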

Then I loaded a fine-tuned version I made a while back on some psychology data:

model = PeftModel.from_pretrained(model, "samhog/psychology-alpaca") 

I realized that this step was unnecessary for the reward model and therefore removed it. Then, when running the script again, I got my expected trainable parameters!

So, in conclusion, using this model might work? I'll report back in this thread once I've completed a training run. Do you think using the decapoda-research/llama-7b-hf as a RM could work, @younesbelkada?

samuelhoglund commented 1 year ago

I've learnt quite a bit since yesterday and now I understand better how adapters work with PEFT.

Reading your question again, @younesbelkada, I assume the adapters you mentioned are the ones I load from samhog/psychology-alpaca, as described in my first answer.

That raises the question: why did loading the model like this, with the adapters from my repo, result in 0 trainable parameters? And are there any other adapter layers that would be smart to load on top of the base model to make it a better fit as a RM?

younesbelkada commented 1 year ago

Hi @samuelhoglund! Thanks for getting back, and apologies for the delay. So, if I understood correctly, loading your model with:

model = LlamaForCausalLM.from_pretrained(
    script_args.model_name, num_labels=1, device_map="auto", torch_dtype=torch.float16, load_in_8bit=True
)
model = get_peft_model(model, peft_config) 

leads to having 0 trainable params, whereas loading a model with:

model = PeftModel.from_pretrained(model, "samhog/psychology-alpaca") 

leads to the correct number of trainable parameters. If that's the case, I would appreciate a small and minimal reproducible script so that I can have a deeper look and help you further!
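
One thing worth checking in the meantime: if I remember correctly, PeftModel.from_pretrained loads saved adapters in inference mode by default, i.e. frozen, which would explain the 0 trainable params when it is stacked on top of get_peft_model. If the goal is to continue training those adapters, loading them as trainable should help (a sketch, not tested against your checkpoint):

from peft import PeftModel

# load the saved adapters with their weights unfrozen
model = PeftModel.from_pretrained(model, "samhog/psychology-alpaca", is_trainable=True)
model.print_trainable_parameters()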

Do you think using the decapoda-research/llama-7b-hf as a RM could work, @younesbelkada?

Yes, I think you can do that; however, you need to make sure to load a LlamaForSequenceClassification.

samuelhoglund commented 1 year ago

No worries @younesbelkada! Sorry for being unclear in my answer, but it's the other way around. Initially, my code looked like this:

model = LlamaForCausalLM.from_pretrained(
    script_args.model_name, num_labels=1, device_map="auto", torch_dtype=torch.float16, load_in_8bit=True
)
model = get_peft_model(model, peft_config)
model = PeftModel.from_pretrained(model, "samhog/psychology-alpaca") 

I removed the last line of the code above, and then I got my trainable params. That is, the code looked like this when the trainable parameters appeared:

model = LlamaForCausalLM.from_pretrained(
    script_args.model_name, num_labels=1, device_map="auto", torch_dtype=torch.float16, load_in_8bit=True
)
model = get_peft_model(model, peft_config)

Yes, I think you can do that; however, you need to make sure to load a LlamaForSequenceClassification.

Would that mean I should change the LlamaForCausalLM to LlamaForSequenceClassification?


I would appreciate a small and minimal reproducible script so that I can have a deeper look and help you further!

Here is a link to the notebook I made: https://colab.research.google.com/drive/16j-_oLPAECA5a094zwn6a4mCk1xPA01Y?usp=sharing

There, you can find a link to my fork on the StackLLaMA repo as well. I have only made changes to the reward_modeling.py file so far, as that is the step I'm on!

Thank you for your answer! Have a good day :)

younesbelkada commented 1 year ago

Hi @samuelhoglund! Thanks for sharing the script. Can you try adding the prepare_model_for_int8_training method from peft before the call to get_peft_model, like this:

from peft import prepare_model_for_int8_training

model = LlamaForCausalLM.from_pretrained(
    script_args.model_name, num_labels=1, device_map="auto", torch_dtype=torch.float16, load_in_8bit=True
)
model = prepare_model_for_int8_training(model)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
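
For context: as I understand it, prepare_model_for_int8_training roughly does three things: it freezes the 8-bit base weights, casts the layer norms to fp32 for numerical stability, and enables gradient checkpointing and input gradients so that the LoRA layers added by get_peft_model can train on top. If that works, print_trainable_parameters should report a small but nonzero count (for a LoRA setup with r=8 on a 7B model, on the order of a few million parameters).
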
samuelhoglund commented 1 year ago

Hello again @younesbelkada. Thank you for the tips!

I changed the LlamaForCausalLM to LlamaForSequenceClassification, i.e., my code now looks something like this:

model = LlamaForSequenceClassification.from_pretrained(
    script_args.model_name, num_labels=1, device_map="auto", torch_dtype=torch.float16, load_in_8bit=True
)
model = prepare_model_for_int8_training(model)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()

But it did not work. I got the following error:

Some weights of the model checkpoint at decapoda-research/llama-7b-hf were not used when initializing LlamaForSequenceClassification: ['lm_head.weight']
- This IS expected if you are initializing LlamaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LlamaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
...
ValueError: weight is on the meta device, we need a `value` to put in on 0.

Digging deeper, the first two warnings are expected. Beyond that, though, some people seem to say that decapoda-research/llama-7b-hf no longer works for basically anything. I tried huggyllama/llama-7b as well but got the same error. To try to tie the model weights, I added model.tie_weights(), so now the code looks like this:

model = LlamaForSequenceClassification.from_pretrained(
    script_args.model_name, num_labels=1, device_map="auto", torch_dtype=torch.bfloat16, load_in_8bit=True
)
model.tie_weights()
model = prepare_model_for_int8_training(model)
model = get_peft_model(model, peft_config)

This did not help either, though. The error seems to come from accelerate when it tries to map the weights, which in turn comes from passing device_map="auto", which is required for loading the model in 8-bit. Maybe it simply does not work with 8-bit here. Does anyone have any ideas?

wrmthorne commented 1 year ago

I made a comment on how I fixed a different but related problem with meta tensors here: https://github.com/tloen/alpaca-lora/issues/368#issuecomment-1556214618

You can force the device_map by setting device_map={'': 0} for both the base model and the adapter. I don't know if it will solve this specific error though.
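
For this thread, that would look something like the following (untested against this exact script, so treat it as a sketch):

model = LlamaForSequenceClassification.from_pretrained(
    script_args.model_name,
    num_labels=1,
    torch_dtype=torch.float16,
    load_in_8bit=True,
    device_map={"": 0},  # put every module on GPU 0 instead of letting accelerate shard/offload
)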

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.