I'm reaching out to discuss how parameters are set to be trainable.
To specify which parameters should be trainable, one should refer to the `projects/OOO/expOOO.yaml` file:
You must specify elements in either `keys_to_finetune` or `keys_to_freeze` (but specifying elements in both will result in a `ValueError`).
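For illustration, here is a minimal sketch of the kind of check that would raise this error (`check_trainable_keys` is a hypothetical name for illustration, not heron's actual API):

```python
# Hypothetical sketch of the mutual-exclusivity check described above.
def check_trainable_keys(keys_to_finetune, keys_to_freeze):
    if len(keys_to_finetune) > 0 and len(keys_to_freeze) > 0:
        raise ValueError(
            "Set either keys_to_finetune or keys_to_freeze, not both."
        )

check_trainable_keys(["visual_projection"], [])       # OK
check_trainable_keys(["visual_projection"], ["mlp"])  # raises ValueError
```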
I was puzzled about what elements could be specified here, so I investigated.
I believe this can be understood by examining the contents of the `set_trainable_params` function in `utils.py`: https://github.com/turingmotors/heron/blob/a52d8cfa00a6514011dd5d8c7d0b63afe7664c26/heron/models/utils.py#L159C1-L196
In this `set_trainable_params` function, parameters to be frozen are specified by matching substrings of model parameter names against strings in the `keys_to_freeze` list:
```python
# Excerpt from set_trainable_params in heron/models/utils.py
for name, p in model.named_parameters():
    ...
    elif np.any([k in name for k in keys_to_freeze]):
        p.requires_grad = False
        untrainable_list.append(name)
    ...
```
In other words, if you specify a string that doesn't appear in any of the model's parameter names, it's as if you didn't specify anything at all.
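To make that concrete, the match is an ordinary Python substring test, exactly like `k in name` in the excerpt above (the parameter name below is made up for illustration):

```python
# Keys are matched with a plain substring test against parameter names.
name = "visual_projection.linear.weight"    # hypothetical parameter name
print("visual_projection" in name)          # True  -> this key matches
print("num_image_with_embedding" in name)   # False -> a key that appears in no
                                            # parameter name has no effect
```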
For example, consider checking the modules of the model here:
```python
import torch
from transformers import AutoProcessor
from heron.models.git_llm.git_llama import GitLlamaForCausalLM

device_id = 0

# Load the pretrained model in half precision and move it to the GPU
model = GitLlamaForCausalLM.from_pretrained(
    'turing-motors/heron-chat-git-Llama-2-7b-v0',
    torch_dtype=torch.float16
)
model.eval()
model.to(f"cuda:{device_id}")

# Print the module structure
print(model)
```
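Additionally, to see the specific parameter names of the `turing-motors/heron-chat-git-Llama-2-7b-v0` model, one way is to iterate over `named_parameters()` (a minimal sketch, reusing the `model` loaded above):

```python
# Print every parameter name; these are the strings that entries in
# keys_to_finetune / keys_to_freeze are matched against as substrings.
for name, _ in model.named_parameters():
    print(name)
```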
This allows us to see all parameter names of the `turing-motors/heron-chat-git-Llama-2-7b-v0` model.
When specifying the parameters you want to be trainable (or frozen), the elements you specify in `keys_to_finetune` (or `keys_to_freeze`) in `projects/OOO/expOOO.yaml` should each occur as a substring of these parameter names.
For instance, if you specify `visual_projection`, only the parameters of the `turing-motors/heron-chat-git-Llama-2-7b-v0` model whose names contain `visual_projection` will be trainable, while parameters that don't match any substring will be frozen. (Given that there's no parameter name in the `turing-motors/heron-chat-git-Llama-2-7b-v0` model that matches `num_image_with_embedding`, specifying this seems optional.)
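As a quick sanity check, one can list which parameter names a given key would select (again a sketch, reusing the `model` loaded above):

```python
# Collect the parameter names that contain a given key as a substring.
key = "visual_projection"
matched = [name for name, _ in model.named_parameters() if key in name]
print(matched)  # non-empty for "visual_projection"; an empty list for a key
                # such as "num_image_with_embedding" that matches no name
```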
Hence, in the sample config for training the llama-based VL model here, `num_image_with_embedding` is specified; I believe this is not necessary. Is my understanding correct?
If there are any errors or misconceptions in my explanation thus far, please let me know.