Hmm, I think the `AutoModelForSeq2SeqLMWithValueHead` class does not support the `EncoderDecoder` class, as it requires some patches. Indeed, the solution would be to look for the child configs; we can do that by passing a flag `is_composition_model=True` in `from_pretrained`. Would you be happy to propose a patch for this? Happy to guide you in more detail!
Thanks for the support!
I wrote a quick fix. The following code seems to work now:

```python
pretrained_model = "ribesstefano/ChemBERTa2ChemBERTa-58M"
model = AutoModelForSeq2SeqLMWithValueHead.from_pretrained(
    pretrained_model,
    is_composition_model=True,
)
model_ref = create_reference_model(model)
# model_ref = create_reference_model(model, num_shared_layers=6)
```

but `create_reference_model(model, num_shared_layers=6)` still breaks when specifying the `num_shared_layers` argument.
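If it helps, I suspect the `num_shared_layers` failure happens because `create_reference_model` matches parameter names against a list of built-in layer patterns (e.g. `transformer.h.{layer}`) that an `EncoderDecoder` model does not follow. A possible workaround, sketched below and untested, would be to pass a custom `pattern`; the layer prefix is an assumption and would need to be checked against `model.named_parameters()`:

```python
# hypothetical workaround (untested): give create_reference_model a custom
# layer-name pattern, since its built-in patterns do not match an
# EncoderDecoder model; the prefix below is an assumption
model_ref = create_reference_model(
    model,
    num_shared_layers=6,
    pattern="decoder.roberta.encoder.layer.{layer}",
)
```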
This is what I did: in the `from_pretrained` method of `PreTrainedModelWrapper` I added the `is_composition_model` flag to the `trl_model_args`:

```python
if kwargs is not None:
    peft_config = kwargs.pop("peft_config", None)
    reward_adapter = kwargs.pop("reward_adapter", None)
    is_trainable = kwargs.pop("is_trainable", False)
    is_composition_model = kwargs.pop("is_composition_model", False)
    trl_model_args, pretrained_kwargs, peft_quantization_kwargs = cls._split_kwargs(kwargs)
    trl_model_args["is_composition_model"] = is_composition_model
    token = pretrained_kwargs.get("token", None)
else:
    peft_config = None
    is_trainable = False
    trl_model_args = {}
    pretrained_kwargs = {}
    peft_quantization_kwargs = {}
    token = None
    is_composition_model = False
```
Then, in the `AutoModelForSeq2SeqLMWithValueHead` `__init__` method I added `is_composition_model` to the `v_head_kwargs`:

```python
self.is_composition_model = kwargs.pop("is_composition_model", False)
v_head_kwargs["is_composition_model"] = self.is_composition_model
```
Finally, in the `ValueHead` `__init__` method I do:

```python
is_composition_model = kwargs.pop("is_composition_model", False)
if is_composition_model:
    hidden_size = config.decoder.hidden_size
elif hasattr(config, "word_embed_proj_dim"):
    hidden_size = config.word_embed_proj_dim
else:
    hidden_size = config.hidden_size
```
Does this make sense? According to the pull request guidelines, I should write a test beforehand. Could you give me some suggestions on that too, please?
Thanks very much @ribesstefano, the solution sounds great! I have a better workaround: instead of passing `is_composition_model=True`, we can maybe do `is_composition_model = isinstance(model, transformers.EncoderDecoderModel)` - in any case, let's see in the PR!
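Roughly, the detection could live right after the base model is loaded inside `from_pretrained`; this is just a sketch, and `pretrained_model` stands for whatever name the loaded base model has at that point:

```python
import transformers

# sketch: detect composition models automatically instead of asking the user
# for a flag; `pretrained_model` is the model returned by the transformers
# loader inside PreTrainedModelWrapper.from_pretrained
is_composition_model = isinstance(pretrained_model, transformers.EncoderDecoderModel)
trl_model_args["is_composition_model"] = is_composition_model
```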
For the tests, I have just pushed a tiny model on the Hub: https://huggingface.co/trl-internal-testing/tiny-Roberta-EncoderDecoderModel
I would say you can just add that model id here: https://github.com/huggingface/trl/blob/main/tests/test_modeling_value_head.py#L54 and see if the tests pass
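Roughly, the change would look like this (a sketch; the list name is an assumption based on the linked line):

```python
# in tests/test_modeling_value_head.py (sketch; the list name is an assumption)
ALL_SEQ2SEQ_MODELS = [
    # ... existing tiny seq2seq model ids ...
    "trl-internal-testing/tiny-Roberta-EncoderDecoderModel",
]
```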
Let me know how it goes!
@younesbelkada I have a similar error when trying to load `Salesforce/codet5p-16b`. I was wondering if the `hidden_size` I need is referenced here? If so, I'm happy to make a PR for the fix if you would offer some guidance. In the meantime I'm looking at the solution posted above.
Hi @dshvimer, thanks! `hidden_size` should be properly defined in that model, e.g.: https://huggingface.co/Salesforce/codet5p-16b/blob/d19a75e34b9fd56b260557468de793b112855f1a/modeling_codet5p.py#L92

So I would expect your script to work. Can you confirm that just running:

```python
model = AutoModelForSeq2SeqLMWithValueHead.from_pretrained("Salesforce/codet5p-16b")
```

leads to an error for you?
@younesbelkada Here is an example notebook with the larger model failing. From my testing, models <= 2B work with trl but models > 2B do not.
Thanks for sharing the notebook!

From what I can see, this is related to the fact that the smaller model uses classic T5 (https://huggingface.co/Salesforce/codet5p-220m/blob/main/config.json), which does not use composition in the configuration object, whereas the larger one does, as you can see here: https://huggingface.co/Salesforce/codet5p-6b/blob/main/configuration_codet5p.py#L71

As this is a model that uses the custom code on the Hub feature, I am not sure we should support all scenarios, as this would be challenging. One thing you could do is push a copy of `Salesforce/codet5p-6b` under your namespace and manually add `self.hidden_size = self.encoder.hidden_size` here: https://huggingface.co/Salesforce/codet5p-6b/blob/main/configuration_codet5p.py#L90
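If it is easier, the same idea can also be tried in-process, without pushing a copy: something like the sketch below, which assumes TRL's `from_pretrained` accepts an already-loaded model and that the encoder sub-config carries the `hidden_size`:

```python
# sketch (untested): expose a top-level hidden_size before wrapping with trl,
# instead of editing the config file on the Hub; assumptions are noted inline
from transformers import AutoModelForSeq2SeqLM
from trl import AutoModelForSeq2SeqLMWithValueHead

base = AutoModelForSeq2SeqLM.from_pretrained("Salesforce/codet5p-6b", trust_remote_code=True)
# assumption: the composed config stores the size on the encoder sub-config
base.config.hidden_size = base.config.encoder.hidden_size
model = AutoModelForSeq2SeqLMWithValueHead.from_pretrained(base)
```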
@younesbelkada Thank you for the direction. I am thinking this will unblock me for my own use case. To make sure I understand things, could I also set the `hidden_size` from the `trl` side, similar to what is posted above (if I create a subclass or similar)?

```python
is_composition_model = kwargs.pop("is_composition_model", False)
if is_composition_model:
    hidden_size = config.decoder.hidden_size
elif hasattr(config, "word_embed_proj_dim"):
    hidden_size = config.word_embed_proj_dim
else:
    hidden_size = config.hidden_size
```
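Concretely, I was imagining something like this (an untested sketch; `CompositionValueHead` is a made-up name):

```python
# untested sketch of the subclass idea
from trl.models.modeling_value_head import ValueHead

class CompositionValueHead(ValueHead):
    def __init__(self, config, **kwargs):
        # assumption: composed configs expose the decoder sub-config as
        # config.decoder, which carries the hidden_size that ValueHead needs
        if not hasattr(config, "hidden_size") and hasattr(config, "decoder"):
            config.hidden_size = config.decoder.hidden_size
        super().__init__(config, **kwargs)
```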
Yes, the changes you shared make sense! @ribesstefano, let us know if you have managed to start the PR so that we can jump in on potential improvements; let us know also if you need help with it! :D
@younesbelkada I've made a way simpler modification, as you suggested, and just changed the `ValueHead` init in the file `trl/models/modeling_value_head.py`:

```python
class ValueHead(nn.Module):
    r"""
    The ValueHead class implements a head for GPT2 that returns a scalar for each output token.
    """

    def __init__(self, config, **kwargs):
        super().__init__()
        if not hasattr(config, "summary_dropout_prob"):
            summary_dropout_prob = kwargs.pop("summary_dropout_prob", 0.1)
        else:
            summary_dropout_prob = config.summary_dropout_prob
        self.dropout = nn.Dropout(summary_dropout_prob) if summary_dropout_prob else nn.Identity()

        # some models such as OPT have a projection layer before the word embeddings - e.g. OPT-350m
        if hasattr(config, "hidden_size"):
            hidden_size = config.hidden_size
        if hasattr(config, "word_embed_proj_dim"):
            hidden_size = config.word_embed_proj_dim
        elif hasattr(config, "is_encoder_decoder"):
            if config.is_encoder_decoder and hasattr(config, "decoder"):
                if hasattr(config.decoder, "hidden_size"):
                    hidden_size = config.decoder.hidden_size

        self.summary = nn.Linear(hidden_size, 1)
        self.flatten = nn.Flatten()
```

After all, the configs that the `ValueHead` is expecting come from the pretrained model, so it is sufficient to check those only.
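With this version, the earlier snippet should work without any extra flag (lightly adapted from my first message):

```python
# no is_composition_model flag needed with the simplified ValueHead above
from trl import AutoModelForSeq2SeqLMWithValueHead, create_reference_model

model = AutoModelForSeq2SeqLMWithValueHead.from_pretrained(
    "ribesstefano/ChemBERTa2ChemBERTa-58M"
)
model_ref = create_reference_model(model)
```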
However, when running the tests, to which I've added the https://huggingface.co/trl-internal-testing/tiny-Roberta-EncoderDecoderModel model, one of them breaks for exactly that added `tiny-Roberta-EncoderDecoderModel` model (it passes for all the other models):
```
===================================================================== FAILURES =====================================================================
____________________________________ Seq2SeqValueHeadModelTester.test_transformers_bf16_kwargs _____________________________________________________
[gw0] linux -- Python 3.10.8 /opt/conda/bin/python

self = <tests.test_modeling_value_head.Seq2SeqValueHeadModelTester testMethod=test_transformers_bf16_kwargs>

    def test_transformers_bf16_kwargs(self):
        r"""
        Test if the transformers kwargs are correctly passed
        Here we check that loading a model in half precision works as expected, i.e. the weights of
        the `pretrained_model` attribute is loaded in half precision and you can run a dummy
        forward pass without any issue.
        """
        for model_name in self.all_model_names:
            trl_model = self.trl_model_class.from_pretrained(model_name, torch_dtype=torch.bfloat16)
            lm_head_namings = self.trl_model_class.lm_head_namings

            if model_name == "trl-internal-testing/tiny-random-FSMTForConditionalGeneration":
                # skip the test for FSMT as it does not support mixed-prec
                continue

>           self.assertTrue(
                any(hasattr(trl_model.pretrained_model, lm_head_naming) for lm_head_naming in lm_head_namings)
            )
E           AssertionError: False is not true

tests/test_modeling_value_head.py:430: AssertionError
```
Do you think the error is related to the modifications I've made in `ValueHead`?
Thanks for reporting and for your hard work! Regarding the failing test: since this is an edge case, you can skip it and add a condition similar to the FSMT one, together with a comment. Let me know once you open the PR!
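Something along these lines inside `test_transformers_bf16_kwargs` (a sketch; the wording of the comment is just a suggestion):

```python
# sketch of the suggested skip, mirroring the existing FSMT condition
if model_name in [
    "trl-internal-testing/tiny-random-FSMTForConditionalGeneration",
    "trl-internal-testing/tiny-Roberta-EncoderDecoderModel",
]:
    # FSMT does not support mixed precision; for the EncoderDecoder model the
    # lm_head lives on the decoder, so the top-level lm_head check does not apply
    continue
```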
@younesbelkada Thanks for all the support, I just made a pull request!
Hi all, I've trained an `EncoderDecoder` model but I cannot load it via `AutoModelForSeq2SeqLMWithValueHead.from_pretrained`. My `EncoderDecoder` model has been defined by tying together two RoBERTa models, as follows:

Transformers and TRL versions:

Code to reproduce the issue:

Error message:

From what I can guess from the source code, `ValueHead` only inspects the model's top config, expecting a `hidden_size`. However, that information is in `model.decoder.config.hidden_size` for an `EncoderDecoder` class. Does this sound like a bug or something to update in the code? Or is there any workaround I can try? Thanks in advance!