huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

How to use multiple PreTrainedModel models in a custom model? #13407

Closed iamlockelightning closed 3 years ago

iamlockelightning commented 3 years ago

Details

I am using the Trainer to train a custom model, like this:

import torch.nn as nn
import transformers

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        # I want the code to be clean, so I load the pretrained models like this
        self.bert_layer_1 = transformers.AutoModel.from_pretrained("hfl/chinese-roberta-wwm-ext")
        self.bert_layer_2 = transformers.AutoModel.from_pretrained("bert-base-chinese")
        self.other_layers = ...  # not important

    def forward(self):
        pass  # not important

When running trainer.save_model(), it only saves the model's state dict, because the custom model is not a PreTrainedModel (as the terminal output below shows).

Trainer.model is not a `PreTrainedModel`, only saving its state dict.

And when reloading the saved model in production, I need to initialize a new MyModel and load its state dict, which is not so convenient. I would like to load this model with transformers.AutoModel.from_pretrained('MODEL_PATH'), like any other PreTrainedModel.
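For reference, the reload workflow described above looks roughly like the sketch below (the path is a placeholder; pytorch_model.bin is assumed to be the Trainer's default weights filename):

import torch

# rebuild the architecture from scratch, then restore the trained weights
model = MyModel()
state_dict = torch.load("MODEL_PATH/pytorch_model.bin", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()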

I tried changing class MyModel(nn.Module) to class MyModel(PreTrainedModel), but PreTrainedModel needs a PretrainedConfig when initialized. I don't have one in the current implementation, and I don't know how to manage the config when using multiple PreTrainedModel models. I want to keep self.bert_layer_1 and self.bert_layer_2 as simple from_pretrained calls, not BertModel(config).

Is there a way to do that?

Environment info

iamlockelightning commented 3 years ago

Please help. @LysandreJik @sgugger

sgugger commented 3 years ago

A model that is not inside the transformers library won't work with the AutoModel API. To properly use the save_pretrained/from_pretrained methods, why not subclass PreTrainedModel instead of nn.Module?
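A minimal sketch of that suggestion, assuming a custom PretrainedConfig that just records which base checkpoints to compose (the class names, config fields, and forward signature here are made up for illustration):

from transformers import AutoModel, PretrainedConfig, PreTrainedModel

class MyComposedConfig(PretrainedConfig):
    model_type = "my-composed-model"  # hypothetical identifier

    def __init__(self, model_name_1="hfl/chinese-roberta-wwm-ext",
                 model_name_2="bert-base-chinese", **kwargs):
        super().__init__(**kwargs)
        self.model_name_1 = model_name_1
        self.model_name_2 = model_name_2

class MyComposedModel(PreTrainedModel):
    config_class = MyComposedConfig

    def __init__(self, config):
        super().__init__(config)
        # note: from_pretrained here downloads the base weights even when
        # reloading; the weights saved with save_pretrained then overwrite them
        self.bert_layer_1 = AutoModel.from_pretrained(config.model_name_1)
        self.bert_layer_2 = AutoModel.from_pretrained(config.model_name_2)

    def forward(self, input_ids, attention_mask=None):
        out_1 = self.bert_layer_1(input_ids, attention_mask=attention_mask)
        out_2 = self.bert_layer_2(input_ids, attention_mask=attention_mask)
        return out_1, out_2  # combine however the task requires

model = MyComposedModel(MyComposedConfig())
model.save_pretrained("MODEL_PATH")  # writes config.json + weights
reloaded = MyComposedModel.from_pretrained("MODEL_PATH")

Loading still goes through MyComposedModel.from_pretrained rather than AutoModel.from_pretrained; newer transformers releases also expose AutoConfig.register and AutoModel.register for hooking a custom config/model pair into the Auto API.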

iamlockelightning commented 3 years ago

Thanks for your reply! I will try.

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

maxpel commented 3 years ago

A model that is not inside the transformers library won't work with the AutoModel API. To properly use the save_pretrained/from_pretrained methods, why not subclass PreTrainedModel instead of nn.Module?

@sgugger Could you give an example of how to subclass PreTrainedModel? I would also like to integrate my model at https://huggingface.co/maxpe/twitter-roberta-base_semeval18_emodetection better with the transformers library:

import torch
from transformers import AutoModel

def loss_fn(outputs, targets):
    return torch.nn.BCEWithLogitsLoss()(outputs, targets)

class RobertaClass(torch.nn.Module):

    def __init__(self):
        super().__init__()
        self.l1 = AutoModel.from_pretrained("cardiffnlp/twitter-roberta-base", return_dict=False)
        self.l2 = torch.nn.Dropout(0.3)
        self.l3 = torch.nn.Linear(768, 11)

    def forward(self, input_ids, attention_mask, labels):
        # with return_dict=False the model returns a tuple; the second
        # element is the pooled output
        _, output_1 = self.l1(input_ids=input_ids, attention_mask=attention_mask)
        output_2 = self.l2(output_1)
        output = self.l3(output_2)
        # BCEWithLogitsLoss expects (logits, targets), in that order
        return (loss_fn(output, labels.float()), output)

model = RobertaClass()
model.train()

...

model = RobertaClass()
model.load_state_dict(torch.load(path))
model.eval()

My attempt with PyTorchModelHubMixin didn't work well.
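For what it's worth, the PyTorchModelHubMixin route is typically wired in as sketched below, in case the inheritance order was the issue (a rough sketch, assuming a reasonably recent huggingface_hub; the class name and directory are placeholders):

import torch
from huggingface_hub import PyTorchModelHubMixin

class RobertaClassHub(torch.nn.Module, PyTorchModelHubMixin):
    def __init__(self):
        super().__init__()
        ...  # same layers as RobertaClass above

model = RobertaClassHub()
model.save_pretrained("my-model-dir")  # provided by the mixin
reloaded = RobertaClassHub.from_pretrained("my-model-dir")

Alternatively, since this model is just a multi-label classification head on top of a RoBERTa encoder, a recent transformers version can express it directly as a sequence-classification model, which keeps save_pretrained/from_pretrained (and the Hub) working out of the box; problem_type="multi_label_classification" makes the built-in loss BCEWithLogitsLoss:

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "cardiffnlp/twitter-roberta-base",
    num_labels=11,
    problem_type="multi_label_classification",
)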

pratikchhapolika commented 1 year ago

@iamlockelightning did you save the model properly?