huggingface / optimum-habana

Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)
Apache License 2.0

Fine-tuning BERT model without Trainer #227

Closed: shoang22 closed this issue 1 year ago

shoang22 commented 1 year ago

Hello,

I have a custom model that incorporates BERT. Is it possible to train this model using a normal PyTorch training loop?

Example:

import torch
from transformers import AutoModel

def training_loop(dataloader, model1):
    device = torch.device('hpu')
    model1 = model1.to(device)
    model2 = AutoModel.from_pretrained('bert-base-uncased').to(device)
    custom_model = some_wrapper(model1, model2)  # custom wrapper combining both models
    for batch in dataloader:
        batch = batch.to(device)
        output = custom_model(batch)

    ...
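For reference, a plain loop like this can run on an HPU, but Habana's default lazy execution mode requires each iteration to be flushed explicitly with htcore.mark_step(), and an optimizer and loss are needed to actually train. Below is a minimal sketch assuming the habana_frameworks PyTorch package is installed; the optimizer, loss function, and label handling are placeholders introduced here for illustration, not part of the original question.

import torch
import habana_frameworks.torch.core as htcore  # Habana PyTorch bridge, ships with the SynapseAI stack

def hpu_training_loop(dataloader, custom_model, optimizer, criterion):
    device = torch.device('hpu')
    custom_model = custom_model.to(device)
    custom_model.train()
    for batch, labels in dataloader:
        batch, labels = batch.to(device), labels.to(device)
        optimizer.zero_grad()
        output = custom_model(batch)
        loss = criterion(output, labels)
        loss.backward()
        htcore.mark_step()  # flush the accumulated graph in lazy mode
        optimizer.step()
        htcore.mark_step()  # flush again after the optimizer update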
regisss commented 1 year ago

Hi @shoang22! You would need to follow Habana's SDK documentation to have a properly working training loop. But then you would also need to add mixed precision, distributed training and so on to make the most of your Gaudi instance. This is all taken care of in the GaudiTrainer provided in the Optimum Habana library. Do you have any constraint that prevents you from using it?
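For illustration, here is a minimal sketch of what switching to GaudiTrainer might look like, assuming a standard sequence-classification fine-tuning setup. The dataset, output directory, and hyperparameters are placeholders rather than anything from this thread, and the argument names follow the patterns used in the Optimum Habana examples.

from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from optimum.habana import GaudiTrainer, GaudiTrainingArguments

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased')

# Small public dataset used purely for illustration
dataset = load_dataset('glue', 'mrpc', split='train')
dataset = dataset.map(
    lambda ex: tokenizer(ex['sentence1'], ex['sentence2'], truncation=True, padding='max_length'),
    batched=True,
)

training_args = GaudiTrainingArguments(
    output_dir='./output',                          # placeholder path
    use_habana=True,                                # run on HPU
    use_lazy_mode=True,                             # Habana lazy execution mode
    gaudi_config_name='Habana/bert-base-uncased',   # Gaudi config with mixed-precision settings
    per_device_train_batch_size=8,
    num_train_epochs=1,
)

trainer = GaudiTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()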

shoang22 commented 1 year ago

I was actually having some issues with Trainer, so I opted for a direct implementation. I've since resolved the issue, and the Trainer is working perfectly. Thanks!