huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

'DistributedDataParallel' object has no attribute 'save_pretrained' #7980

Closed AI678 closed 3 years ago

AI678 commented 4 years ago

❓ Questions & Help

Details

Hey, I want to use EncoderDecoderModel for parallel training. When I save my model, I get the following error. How can I fix this?
'DistributedDataParallel' object has no attribute 'save_pretrained'

A link to original question on the forum/Stack Overflow:

LysandreJik commented 4 years ago

Could you provide the information related to your environment, as well as the code that outputs this error, like it is asked in the issue template?

ganeshkharad2 commented 4 years ago

I am facing the same issue as the one reported here: my model is a custom class, created by the coder, that wraps a base model from the Transformers repo.

In the code below, that class is "SentimentClassifier":

import torch.nn as nn
from transformers import BertModel


class SentimentClassifier(nn.Module):

    def __init__(self, n_classes):
        super(SentimentClassifier, self).__init__()
        self.bert = BertModel.from_pretrained("bert-base-multilingual-cased")
        self.drop = nn.Dropout(p=0.3)
        self.out = nn.Linear(self.bert.config.hidden_size, n_classes)

    def forward(self, input_ids, attention_mask):
        _, pooled_output = self.bert(
            input_ids=input_ids,
            attention_mask=attention_mask
        )
        output = self.drop(pooled_output)
        return self.out(output)

That is why it gives the error:

SentimentClassifier object has no attribute 'save_pretrained'

Which is correct, but I also want to know how I can save that model with my fine-tuned weights, just like the base model, so that I can import it in a few lines and use it.

The only thing I am able to obtain from this fine-tuning is a .bin file, and I am not able to load the state dict either.

I am looking for a way to save my fine-tuned model with save_pretrained.
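
(One possible stopgap, sketched under the assumption that model is the trained SentimentClassifier instance from above; the directory name, file name, and n_classes=3 are only illustrative: save the inner BertModel with save_pretrained and the classification head as a plain state dict.)

import torch
from transformers import BertModel

# Saving: the wrapped BertModel supports save_pretrained, the extra head does not.
model.bert.save_pretrained("finetuned-bert")                # writes config + weights
torch.save(model.out.state_dict(), "classifier_head.bin")   # just the Linear head

# Loading later: rebuild the wrapper, then restore both parts.
restored = SentimentClassifier(n_classes=3)                 # same class as above
restored.bert = BertModel.from_pretrained("finetuned-bert")
restored.out.load_state_dict(torch.load("classifier_head.bin"))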

LysandreJik commented 4 years ago

Instead of inheriting from nn.Module you could inherit from PreTrainedModel, which is the abstract class we use for all models and which contains save_pretrained. Can you try that?
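
(A minimal sketch of that suggestion, assuming a small PretrainedConfig subclass to carry n_classes; the class, model_type, and directory names here are only illustrative.)

import torch.nn as nn
from transformers import BertModel, PretrainedConfig, PreTrainedModel


class SentimentConfig(PretrainedConfig):
    model_type = "sentiment-classifier"  # illustrative name

    def __init__(self, n_classes=2, base_model_name="bert-base-multilingual-cased", **kwargs):
        super().__init__(**kwargs)
        self.n_classes = n_classes
        self.base_model_name = base_model_name


class SentimentClassifier(PreTrainedModel):
    config_class = SentimentConfig

    def __init__(self, config):
        super().__init__(config)
        # pretrained backbone; its weights are overwritten by the checkpoint
        # when the whole classifier is reloaded with from_pretrained
        self.bert = BertModel.from_pretrained(config.base_model_name)
        self.drop = nn.Dropout(p=0.3)
        self.out = nn.Linear(self.bert.config.hidden_size, config.n_classes)

    def _init_weights(self, module):
        # only the extra head ever needs fresh initialization
        if isinstance(module, nn.Linear):
            module.weight.data.normal_(mean=0.0, std=0.02)
            if module.bias is not None:
                module.bias.data.zero_()

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled_output = outputs[1]  # pooler output; works for tuple or ModelOutput returns
        return self.out(self.drop(pooled_output))


# Saving and reloading then works like any library model:
# model = SentimentClassifier(SentimentConfig(n_classes=3))
# ... fine-tune ...
# model.save_pretrained("my-sentiment-model")
# model = SentimentClassifier.from_pretrained("my-sentiment-model")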

ganeshkharad2 commented 4 years ago

The fine-tuning code I have seen in the Hugging Face repo itself shows the same way of doing it, so that is what I did. By the way, I will try what you suggested and update here.

Here is the link I referred to:

https://huggingface.co/transformers/notebooks.html

AI678 commented 4 years ago

Hey, my code is just like this:

from transformers import EncoderDecoderModel, BertTokenizer
import torch
import argparse
import os
import torch.multiprocessing as mp
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.distributed as dist
from torch.utils.data import DataLoader

def main():
    parser = argparse.ArgumentParser()
    args = parser.parse_args()
    args.max_src_len = 512
    args.max_dst_len = 128
    args.gpus = 4
    args.world_size = args.gpus
    args.epoches = 30
    mp.spawn(train, nprocs=args.gpus, args=(args,))

def train(gpu, args):
    rank = gpu
    dist.init_process_group(
        backend='nccl',
        init_method='tcp://127.0.0.1:23456',
        world_size=args.world_size,
        rank=rank
    )
    torch.manual_seed(0)
    model = EncoderDecoderModel.from_pretrained("bert2bert")
    torch.cuda.set_device(gpu)
    model = model.to(gpu)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    model = nn.parallel.DistributedDataParallel(model, device_ids=[gpu])
    dataset_path = 'dataset/example.json'
    vocab_path = 'dataset/vocab.txt'
    dataset = CNNDataset(dataset_path, vocab_path, args)  # CNNDataset is my own dataset class, defined elsewhere
    train_sampler = torch.utils.data.distributed.DistributedSampler(
        dataset,
        num_replicas=args.world_size,
        rank=rank
    )
    dataloader = DataLoader(dataset, batch_size=32, shuffle=False,
                            num_workers=0,
                            pin_memory=True,
                            sampler=train_sampler)
    cnt = 0
    for epoch in range(args.epoches):
        for src, dst in dataloader:
            src = torch.stack(src).to(gpu)
            dst = torch.stack(dst).to(gpu)
            mask = (src != 0)
            mask = mask.long()
            outputs = model(input_ids=src, attention_mask=mask, decoder_input_ids=dst, labels=dst, return_dict=True)
            loss, logits = outputs.loss, outputs.logits
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            if cnt % 1000 == 0 and gpu == 0:
                # this is the line that raises the AttributeError:
                # `model` is now the DistributedDataParallel wrapper
                model.save_pretrained("bert2bert")
            cnt = cnt + 1

if __name__ == '__main__':
    main()

@LysandreJik, @ganeshkharad2

AI678 commented 4 years ago

I can save this with state_dict. But how can I load it again with the from_pretrained method?

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

ahnz7 commented 3 years ago

I can save this with state_dict. But how can I load it again with the from_pretrained method?

Hi, I am running into the same problem. Have you solved it?

Abhiram4572 commented 2 years ago

I can save this with state_dict. But how can I load it again with the from_pretrained method?

Hi, did you find any workaround for this? Thanks in advance.

ragvri commented 2 years ago

Any solution for this?

fxmarty commented 6 months ago

Try model.module.save_pretrained — the DistributedDataParallel wrapper stores the underlying Transformers model on its .module attribute.
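
(For anyone landing here, a minimal sketch of the save/reload round trip for the DDP setup above; the output directory name is illustrative.)

from transformers import EncoderDecoderModel

# `model` is the nn.parallel.DistributedDataParallel wrapper from the training
# loop above; the underlying Hugging Face model lives on `model.module`.
if gpu == 0:  # write the checkpoint from a single rank only
    model.module.save_pretrained("bert2bert-finetuned")

# Later (e.g. for inference), reload the checkpoint directly with from_pretrained:
model = EncoderDecoderModel.from_pretrained("bert2bert-finetuned")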