AI678 closed this issue 3 years ago
Could you provide the information about your environment, as well as the code that produces this error, as asked in the issue template?
I am facing the same issue as in this issue, except that here the failing object is a custom class, created by me, that wraps a base model available in the Transformers repo.
In the code below, that class is "SentimentClassifier":
```python
import torch.nn as nn
from transformers import BertModel

class SentimentClassifier(nn.Module):
    def __init__(self, n_classes):
        super(SentimentClassifier, self).__init__()
        self.bert = BertModel.from_pretrained("bert-base-multilingual-cased")
        self.drop = nn.Dropout(p=0.3)
        self.out = nn.Linear(self.bert.config.hidden_size, n_classes)

    def forward(self, input_ids, attention_mask):
        # tuple unpacking relies on the tuple return of older transformers versions
        _, pooled_output = self.bert(
            input_ids=input_ids,
            attention_mask=attention_mask
        )
        output = self.drop(pooled_output)
        return self.out(output)
```
That is why it gives the error:

SentimentClassifier object has no attribute 'save_pretrained'

which is correct, but I also want to know how I can save that model with my trained weights, just like the base model, so that I can import it in a few lines and use it. The only thing I obtain from this fine-tuning is a .bin file, and I am not able to load the state dict either.
I am looking for a way to save my fine-tuned model with save_pretrained.
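For reference, a minimal sketch of the plain-PyTorch route with the `nn.Module` class above, saving and reloading via the state dict (the file name and the `n_classes` value here are illustrative, not from the thread):

```python
import torch

# Save only the fine-tuned weights to a .bin file.
torch.save(model.state_dict(), "sentiment_classifier.bin")

# Reload: rebuild the architecture first, then restore the weights.
model = SentimentClassifier(n_classes=2)  # n_classes=2 is illustrative
model.load_state_dict(torch.load("sentiment_classifier.bin", map_location="cpu"))
model.eval()
```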
Instead of inheriting from nn.Module you could inherit from PreTrainedModel, which is the abstract class we use for all models and that contains save_pretrained. Can you try that?
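For illustration, a minimal sketch of that suggestion, rewriting the classifier above on top of `BertPreTrainedModel` (a `PreTrainedModel` subclass). The head layout mirrors the SentimentClassifier above; the `n_classes` value and the "finetuned-sentiment" directory are illustrative, and exact behavior can vary across transformers versions:

```python
import torch.nn as nn
from transformers import BertModel, BertPreTrainedModel

class SentimentClassifier(BertPreTrainedModel):
    def __init__(self, config, n_classes=2):
        super().__init__(config)
        self.bert = BertModel(config)
        self.drop = nn.Dropout(p=0.3)
        self.out = nn.Linear(config.hidden_size, n_classes)
        self.init_weights()  # standard PreTrainedModel weight initialization

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled_output = outputs[1]  # pooled [CLS] representation
        return self.out(self.drop(pooled_output))

# from_pretrained / save_pretrained now work on the whole classifier:
model = SentimentClassifier.from_pretrained("bert-base-multilingual-cased", n_classes=2)
model.save_pretrained("finetuned-sentiment")
```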
The fine-tuning code I've seen in the Hugging Face repo itself shows the same way of doing it... so that's what I did. By the way, I will try what you suggested and will update here.
Here is the link I referred to:
Hey, my code is just like this:
```python
from transformers import EncoderDecoderModel, BertTokenizer
import torch
import argparse
import os
import torch.multiprocessing as mp
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.distributed as dist
from torch.utils.data import DataLoader


def main():
    parser = argparse.ArgumentParser()
    args = parser.parse_args()
    args.max_src_len = 512
    args.max_dst_len = 128
    args.gpus = 4
    args.world_size = args.gpus
    args.epochs = 30
    mp.spawn(train, nprocs=args.gpus, args=(args,))


def train(gpu, args):
    rank = gpu
    dist.init_process_group(
        backend='nccl',
        init_method='tcp://127.0.0.1:23456',
        world_size=args.world_size,
        rank=rank
    )
    torch.manual_seed(0)
    model = EncoderDecoderModel.from_pretrained("bert2bert")
    torch.cuda.set_device(gpu)
    model = model.to(gpu)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    model = nn.parallel.DistributedDataParallel(model, device_ids=[gpu])
    dataset_path = 'dataset/example.json'
    vocab_path = 'dataset/vocab.txt'
    dataset = CNNDataset(dataset_path, vocab_path, args)  # CNNDataset is defined elsewhere in my project
    train_sampler = torch.utils.data.distributed.DistributedSampler(
        dataset,
        num_replicas=args.world_size,
        rank=rank
    )
    dataloader = DataLoader(dataset, batch_size=32, shuffle=False,
                            num_workers=0,
                            pin_memory=True,
                            sampler=train_sampler)
    cnt = 0
    for epoch in range(args.epochs):
        for src, dst in dataloader:
            src = torch.stack(src).to(gpu)
            dst = torch.stack(dst).to(gpu)
            mask = (src != 0)
            mask = mask.long()
            outputs = model(input_ids=src, attention_mask=mask,
                            decoder_input_ids=dst, labels=dst,
                            return_dict=True)
            loss, logits = outputs.loss, outputs.logits
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            if cnt % 1000 == 0 and gpu == 0:
                # this line raises the error in the title
                model.save_pretrained("bert2bert")
            cnt = cnt + 1


if __name__ == '__main__':
    main()
```
@LysandreJik, @ganeshkharad2
I can save this with state_dict. But how can I load it again with the from_pretrained method?
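One possible bridge, sketched here: rebuild the model with from_pretrained, then overwrite its weights from the saved state dict. The "checkpoint.bin" path is a placeholder, and the "module." prefix handling assumes the state dict was saved from the DDP wrapper:

```python
import torch
from transformers import EncoderDecoderModel

# Rebuild the architecture from the original checkpoint directory.
model = EncoderDecoderModel.from_pretrained("bert2bert")

# Load the torch.save'd state dict ("checkpoint.bin" is a placeholder path).
state_dict = torch.load("checkpoint.bin", map_location="cpu")

# If the dict was saved from the DDP wrapper, keys carry a "module." prefix.
state_dict = {k[len("module."):] if k.startswith("module.") else k: v
              for k, v in state_dict.items()}
model.load_state_dict(state_dict)
```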
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
> I can save this with state_dict. But how can I load it again with the from_pretrained method?
Hi, I am hitting the same problem. Have you solved it?
> I can save this with state_dict. But how can I load it again with the from_pretrained method?
Hi, did you find any workaround for this? Thanks in advance.
Any solution for this?
Try `model.module.save_pretrained`.
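Spelled out against the training loop above (a sketch; the "bert2bert" path is the thread's own):

```python
# save_pretrained lives on the wrapped PreTrainedModel, not on the
# DistributedDataParallel wrapper, so unwrap the model first.
to_save = model.module if hasattr(model, "module") else model
to_save.save_pretrained("bert2bert")

# Reloading then works with the standard API:
# model = EncoderDecoderModel.from_pretrained("bert2bert")
```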
❓ Questions & Help
Details
Hey, I want to use EncoderDecoderModel for parallel training. When I save my model, I get the following error. How can I fix it?
'DistributedDataParallel' object has no attribute 'save_pretrained'
A link to the original question on the forum/Stack Overflow: