mskim94 opened this issue 2 years ago
Which nlpaug version and transformers version are you using?

My transformers version is 4.16.2.
It is caused by the latest transformers tokenizers, which often return more than two output fields (e.g. `input_ids`, `token_type_ids`, `attention_mask`).
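To see why this breaks positional unpacking, here is a hypothetical sketch using plain dicts in place of real transformers `BatchEncoding` objects: when the tokenizer also emits `token_type_ids`, unpacking `.values()` shifts `attention_mask` from index 1 to index 2.

```python
# Hypothetical layouts (plain dicts, not real transformers calls) showing how
# the positional index of attention_mask depends on the tokenizer's outputs.
with_token_types = {
    "input_ids": [[101, 2023, 102]],
    "token_type_ids": [[0, 0, 0]],
    "attention_mask": [[1, 1, 1]],
}
without_token_types = {
    "input_ids": [[101, 2023, 102]],
    "attention_mask": [[1, 1, 1]],
}

print(list(with_token_types).index("attention_mask"))     # 2
print(list(without_token_types).index("attention_mask"))  # 1
```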
How about updating it? @mskim94, first of all, you can fix the `translate_one_step_batched` method easily like this:
```python
import types

import torch
from torch.utils import data as t_data


def translate_one_step_batched(self, data, tokenizer, model):
    tokenized_texts = tokenizer(data, padding=True, truncation=True, return_tensors='pt')
    tokenized_dataset = t_data.TensorDataset(*tokenized_texts.values())
    tokenized_dataloader = t_data.DataLoader(
        tokenized_dataset,
        batch_size=self.batch_size,
        shuffle=False,
        num_workers=1
    )

    all_translated_ids = []
    with torch.no_grad():
        for batch in tokenized_dataloader:
            batch = tuple(t.to(self.device) for t in batch)
            # The tokenizer here returns (input_ids, token_type_ids,
            # attention_mask), so attention_mask is the third tensor.
            input_ids = batch[0]
            attention_mask = batch[2]
            translated_ids_batch = model.generate(
                input_ids=input_ids,
                attention_mask=attention_mask,
                max_length=self.max_length
            )
            all_translated_ids.append(
                translated_ids_batch.detach().cpu().numpy()
            )

    all_translated_texts = []
    for translated_ids_batch in all_translated_ids:
        translated_texts = tokenizer.batch_decode(
            translated_ids_batch,
            skip_special_tokens=True
        )
        all_translated_texts.extend(translated_texts)
    return all_translated_texts


backtranslation.model.translate_one_step_batched = types.MethodType(
    translate_one_step_batched, backtranslation.model
)
```
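The last line above binds the replacement function to the existing model object. A minimal sketch of that monkey-patching step, using a hypothetical stand-in class instead of the real `backtranslation.model` object:

```python
import types

class FakeModel:  # hypothetical stand-in for backtranslation.model
    def __init__(self):
        self.batch_size = 4

def patched_translate(self, data):
    # Once bound with types.MethodType, the function sees the
    # instance's attributes through self, like a normal method.
    return f"batch_size={self.batch_size}, items={len(data)}"

model = FakeModel()
model.translate_one_step_batched = types.MethodType(patched_translate, model)
print(model.translate_one_step_batched(["hello", "world"]))
# batch_size=4, items=2
```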
You have to make sure that the tokenizer you use returns (input_ids, *something, attention_mask).
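If you cannot guarantee that layout, a more defensive option (my own sketch, not part of the patch above) is to resolve the tensor positions from the tokenizer's own key order instead of hard-coding `batch[0]` and `batch[2]`:

```python
# Layout-agnostic lookup: find where input_ids and attention_mask land when
# the tokenizer output's values() are unpacked positionally.
def positions(tokenized_texts):
    keys = list(tokenized_texts.keys())
    return keys.index("input_ids"), keys.index("attention_mask")

# Plain-dict stand-in for a tokenizer output without token_type_ids.
encoding = {"input_ids": [[101, 102]], "attention_mask": [[1, 1]]}
in_idx, mask_idx = positions(encoding)
batch = tuple(encoding.values())
print(batch[in_idx], batch[mask_idx])  # [[101, 102]] [[1, 1]]
```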
Fix one thing: the `model.generate` call must use keyword arguments, i.e. `translated_ids_batch = model.generate(input_ids=input_ids, attention_mask=attention_mask, max_length=self.max_length)`.
When I run the following code:
I got the following error:
How can I solve this problem?