huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.cuda.IntTensor instead (while checking arguments for embedding) #2952

Closed: Aidanlochbihler closed this issue 4 years ago

Aidanlochbihler commented 4 years ago

🐛 Bug

  File "C:\Users\temp\Aida\aida\agents\bertbot\Bert\bert_intent_classifier_pytorch.py", line 298, in process
    logits = self.model(prediction_inputs, token_type_ids=None, attention_mask=prediction_masks)
  File "C:\Users\temp\Anaconda3\envs\fresh\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\temp\Anaconda3\envs\fresh\lib\site-packages\transformers\modeling_bert.py", line 897, in forward
    head_mask=head_mask)
  File "C:\Users\temp\Anaconda3\envs\fresh\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\temp\Anaconda3\envs\fresh\lib\site-packages\transformers\modeling_bert.py", line 624, in forward
    embedding_output = self.embeddings(input_ids, position_ids=position_ids, token_type_ids=token_type_ids)
  File "C:\Users\temp\Anaconda3\envs\fresh\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\temp\Anaconda3\envs\fresh\lib\site-packages\transformers\modeling_bert.py", line 167, in forward
    words_embeddings = self.word_embeddings(input_ids)
  File "C:\Users\temp\Anaconda3\envs\fresh\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\temp\Anaconda3\envs\fresh\lib\site-packages\torch\nn\modules\sparse.py", line 114, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "C:\Users\temp\Anaconda3\envs\fresh\lib\site-packages\torch\nn\functional.py", line 1484, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.cuda.IntTensor instead (while checking arguments for embedding)

Issue

Hi everyone when I run the line:

outputs = model(input_ids = b_input_ids, attention_mask=b_input_mask, labels=b_labels)

with model defined as,

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=numlabels)

It returns the stated error. However, this only happens on my Windows computer; when I run the exact same code on another machine with the same Python version and libraries, it works perfectly fine. I have the most up-to-date versions of PyTorch (1.4) and transformers installed.

Any help would be greatly appreciated

Information

Model I am using (Bert, XLNet ...): BertForSequenceClassification
Language I am using the model on (English, Chinese ...): English
Using the latest versions of PyTorch and transformers.

LysandreJik commented 4 years ago

It is weird that there is a discrepancy between Windows and Linux.

Could you try casting your variables b_input_ids, b_input_mask and b_labels to torch.long?

Are you defining some of your variables on GPU? Does it fail if everything stays on CPU?
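For example, something like this (just a sketch reusing the variable names from your call; device is whatever torch.device you are using):

```python
# Cast the index/mask/label tensors to int64 (torch.long) before the forward pass.
b_input_ids = b_input_ids.long().to(device)
b_input_mask = b_input_mask.long().to(device)
b_labels = b_labels.long().to(device)

outputs = model(input_ids=b_input_ids, attention_mask=b_input_mask, labels=b_labels)
```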

BramVanroy commented 4 years ago

I often prototype on Windows and push to Linux for final processing and I've never had this issue. Can you post a minimal working example that I can copy-paste to test?

Aidanlochbihler commented 4 years ago

OK, update: I got the error to go away, but to do it I had to apply some janky fixes that I don't think should be necessary.

```python

import torch
from tqdm import trange
from transformers import AdamW, BertForSequenceClassification, get_linear_schedule_with_warmup

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=numlabels)

model.cuda()
#model = nn.DataParallel(model)

# This variable contains all of the hyperparameter information our training loop needs
# Parameters:
lr = 2e-5
max_grad_norm = 1.0
num_training_steps = 1000
num_warmup_steps = 100
warmup_proportion = float(num_warmup_steps) / float(num_training_steps)  # 0.1

### In Transformers, optimizer and schedules are split and instantiated like this:
optimizer = AdamW(model.parameters(), lr=lr, correct_bias=False)  # To reproduce BertAdam specific behavior set correct_bias=False
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=num_warmup_steps, num_training_steps=num_training_steps)  # PyTorch scheduler

t = [] 

# Store our loss and accuracy for plotting
train_loss_set = []

# Number of training epochs (authors recommend between 2 and 4)
epochs = 5 #5:0.96

# trange is a tqdm wrapper around the normal python range
for _ in trange(epochs, desc="Epoch"):
    # Training
    # Set our model to training mode (as opposed to evaluation mode)
    model.train()
    # Tracking variables
    tr_loss = 0
    nb_tr_examples, nb_tr_steps = 0, 0

    # Train the data for one epoch
    for step, batch in enumerate(train_dataloader):
        # Add batch to GPU
        batch = tuple(t.to(device) for t in batch)

        # Unpack the inputs from our dataloader
        b_input_ids, b_input_mask, b_labels = batch

        ###############Bug fix code####################
        b_input_ids = b_input_ids.type(torch.LongTensor)
        b_input_mask = b_input_mask.type(torch.LongTensor)
        b_labels = b_labels.type(torch.LongTensor)

        b_input_ids = b_input_ids.to(device)
        b_input_mask = b_input_mask.to(device)
        b_labels = b_labels.to(device)
         ############################################
        # Clear out the gradients (by default they accumulate)
        optimizer.zero_grad()

        # Forward pass
        outputs = model(input_ids = b_input_ids, attention_mask=b_input_mask, labels=b_labels)
        loss, logits = outputs[:2]

        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)  # Gradient clipping is not in AdamW anymore (so you can use amp without issue)
        optimizer.step()
        scheduler.step()

```

Very strange (I posted the code I thought would be useful to see; let me know if you need to see more).

BramVanroy commented 4 years ago

You're doing .to(device) twice for your data (once in the tuple, once separately). It is hard to reproduce this because we don't have your data, so we don't know how you encode it. Can you post example contents of batch so we can reproduce your issue?
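For what it's worth, a likely explanation for the Windows-only failure is that NumPy's default integer type on Windows is 32-bit, so tensors built from NumPy arrays come out as int32 rather than the int64 that nn.Embedding expects. A sketch of building the tensors with an explicit dtype so the cast inside the training loop isn't needed (ids_list, masks_list and labels_list are illustrative names, not from your code):

```python
import numpy as np
import torch

# Illustrative names; build the tensors as int64 up front so no later cast is needed.
b_input_ids = torch.tensor(np.asarray(ids_list), dtype=torch.long)
b_input_mask = torch.tensor(np.asarray(masks_list), dtype=torch.long)
b_labels = torch.tensor(np.asarray(labels_list), dtype=torch.long)
```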

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

chintanckg commented 4 years ago


Had a similar issue: Young Sheldon's solution on the Stack Overflow thread below worked well.

https://stackoverflow.com/questions/56360644/pytorch-runtimeerror-expected-tensor-for-argument-1-indices-to-have-scalar-t
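If I remember that answer correctly, it boils down to the same cast, e.g.:

```python
# Sketch of the fix from the linked thread: make sure the indices are int64
# before they reach the embedding layer.
input_ids = input_ids.to(torch.long)  # equivalently: input_ids.long()
```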

aryamansriram commented 4 years ago

Having the same issue. The funny thing is that the whole model worked for training, but the error showed up while running inference on the test data.

alfawzaan commented 4 years ago

> Having the same issue. The funny thing is that the whole model worked for training, but the error showed up while running inference on the test data.

This is exactly the issue I am facing. I am using an Amazon SageMaker notebook instance.
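If training works but inference fails, a likely cause is that the training tensors were built as int64 while the test-time tensors were not. Casting them before the forward pass avoids the error; a sketch with illustrative names (test_inputs / test_masks are not from the thread):

```python
# Cast the inference tensors to int64 and move them to the same device as the model.
test_inputs = test_inputs.long().to(device)
test_masks = test_masks.long().to(device)

model.eval()
with torch.no_grad():
    outputs = model(input_ids=test_inputs, attention_mask=test_masks)
logits = outputs[0]
```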

plandes commented 3 years ago

Hi,

I'm working with transformers version 4.4.2 and getting this error when not passing in the position_ids kwarg to the model. Adding the following line in transformers/models/bert/modeling_bert.py on line 207 fixes the issue for me:

            position_ids = position_ids.to(torch.long)

Of course, you can work around this by passing in your own position_ids, but that's no fun.

doris-art commented 3 years ago

Hi, I have met the same problem; it was just because I used torch.Tensor(). When I checked, I changed it to torch.tensor() and it was OK.
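For context, torch.Tensor() always creates a float32 tensor, while torch.tensor() infers int64 from Python integers, which is what the embedding layer expects. A quick illustration:

```python
import torch

ids = [101, 7592, 2088, 102]
print(torch.Tensor(ids).dtype)  # torch.float32 -- not valid as embedding indices
print(torch.tensor(ids).dtype)  # torch.int64   -- what nn.Embedding expects
```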

plandes commented 3 years ago

@doris-art Here's my workaround. Assuming params is a dict that is passed to the __call__ method of the model as **kwargs:

```python
# a bug in transformers 4.4.2 requires this
# https://github.com/huggingface/transformers/issues/2952
input_ids = params['input_ids']
seq_length = input_ids.size()[1]
position_ids = model.embeddings.position_ids
position_ids = position_ids[:, 0: seq_length].to(torch.long)
params['position_ids'] = position_ids
```

neeraj1909 commented 2 years ago

I am getting the same error. I am unable to resolve it.

I am using:

Python implementation: CPython
Python version: 3.7.12
IPython version: 7.29.0

numpy: 1.19.5
pandas: 1.3.4
torch: 1.9.1
transformers: 4.12.5

Any help would be greatly appreciated.

DnyaneshwarBhadane1997 commented 2 years ago

I had the same issue in the past. After looking through the many reports of this error, I did some reverse engineering and found that my input was going into the model as empty during training. If you pass an empty input sentence, you hit the same error. I resolved it by filtering the null/empty sentences out of my dataset.
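A sketch of that kind of filtering (file and column names are illustrative, not from the thread):

```python
import pandas as pd

df = pd.read_csv("train.csv")  # illustrative file name
# Drop null and empty/whitespace-only sentences before tokenization.
df = df.dropna(subset=["sentence"])
df = df[df["sentence"].str.strip().astype(bool)]
```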