tomaarsen / SpanMarkerNER

SpanMarker for Named Entity Recognition
https://tomaarsen.github.io/SpanMarkerNER/
Apache License 2.0

ValueError: too many values to unpack (expected 2) after running `trainer.train()` #60

Open · tan-js opened this issue 3 months ago

tan-js commented 3 months ago

Hi @tomaarsen,

Thanks a lot for your amazing work!

While running the trainer.train() cell in the getting_started.ipynb notebook, I got the error below.

I'm using Python 3.11.6, torch 2.3.0+cu121, and transformers 4.41.2.
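
For completeness, the failing cell is roughly the following sketch (reconstructed from the documented quickstart; the dataset, labels, and hyperparameters are my best recollection rather than a verbatim copy of the notebook):

```python
from datasets import load_dataset
from span_marker import SpanMarkerModel, Trainer
from transformers import TrainingArguments

# Example NER dataset with tokens/ner_tags columns (assumed; the notebook default may differ).
dataset = load_dataset("conll2003")
labels = dataset["train"].features["ner_tags"].feature.names

# SpanMarker model on top of a BERT encoder.
model = SpanMarkerModel.from_pretrained(
    "bert-base-cased",
    labels=labels,
    model_max_length=256,
    entity_max_length=8,
)

args = TrainingArguments(
    output_dir="models/span_marker_bert_base",
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    num_train_epochs=1,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()  # <- the call that raises
```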

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[9], line 1
----> 1 trainer.train()

File ~/USER/env311/lib64/python3.11/site-packages/transformers/trainer.py:1885, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1883         hf_hub_utils.enable_progress_bars()
   1884 else:
-> 1885     return inner_training_loop(
   1886         args=args,
   1887         resume_from_checkpoint=resume_from_checkpoint,
   1888         trial=trial,
   1889         ignore_keys_for_eval=ignore_keys_for_eval,
   1890     )

File ~/USER/env311/lib64/python3.11/site-packages/transformers/trainer.py:2216, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
   2213     self.control = self.callback_handler.on_step_begin(args, self.state, self.control)
   2215 with self.accelerator.accumulate(model):
-> 2216     tr_loss_step = self.training_step(model, inputs)
   2218 if (
   2219     args.logging_nan_inf_filter
   2220     and not is_torch_xla_available()
   2221     and (torch.isnan(tr_loss_step) or torch.isinf(tr_loss_step))
   2222 ):
   2223     # if loss is nan or inf simply add the average of previous logged losses
   2224     tr_loss += tr_loss / (1 + self.state.global_step - self._globalstep_last_logged)

File ~/USER/env311/lib64/python3.11/site-packages/transformers/trainer.py:3238, in Trainer.training_step(self, model, inputs)
   3235     return loss_mb.reduce_mean().detach().to(self.args.device)
   3237 with self.compute_loss_context_manager():
-> 3238     loss = self.compute_loss(model, inputs)
   3240 del inputs
   3241 torch.cuda.empty_cache()

File ~/USER/env311/lib64/python3.11/site-packages/transformers/trainer.py:3264, in Trainer.compute_loss(self, model, inputs, return_outputs)
   3262 else:
   3263     labels = None
-> 3264 outputs = model(**inputs)
   3265 # Save past state if it exists
   3266 # TODO: this needs to be fixed and made cleaner later.
   3267 if self.args.past_index >= 0:

File ~/USER/env311/lib64/python3.11/site-packages/torch/nn/modules/module.py:1532, in Module._wrapped_call_impl(self, *args, **kwargs)
   1530     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1531 else:
-> 1532     return self._call_impl(*args, **kwargs)

File ~/USER/env311/lib64/python3.11/site-packages/torch/nn/modules/module.py:1541, in Module._call_impl(self, *args, **kwargs)
   1536 # If we don't have any hooks, we want to skip the rest of the logic in
   1537 # this function, and just call forward.
   1538 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1539         or _global_backward_pre_hooks or _global_backward_hooks
   1540         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1541     return forward_call(*args, **kwargs)
   1543 try:
   1544     result = None

File ~/USER/env311/lib64/python3.11/site-packages/accelerate/utils/operations.py:822, in convert_outputs_to_fp32.<locals>.forward(*args, **kwargs)
    821 def forward(*args, **kwargs):
--> 822     return model_forward(*args, **kwargs)

File ~/USER/env311/lib64/python3.11/site-packages/accelerate/utils/operations.py:810, in ConvertOutputsToFp32.__call__(self, *args, **kwargs)
    809 def __call__(self, *args, **kwargs):
--> 810     return convert_to_fp32(self.model_forward(*args, **kwargs))

File ~/USER/env311/lib64/python3.11/site-packages/torch/amp/autocast_mode.py:16, in autocast_decorator.<locals>.decorate_autocast(*args, **kwargs)
     13 @functools.wraps(func)
     14 def decorate_autocast(*args, **kwargs):
     15     with autocast_instance:
---> 16         return func(*args, **kwargs)

File ~/USER/env311/lib64/python3.11/site-packages/span_marker/modeling.py:153, in SpanMarkerModel.forward(self, input_ids, attention_mask, position_ids, start_marker_indices, num_marker_pairs, labels, num_words, document_ids, sentence_ids, **kwargs)
    136 """Forward call of the SpanMarkerModel.
    137 
    138 Args:
   (...)
    150     SpanMarkerOutput: The output dataclass.
    151 """
    152 token_type_ids = torch.zeros_like(input_ids)
--> 153 outputs = self.encoder(
    154     input_ids,
    155     attention_mask=attention_mask,
    156     token_type_ids=token_type_ids,
    157     position_ids=position_ids,
    158 )
    159 last_hidden_state = outputs[0]
    160 last_hidden_state = self.dropout(last_hidden_state)

File ~/USER/env311/lib64/python3.11/site-packages/torch/nn/modules/module.py:1532, in Module._wrapped_call_impl(self, *args, **kwargs)
   1530     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1531 else:
-> 1532     return self._call_impl(*args, **kwargs)

File ~/USER/env311/lib64/python3.11/site-packages/torch/nn/modules/module.py:1541, in Module._call_impl(self, *args, **kwargs)
   1536 # If we don't have any hooks, we want to skip the rest of the logic in
   1537 # this function, and just call forward.
   1538 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1539         or _global_backward_pre_hooks or _global_backward_hooks
   1540         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1541     return forward_call(*args, **kwargs)
   1543 try:
   1544     result = None

File ~/USER/env311/lib64/python3.11/site-packages/transformers/models/bert/modeling_bert.py:1103, in BertModel.forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
   1096         extended_attention_mask = _prepare_4d_causal_attention_mask_for_sdpa(
   1097             attention_mask,
   1098             input_shape,
   1099             embedding_output,
   1100             past_key_values_length,
   1101         )
   1102     else:
-> 1103         extended_attention_mask = _prepare_4d_attention_mask_for_sdpa(
   1104             attention_mask, embedding_output.dtype, tgt_len=seq_length
   1105         )
   1106 else:
   1107     # We can provide a self-attention mask of dimensions [batch_size, from_seq_length, to_seq_length]
   1108     # ourselves in which case we just need to make it broadcastable to all heads.
   1109     extended_attention_mask = self.get_extended_attention_mask(attention_mask, input_shape)

File ~/USER/env311/lib64/python3.11/site-packages/transformers/modeling_attn_mask_utils.py:439, in _prepare_4d_attention_mask_for_sdpa(mask, dtype, tgt_len)
    426 def _prepare_4d_attention_mask_for_sdpa(mask: torch.Tensor, dtype: torch.dtype, tgt_len: Optional[int] = None):
    427     """
    428     Creates a non-causal 4D mask of shape `(batch_size, 1, query_length, key_value_length)` from a 2D mask of shape
    429     `(batch_size, key_value_length)`
   (...)
    437             The target length or query length the created mask shall have.
    438     """
--> 439     batch_size, key_value_length = mask.shape
    440     tgt_len = tgt_len if tgt_len is not None else key_value_length
    442     # torch.jit.trace, symbolic_trace and torchdynamo with fullgraph=True are unable to capture the controlflow `is_causal=attention_mask is None and q_len > 1`
    443     # used as an SDPA argument. We keep compatibility with these tracing tools by always using SDPA's `attn_mask` argument in case we are tracing.
    444     # TODO: For dynamo, rather use a check on fullgraph=True once this is possible (https://github.com/pytorch/pytorch/pull/120400).

ValueError: too many values to unpack (expected 2)

The odd thing is that I could get it to run initially, but I then hit the same error when running it with my own dataset. After failing to get it to work with my own dataset (with the same parameters and the "bert-base-cased" model), I restarted all my kernels and ran the original, unedited getting-started notebook with the default dataset. I'm now getting this error and can no longer get the getting-started notebook to work.
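
Looking at the last frame, the failure is the two-value unpack in `_prepare_4d_attention_mask_for_sdpa`, which assumes a 2D `(batch_size, key_value_length)` mask; the mask the encoder receives here apparently has an extra dimension. A minimal sketch of that kind of mismatch (the shapes are illustrative, not taken from my actual batch):

```python
import torch

# What the SDPA mask helper in transformers expects: a 2D (batch_size, key_value_length) mask.
mask_2d = torch.ones(8, 256)
batch_size, key_value_length = mask_2d.shape  # unpacks fine

# If the encoder is instead handed one (seq_len, seq_len) mask per sample,
# the batched tensor is 3D and the same unpack fails (illustrative shape).
mask_3d = torch.ones(8, 256, 256)
try:
    batch_size, key_value_length = mask_3d.shape
except ValueError as err:
    print(err)  # too many values to unpack (expected 2)
```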

I'm able to run the training and validation steps (similar to the "getting-started" notebook) using the standard Hugging Face transformers approach.

Thanks for your time

tomaarsen commented 3 months ago

Hello!

My first guess is that perhaps the transformers version is too high. You can try installing an older version, restarting your kernel, and trying again.
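
A quick way to confirm which transformers version the notebook kernel is actually picking up (a stale kernel can keep the previous install loaded even after reinstalling):

```python
import transformers

# If this still shows the old version after downgrading,
# the kernel was not restarted and is still using the previously imported install.
print(transformers.__version__)
```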

tan-js commented 3 months ago

Hi, thanks for your reply! Which version of transformers do you recommend?

Update: I tried `pip install transformers==4.39` and it resolved this error. I didn't test newer versions. Thanks a lot!

a-j-jones commented 3 months ago

I had the same issue with model.predict(), and transformers==4.39 fixed it for me as well.
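
For context, the failing call on my side was plain inference, roughly this sketch (the checkpoint name is only an example, not necessarily the exact model I used):

```python
from span_marker import SpanMarkerModel

# Any pretrained SpanMarker checkpoint goes through the same encoder forward pass;
# this model name is just an example.
model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-bert-base-fewnerd-fine-super")

# With the newer transformers release installed, this raised the same ValueError;
# pinning transformers==4.39 made it work again.
entities = model.predict("Amelia Earhart flew her single engine Lockheed Vega across the Atlantic to Paris.")
print(entities)
```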

gilljon commented 2 months ago

@tomaarsen I have a fix for this, but I'm wondering what the status of support for this project is?

xkasberg commented 1 month ago

Any plans to support transformers 4.43 anytime soon?