ValueError: too many values to unpack (expected 2) after running `trainer.train()` #60

Open tan-js opened 3 months ago

tan-js commented 3 months ago

Hi @tomaarsen,

Thanks a lot for your amazing work!

While running the `trainer.train()` cell in "getting_started.ipynb", I got the error below.

I'm using Python 3.11.6, torch '2.3.0+cu121', and transformers '4.41.2'.

ValueError                                Traceback (most recent call last)
Cell In[9], line 1
----> 1 trainer.train()

File ~/USER/env311/lib64/python3.11/site-packages/transformers/, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1883         hf_hub_utils.enable_progress_bars()
   1884 else:
-> 1885     return inner_training_loop(
   1886         args=args,
   1887         resume_from_checkpoint=resume_from_checkpoint,
   1888         trial=trial,
   1889         ignore_keys_for_eval=ignore_keys_for_eval,
   1890     )

File ~/USER/env311/lib64/python3.11/site-packages/transformers/, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
   2213     self.control = self.callback_handler.on_step_begin(args, self.state, self.control)
   2215 with self.accelerator.accumulate(model):
-> 2216     tr_loss_step = self.training_step(model, inputs)
   2218 if (
   2219     args.logging_nan_inf_filter
   2220     and not is_torch_xla_available()
   2221     and (torch.isnan(tr_loss_step) or torch.isinf(tr_loss_step))
   2222 ):
   2223     # if loss is nan or inf simply add the average of previous logged losses
   2224     tr_loss += tr_loss / (1 + self.state.global_step - self._globalstep_last_logged)

File ~/USER/env311/lib64/python3.11/site-packages/transformers/, in Trainer.training_step(self, model, inputs)
   3235     return loss_mb.reduce_mean().detach().to(self.args.device)
   3237 with self.compute_loss_context_manager():
-> 3238     loss = self.compute_loss(model, inputs)
   3240 del inputs
   3241 torch.cuda.empty_cache()

File ~/USER/env311/lib64/python3.11/site-packages/transformers/, in Trainer.compute_loss(self, model, inputs, return_outputs)
   3262 else:
   3263     labels = None
-> 3264 outputs = model(**inputs)
   3265 # Save past state if it exists
   3266 # TODO: this needs to be fixed and made cleaner later.
   3267 if self.args.past_index >= 0:

File ~/USER/env311/lib64/python3.11/site-packages/torch/nn/modules/, in Module._wrapped_call_impl(self, *args, **kwargs)
   1530     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1531 else:
-> 1532     return self._call_impl(*args, **kwargs)

File ~/USER/env311/lib64/python3.11/site-packages/torch/nn/modules/, in Module._call_impl(self, *args, **kwargs)
   1536 # If we don't have any hooks, we want to skip the rest of the logic in
   1537 # this function, and just call forward.
   1538 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1539         or _global_backward_pre_hooks or _global_backward_hooks
   1540         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1541     return forward_call(*args, **kwargs)
   1543 try:
   1544     result = None

File ~/USER/env311/lib64/python3.11/site-packages/accelerate/utils/, in convert_outputs_to_fp32.<locals>.forward(*args, **kwargs)
    821 def forward(*args, **kwargs):
--> 822     return model_forward(*args, **kwargs)

File ~/USER/env311/lib64/python3.11/site-packages/accelerate/utils/, in ConvertOutputsToFp32.__call__(self, *args, **kwargs)
    809 def __call__(self, *args, **kwargs):
--> 810     return convert_to_fp32(self.model_forward(*args, **kwargs))

File ~/USER/env311/lib64/python3.11/site-packages/torch/amp/, in autocast_decorator.<locals>.decorate_autocast(*args, **kwargs)
     13 @functools.wraps(func)
     14 def decorate_autocast(*args, **kwargs):
     15     with autocast_instance:
---> 16         return func(*args, **kwargs)

File ~/USER/env311/lib64/python3.11/site-packages/span_marker/, in SpanMarkerModel.forward(self, input_ids, attention_mask, position_ids, start_marker_indices, num_marker_pairs, labels, num_words, document_ids, sentence_ids, **kwargs)
    136 """Forward call of the SpanMarkerModel.
    138 Args:
    150     SpanMarkerOutput: The output dataclass.
    151 """
    152 token_type_ids = torch.zeros_like(input_ids)
--> 153 outputs = self.encoder(
    154     input_ids,
    155     attention_mask=attention_mask,
    156     token_type_ids=token_type_ids,
    157     position_ids=position_ids,
    158 )
    159 last_hidden_state = outputs[0]
    160 last_hidden_state = self.dropout(last_hidden_state)

File ~/USER/env311/lib64/python3.11/site-packages/torch/nn/modules/, in Module._wrapped_call_impl(self, *args, **kwargs)
   1530     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1531 else:
-> 1532     return self._call_impl(*args, **kwargs)

File ~/USER/env311/lib64/python3.11/site-packages/torch/nn/modules/, in Module._call_impl(self, *args, **kwargs)
   1536 # If we don't have any hooks, we want to skip the rest of the logic in
   1537 # this function, and just call forward.
   1538 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1539         or _global_backward_pre_hooks or _global_backward_hooks
   1540         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1541     return forward_call(*args, **kwargs)
   1543 try:
   1544     result = None

File ~/USER/env311/lib64/python3.11/site-packages/transformers/models/bert/, in BertModel.forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
   1096         extended_attention_mask = _prepare_4d_causal_attention_mask_for_sdpa(
   1097             attention_mask,
   1098             input_shape,
   1099             embedding_output,
   1100             past_key_values_length,
   1101         )
   1102     else:
-> 1103         extended_attention_mask = _prepare_4d_attention_mask_for_sdpa(
   1104             attention_mask, embedding_output.dtype, tgt_len=seq_length
   1105         )
   1106 else:
   1107     # We can provide a self-attention mask of dimensions [batch_size, from_seq_length, to_seq_length]
   1108     # ourselves in which case we just need to make it broadcastable to all heads.
   1109     extended_attention_mask = self.get_extended_attention_mask(attention_mask, input_shape)

File ~/USER/env311/lib64/python3.11/site-packages/transformers/, in _prepare_4d_attention_mask_for_sdpa(mask, dtype, tgt_len)
    426 def _prepare_4d_attention_mask_for_sdpa(mask: torch.Tensor, dtype: torch.dtype, tgt_len: Optional[int] = None):
    427     """
    428     Creates a non-causal 4D mask of shape `(batch_size, 1, query_length, key_value_length)` from a 2D mask of shape
    429     `(batch_size, key_value_length)`
    437             The target length or query length the created mask shall have.
    438     """
--> 439     batch_size, key_value_length = mask.shape
    440     tgt_len = tgt_len if tgt_len is not None else key_value_length
    442     # torch.jit.trace, symbolic_trace and torchdynamo with fullgraph=True are unable to capture the controlflow `is_causal=attention_mask is None and q_len > 1`
    443     # used as an SDPA argument. We keep compatibility with these tracing tools by always using SDPA's `attn_mask` argument in case we are tracing.
    444     # TODO: For dynamo, rather use a check on fullgraph=True once this is possible (
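For readers following the traceback: the last frame unpacks `mask.shape` into exactly two names, so it can only succeed for a 2D mask. The error message implies the mask that reached this function had more than two dimensions. A minimal sketch of that failure mode, using plain tuples in place of `mask.shape` (the helper name `unpack_2d` and the 3D shape `(8, 128, 128)` are hypothetical, chosen only for illustration):

```python
# Illustration only: plain tuples stand in for `mask.shape`
# (torch.Size behaves like a tuple when unpacked).

def unpack_2d(shape):
    """Mimic `batch_size, key_value_length = mask.shape` from the last frame."""
    batch_size, key_value_length = shape
    return batch_size, key_value_length

# A 2D shape (batch_size, key_value_length) unpacks without problems.
print(unpack_2d((8, 128)))

# A shape with a third dimension (hypothetical values) cannot be unpacked
# into two names, which raises the ValueError seen in the issue title.
message = None
try:
    unpack_2d((8, 128, 128))
except ValueError as err:
    message = str(err)
    print(message)
```

This only reproduces the unpacking error itself. It does not show where in the pipeline the extra mask dimension comes from.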

The odd thing is that the notebook ran initially, and I first hit this error when training on my own dataset. After failing to get that to work (same parameters, "bert-base-cased" model), I restarted all my kernels and ran the original, unedited getting-started notebook with the default dataset. It now fails with the same error, and I can no longer get the getting-started notebook to work.

I'm able to perform the training and validation steps (similar to the "getting-started" notebook) using the standard Hugging Face transformers approach.

Thanks for your time.

tomaarsen commented 3 months ago


My first guess is that perhaps the transformers version is too high. You can try installing an older version, restarting your kernel, and trying again.

tan-js commented 3 months ago


> My first guess is that perhaps the transformers version is too high. You can try installing an older version, restarting your kernel, and trying again.
>
> - Tom Aarsen

Hi, thanks for your reply! Which version of transformers do you recommend?

Update: I tried `pip install transformers==4.39`, and it resolved this error. I didn't test newer versions. Thanks a lot!
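For anyone who wants to check their environment before training, here is a minimal sketch that compares the installed transformers version with the one reported as working in this thread. The threshold `(4, 39)` is an assumption taken only from the reports here (4.39 worked, 4.41.2 failed, versions in between were not tested), and the helper names `major_minor` and `reported_working` are hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption from this thread only: 4.39 worked and 4.41.2 failed.
# Versions in between were not tested by anyone here.
LAST_REPORTED_WORKING = (4, 39)


def major_minor(version_string):
    """Return (major, minor) as integers, e.g. '4.41.2' -> (4, 41)."""
    parts = version_string.split(".")
    return int(parts[0]), int(parts[1])


def reported_working(version_string):
    """True if the version is at or below the last one reported as working."""
    return major_minor(version_string) <= LAST_REPORTED_WORKING


try:
    installed = version("transformers")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("transformers is not installed in this environment")
elif reported_working(installed):
    print(f"transformers {installed}: at or below the version reported as working")
else:
    print(f"transformers {installed}: newer than 4.39, may hit the error above")
```

After changing the installed version, restart the kernel so the notebook picks up the new package.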

a-j-jones commented 3 months ago

I had the same issue with `model.predict()`, and transformers==4.39 fixed it for me.

gilljon commented 2 months ago

@tomaarsen I have a fix for this, but I'm wondering what the support status of this project is.

xkasberg commented 1 month ago

Will there be an upgrade to support Transformers 4.43 anytime soon?