ChitandaErumanga opened 4 days ago
When I tried to use `inputs_embeds`, I had both `input_ids` and `inputs_embeds` available, but had to pass `input_ids` as `None` in the forward method of class `LlamaForSequenceClassification`:

```python
transformer_outputs = self.model(
    None,  # input_ids
    attention_mask=attention_mask,
    position_ids=position_ids,
    past_key_values=past_key_values,
    inputs_embeds=inputs_embeds,
    use_cache=use_cache,
    output_attentions=output_attentions,
    output_hidden_states=output_hidden_states,
    return_dict=return_dict,
)
```
The pad-token check is skipped when `inputs_embeds` is used:

```python
if self.config.pad_token_id is None:
    sequence_lengths = -1
else:
    if input_ids is not None:
        # if no pad token found, use modulo instead of reverse indexing for ONNX compatibility
        sequence_lengths = torch.eq(input_ids, self.config.pad_token_id).int().argmax(-1) - 1
        sequence_lengths = sequence_lengths % input_ids.shape[-1]
        sequence_lengths = sequence_lengths.to(logits.device)
    else:
        sequence_lengths = -1
```
According to the docstring in Llama's modeling file, checking for pad-token embeddings in `inputs_embeds` is not implemented because the padding-token embedding is unknown at that point. Since the model cannot guess the padding tokens when `inputs_embeds` is passed instead of `input_ids`, it falls back to the same behavior as the no-pad-token case (taking the last value in each row of the batch). However, I was wondering whether it would be possible to compare the provided `inputs_embeds` against the embedding of the pad token (retrieved via `pad_token_id`), rather than simply using the last value in each row. This would let the model explicitly identify pad-token embeddings even when `inputs_embeds` is used.
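A minimal sketch of that comparison, assuming the pad embedding can be looked up from the embedding table via `pad_token_id` (`lengths_from_embeds` is a hypothetical helper, not an existing transformers function):

```python
import torch

def lengths_from_embeds(inputs_embeds, pad_embed):
    """Recover the last non-pad index per row by matching the pad embedding.

    inputs_embeds: (batch, seq_len, hidden); pad_embed: (hidden,).
    Caveat: a real token whose embedding happened to equal the pad
    embedding would be misdetected as padding.
    """
    # A position counts as padding when every hidden dim matches the pad embedding.
    is_pad = torch.isclose(inputs_embeds, pad_embed).all(dim=-1)  # (batch, seq_len)
    # Same trick as the input_ids path: argmax finds the first pad position;
    # modulo handles rows with no padding (argmax -> 0, so -1 % seq_len = last index).
    sequence_lengths = is_pad.int().argmax(-1) - 1
    return sequence_lengths % inputs_embeds.shape[1]
```

With right-padded inputs this reproduces the result of the existing `input_ids`-based branch.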
cc @ArthurZucker maybe
System Info
transformers 4.44.0
Who can help?
@ArthurZucker
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
Expected behavior
https://github.com/huggingface/transformers/blob/3f06f95ebe617b192251ef756518690f5bc7ff76/src/transformers/models/llama/modeling_llama.py#L1314

`sequence_lengths` is derived only from `input_ids`; when we use `inputs_embeds` instead, it defaults to -1. However, the forward method of `LlamaModel` doesn't support passing both `input_ids` and `inputs_embeds` at the same time.
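For illustration, a small standalone script (using a toy embedding table, not the actual model code) showing how the two paths disagree on a right-padded row:

```python
import torch

# With inputs_embeds, sequence_lengths falls back to -1, so the logit is
# pooled from the LAST position of every row -- which is a pad position
# for right-padded sequences.
emb = torch.nn.Embedding(10, 4)
pad_id = 0
ids = torch.tensor([[5, 6, pad_id, pad_id]])  # right-padded row
x = emb(ids)  # what a caller would pass as inputs_embeds

# input_ids path: first pad is at index 2 -> last real token is index 1
via_ids = (torch.eq(ids, pad_id).int().argmax(-1) - 1) % ids.shape[-1]
# inputs_embeds path: falls back to -1 (i.e. last position, index 3)
via_embeds = torch.tensor([-1]) % ids.shape[-1]
assert via_ids.item() == 1 and via_embeds.item() == 3  # the two paths disagree
```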