huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

[Efficiency] Decoding can be made faster by not converting special tokens to ids for each token. #27289

Open ganeshpatelQB opened 1 year ago

ganeshpatelQB commented 1 year ago

System Info

Who can help?

@ArthurZucker

Information

Tasks

Reproduction

The all_special_ids property shown below is called for every single token when using the decoding function:


from transformers import T5Tokenizer
tokenizer = T5Tokenizer.from_pretrained(TOKENIZER_PATH)
# skip_special_tokens=True makes the decode loop consult all_special_ids for each token.
beams = tokenizer.batch_decode(
    outputs, skip_special_tokens=True
)
  @property
  def all_special_ids(self) -> List[int]:
      """
      `List[int]`: List the ids of the special tokens (`'<unk>'`, `'<cls>'`, etc.) mapped to class attributes.
      """
      all_toks = self.all_special_tokens
      # Re-runs a vocabulary lookup for every special token on each access.
      all_ids = self.convert_tokens_to_ids(all_toks)
      return all_ids
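
Each access to the property re-runs convert_tokens_to_ids over all special tokens, so decoding N tokens costs N full lookups. Until this is fixed in the library, a rough caller-side workaround is to build the set once and filter the ids before decoding. A minimal sketch, assuming outputs holds integer ids and that pre-filtering specials is acceptable for this tokenizer (it skips the per-token lookup but is otherwise roughly equivalent to skip_special_tokens=True):

from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained(TOKENIZER_PATH)  # same placeholder path as above
special_ids = set(tokenizer.all_special_ids)  # computed a single time, not once per token

# Drop special tokens up front, then decode without the per-token property access.
filtered = [[int(i) for i in seq if int(i) not in special_ids] for seq in outputs]
beams = tokenizer.batch_decode(filtered, skip_special_tokens=False)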

Expected behavior

all_special_ids should not be recomputed for every token while decoding at inference time.
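
For reference, here is a sketch of what hoisting the lookup out of the per-token loop could look like, simplified from PreTrainedTokenizer.convert_ids_to_tokens (the real method also handles single ids and tokens added to the vocabulary; this is not the actual upstream patch):

def convert_ids_to_tokens(self, ids, skip_special_tokens=False):
    # Build the set once per call instead of re-deriving it for every token.
    special_ids = set(self.all_special_ids) if skip_special_tokens else set()
    tokens = []
    for index in ids:
        index = int(index)
        if index in special_ids:  # O(1) membership test against the cached set
            continue
        tokens.append(self._convert_id_to_token(index))
    return tokens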

ArthurZucker commented 1 year ago

Very good catch! I'll open a PR for this. It affects both convert_ids_to_tokens and decode. 🤗 I need to do some benchmarking, as I suspect this won't have a huge impact, but I'll give it a shot. I plan to benchmark our full calls to make sure we don't have similar issues elsewhere.
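
A rough micro-benchmark of the decode path could look like the sketch below; "t5-small" is just an example checkpoint, and the fake ids are wrapped in pad/eos so that skip_special_tokens has work to do:

import time
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
# 64 fake sequences of ~500 ids, wrapped in special tokens (0 = pad, 1 = eos for T5).
batch = [[0] + list(range(3, 503)) + [1]] * 64

start = time.perf_counter()
tokenizer.batch_decode(batch, skip_special_tokens=True)
print(f"batch_decode: {time.perf_counter() - start:.3f}s")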

ArthurZucker commented 11 months ago

My initial tests did not show any impact with NLLB and Whisper, which have the largest number of added tokens, but I'll try to optimize and benchmark in the near future!
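
For context, the size of the special-token set these slow tokenizers carry can be checked directly; a sketch, with the checkpoint names being common examples rather than the exact ones used above:

from transformers import AutoTokenizer

# Every entry in all_special_ids adds to the cost of each property access.
for name in ("facebook/nllb-200-distilled-600M", "openai/whisper-tiny"):
    tok = AutoTokenizer.from_pretrained(name, use_fast=False)
    print(name, len(tok.all_special_ids))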