Trying to enable padding produces a wall of errors. I absolutely recognize this is an Olive issue, but it is far less likely to get traction coming from an end user directly to Olive than from a bulk consumer like ai-studio.
Basically, to simplify solving for:
line 45, in pre_process
    return self.config.pre_process(dataset, **self.config.pre_process_params)
  File "/home/oliver/miniconda3/envs/mistral-7b-env/lib/python3.9/site-packages/olive/data/component/pre_process_data.py", line 180, in text_generation_huggingface_pre_process
    return text_gen_corpus_pre_process(dataset, tokenizer, all_kwargs)
  File "/home/oliver/miniconda3/envs/mistral-7b-env/lib/python3.9/site-packages/olive/data/component/text_generation.py", line 342, in text_gen_corpus_pre_process
    batched_input_ids = batch_tokenize_text(text_list, tokenizer, args)
  File "/home/oliver/miniconda3/envs/mistral-7b-env/lib/python3.9/site-packages/olive/data/component/text_generation.py", line 522, in batch_tokenize_text
    batched_encodings = tokenizer(
  File "/home/oliver/miniconda3/envs/mistral-7b-env/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2790, in __call__
    encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
  File "/home/oliver/miniconda3/envs/mistral-7b-env/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2876, in _call_one
    return self.batch_encode_plus(
  File "/home/oliver/miniconda3/envs/mistral-7b-env/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 3058, in batch_encode_plus
    padding_strategy, truncation_strategy, max_length, kwargs = self._get_padding_truncation_strategies(
  File "/home/oliver/miniconda3/envs/mistral-7b-env/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2695, in _get_padding_truncation_strategies
    raise ValueError(
ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as `pad_token` `(tokenizer.pad_token = tokenizer.eos_token e.g.)` or add a new pad token via `tokenizer.add_special_tokens({'pad_token': '[PAD]'})`.
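For what it's worth, the workaround the error message itself suggests does work when applied directly to the tokenizer. Here is a minimal sketch, using the small GPT-2 tokenizer as a stand-in (it also ships without a pad token, like the Mistral-7B tokenizer) so the example stays light; the model name and sample strings are illustrative, not from the original report:

```python
from transformers import AutoTokenizer

# GPT-2's tokenizer has no pad_token, so tokenizer(..., padding=True)
# would raise the same ValueError shown in the traceback above.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# The fix suggested by the error message: reuse the EOS token for padding.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Padding now succeeds; the shorter sequence is padded with eos_token_id.
batch = tokenizer(
    ["a short line", "a somewhat longer line of text"],
    padding=True,
    return_tensors="pt",
)
```

The missing piece is that Olive's `text_gen_corpus_pre_process` path calls the tokenizer before the user has a chance to patch it like this, which is why it needs handling upstream.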
allowing