Generating train split: 0 examples [00:00, ? examples/s]
Generating train split: 0 examples [00:00, ? examples/s]
ERROR | 2024-05-15 23:14:02 | autotrain.trainers.common:wrapper:120 - train has failed due to an exception: Traceback (most recent call last):
  File "/home/dejan/python39venv/lib/python3.9/site-packages/datasets/builder.py", line 1748, in _prepare_split_single
    for key, record in generator:
  File "/home/dejan/python39venv/lib/python3.9/site-packages/datasets/packaged_modules/generator/generator.py", line 30, in _generate_examples
    for idx, ex in enumerate(self.config.generator(**gen_kwargs)):
  File "/home/dejan/python39venv/lib/python3.9/site-packages/trl/trainer/sft_trainer.py", line 536, in data_generator
    yield from constant_length_iterator
  File "/home/dejan/python39venv/lib/python3.9/site-packages/trl/trainer/utils.py", line 458, in __iter__
    buffer_len += len(buffer[-1])
TypeError: object of type 'NoneType' has no len()
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/home/dejan/python39venv/lib/python3.9/site-packages/trl/trainer/sft_trainer.py", line 539, in _prepare_packed_dataloader
    packed_dataset = Dataset.from_generator(
  File "/home/dejan/python39venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 1117, in from_generator
    return GeneratorDatasetInputStream(
  File "/home/dejan/python39venv/lib/python3.9/site-packages/datasets/io/generator.py", line 47, in read
    self.builder.download_and_prepare(
  File "/home/dejan/python39venv/lib/python3.9/site-packages/datasets/builder.py", line 1027, in download_and_prepare
    self._download_and_prepare(
  File "/home/dejan/python39venv/lib/python3.9/site-packages/datasets/builder.py", line 1789, in _download_and_prepare
    super()._download_and_prepare(
  File "/home/dejan/python39venv/lib/python3.9/site-packages/datasets/builder.py", line 1122, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "/home/dejan/python39venv/lib/python3.9/site-packages/datasets/builder.py", line 1627, in _prepare_split
    for job_id, done, content in self._prepare_split_single(
  File "/home/dejan/python39venv/lib/python3.9/site-packages/datasets/builder.py", line 1784, in _prepare_split_single
    raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/home/dejan/python39venv/lib/python3.9/site-packages/autotrain/trainers/common.py", line 117, in wrapper
    return func(*args, **kwargs)
  File "/home/dejan/python39venv/lib/python3.9/site-packages/autotrain/trainers/clm/main.py", line 28, in train
    train_sft(config)
  File "/home/dejan/python39venv/lib/python3.9/site-packages/autotrain/trainers/clm/train_clm_sft.py", line 86, in train
    trainer = SFTTrainer(
  File "/home/dejan/python39venv/lib/python3.9/site-packages/trl/trainer/sft_trainer.py", line 283, in __init__
    train_dataset = self._prepare_dataset(
  File "/home/dejan/python39venv/lib/python3.9/site-packages/trl/trainer/sft_trainer.py", line 435, in _prepare_dataset
    return self._prepare_packed_dataloader(
  File "/home/dejan/python39venv/lib/python3.9/site-packages/trl/trainer/sft_trainer.py", line 543, in _prepare_packed_dataloader
    raise ValueError(
ValueError: Error occurred while packing the dataset. Make sure that your dataset has enough samples to at least yield one packed sequence.
ERROR | 2024-05-15 23:14:02 | autotrain.trainers.common:wrapper:121 - Error occurred while packing the dataset. Make sure that your dataset has enough samples to at least yield one packed sequence.
INFO | 2024-05-15 23:14:03 | autotrain.utils:get_running_jobs:57 - Killing PID: 165343
Additional Information
The txt file I am using for testing has about 300 lines, and I am not sure whether this is the reason or whether it is something else.
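In the traceback above, the "enough samples" ValueError is raised from a DatasetGenerationError, which in turn was caused by `TypeError: object of type 'NoneType' has no len()` inside the packing loop. That suggests the root failure is a `None` text value reaching the packer rather than the sample count alone. Below is a simplified, hypothetical sketch of constant-length packing, loosely modeled on the loop shown at `trl/trainer/utils.py` line 458. It is not trl's actual code: `pack_examples`, the character-level tokenizer, and the default parameters are assumptions for illustration. It shows both failure modes. A `None` entry raises the same TypeError, and too few total tokens yields zero packed sequences.

```python
from typing import Callable, Iterable, Iterator, List, Optional


def pack_examples(
    examples: Iterable[dict],
    formatting_func: Callable[[dict], Optional[str]],
    tokenize: Callable[[str], List[int]],
    seq_length: int,
    chars_per_token: float = 3.6,  # assumed heuristic for sizing the buffer
    num_of_sequences: int = 1024,  # assumed number of sequences per buffer
) -> Iterator[List[int]]:
    """Hypothetical sketch: concatenate examples and yield fixed-length token chunks."""
    max_buffer_size = int(seq_length * chars_per_token * num_of_sequences)
    iterator = iter(examples)
    more_examples = True
    while more_examples:
        buffer: List[str] = []
        buffer_len = 0
        # Fill a character buffer with formatted examples.
        while buffer_len < max_buffer_size:
            try:
                buffer.append(formatting_func(next(iterator)))
            except StopIteration:
                more_examples = False
                break
            # If formatting_func returned None (e.g. an empty or missing text
            # value), len(None) raises the TypeError seen in the log.
            buffer_len += len(buffer[-1])
        # Tokenize everything in the buffer into one long id list.
        all_ids: List[int] = []
        for text in buffer:
            all_ids.extend(tokenize(text))
        # Keep only full-length chunks; a shorter tail is dropped, so a
        # dataset with fewer than seq_length tokens yields nothing.
        for i in range(0, len(all_ids), seq_length):
            chunk = all_ids[i : i + seq_length]
            if len(chunk) == seq_length:
                yield chunk
```

For example, with a character-level tokenizer, two 5-character rows and `seq_length=4` produce two chunks (the 2-token tail is dropped). The same rows with `seq_length=64` produce no chunks, and a row whose text is `None` raises TypeError. Whether the 300-line file contains such a row, or whether its column is mapped differently, is an assumption this sketch cannot confirm.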
Prerequisites
Backend
Local
Interface Used
CLI
CLI Command
autotrain app --port 8080 --host 127.0.0.1
UI Screenshots & Parameters
Error Logs
Loading checkpoint shards: 100%|██████████| 4/4 [00:10<00:00, 2.63s/it]
INFO | 2024-05-15 23:14:00 | autotrain.trainers.clm.train_clm_sft:train:66 - model dtype: torch.float16
INFO | 2024-05-15 23:14:00 | autotrain.trainers.clm.train_clm_sft:train:79 - creating trainer