huggingface / autotrain-advanced

🤗 AutoTrain Advanced
https://huggingface.co/autotrain
Apache License 2.0

Training process crashes suddenly #642

Open dejankocic opened 1 month ago

dejankocic commented 1 month ago

Prerequisites

Backend

Local

Interface Used

CLI

CLI Command

autotrain app --port 8080 --host 127.0.0.1

UI Screenshots & Parameters

image

Error Logs

Loading checkpoint shards: 75%|███████▌ | 3/4 [00:09<00:03, 3.21s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:10<00:00, 2.31s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:10<00:00, 2.63s/it]
INFO | 2024-05-15 23:14:00 | autotrain.trainers.clm.train_clm_sft:train:66 - model dtype: torch.float16
INFO | 2024-05-15 23:14:00 | autotrain.trainers.clm.train_clm_sft:train:79 - creating trainer

Generating train split: 0 examples [00:00, ? examples/s]
Generating train split: 0 examples [00:00, ? examples/s]
ERROR | 2024-05-15 23:14:02 | autotrain.trainers.common:wrapper:120 - train has failed due to an exception: Traceback (most recent call last):
  File "/home/dejan/python39venv/lib/python3.9/site-packages/datasets/builder.py", line 1748, in _prepare_split_single
    for key, record in generator:
  File "/home/dejan/python39venv/lib/python3.9/site-packages/datasets/packaged_modules/generator/generator.py", line 30, in _generate_examples
    for idx, ex in enumerate(self.config.generator(**gen_kwargs)):
  File "/home/dejan/python39venv/lib/python3.9/site-packages/trl/trainer/sft_trainer.py", line 536, in data_generator
    yield from constant_length_iterator
  File "/home/dejan/python39venv/lib/python3.9/site-packages/trl/trainer/utils.py", line 458, in __iter__
    buffer_len += len(buffer[-1])
TypeError: object of type 'NoneType' has no len()

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/dejan/python39venv/lib/python3.9/site-packages/trl/trainer/sft_trainer.py", line 539, in _prepare_packed_dataloader
    packed_dataset = Dataset.from_generator(
  File "/home/dejan/python39venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 1117, in from_generator
    return GeneratorDatasetInputStream(
  File "/home/dejan/python39venv/lib/python3.9/site-packages/datasets/io/generator.py", line 47, in read
    self.builder.download_and_prepare(
  File "/home/dejan/python39venv/lib/python3.9/site-packages/datasets/builder.py", line 1027, in download_and_prepare
    self._download_and_prepare(
  File "/home/dejan/python39venv/lib/python3.9/site-packages/datasets/builder.py", line 1789, in _download_and_prepare
    super()._download_and_prepare(
  File "/home/dejan/python39venv/lib/python3.9/site-packages/datasets/builder.py", line 1122, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "/home/dejan/python39venv/lib/python3.9/site-packages/datasets/builder.py", line 1627, in _prepare_split
    for job_id, done, content in self._prepare_split_single(
  File "/home/dejan/python39venv/lib/python3.9/site-packages/datasets/builder.py", line 1784, in _prepare_split_single
    raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/dejan/python39venv/lib/python3.9/site-packages/autotrain/trainers/common.py", line 117, in wrapper
    return func(*args, **kwargs)
  File "/home/dejan/python39venv/lib/python3.9/site-packages/autotrain/trainers/clm/main.py", line 28, in train
    train_sft(config)
  File "/home/dejan/python39venv/lib/python3.9/site-packages/autotrain/trainers/clm/train_clm_sft.py", line 86, in train
    trainer = SFTTrainer(
  File "/home/dejan/python39venv/lib/python3.9/site-packages/trl/trainer/sft_trainer.py", line 283, in __init__
    train_dataset = self._prepare_dataset(
  File "/home/dejan/python39venv/lib/python3.9/site-packages/trl/trainer/sft_trainer.py", line 435, in _prepare_dataset
    return self._prepare_packed_dataloader(
  File "/home/dejan/python39venv/lib/python3.9/site-packages/trl/trainer/sft_trainer.py", line 543, in _prepare_packed_dataloader
    raise ValueError(
ValueError: Error occurred while packing the dataset. Make sure that your dataset has enough samples to at least yield one packed sequence.

ERROR | 2024-05-15 23:14:02 | autotrain.trainers.common:wrapper:121 - Error occurred while packing the dataset. Make sure that your dataset has enough samples to at least yield one packed sequence.
INFO | 2024-05-15 23:14:03 | autotrain.utils:get_running_jobs:57 - Killing PID: 165343

Additional Information

The txt file I am using for testing has about 300 lines. I am not sure whether that is the reason or whether something else is wrong.
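
For anyone hitting the same trace: the innermost error is raised at `len(buffer[-1])` on a `NoneType`, which suggests that at least one sample reached the packing step without any text, rather than the dataset only being too small. Below is a minimal sketch of a pre-check, assuming the data has been loaded as a list of dicts with a `text` key. The function name `find_empty_samples` and the `text` column name are hypothetical and are not part of AutoTrain or TRL.

```python
from typing import Any, Dict, List


def find_empty_samples(records: List[Dict[str, Any]], text_column: str = "text") -> List[int]:
    """Return the indices of records whose text is missing, None, or blank."""
    bad = []
    for idx, record in enumerate(records):
        value = record.get(text_column)
        # Anything that is not a non-blank string would be unusable for packing.
        if not isinstance(value, str) or not value.strip():
            bad.append(idx)
    return bad


# Illustrative data only; not taken from the reporter's file.
records = [
    {"text": "first training line"},
    {"text": None},             # a None value like the one in the TypeError
    {"other": "wrong column"},  # the expected column is absent
    {"text": "   "},            # whitespace only
]
print(find_empty_samples(records))  # prints [1, 2, 3]
```

If this returns a non-empty list for the real data, the column mapping or the file format is a more likely culprit than the number of lines.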

hichambht32 commented 1 month ago

I guess it should be a CSV file, not a text file. As for the dataset columns, try to follow the suggested format for your use case.
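
One way to try this suggestion is to turn the txt file into a single-column CSV. The sketch below uses only the standard library. It assumes that one line of the txt file is one training sample and that the trainer is configured to read a column named `text`. Both are assumptions, and `txt_to_csv` is a hypothetical helper, not an AutoTrain function.

```python
import csv
from pathlib import Path


def txt_to_csv(txt_path: str, csv_path: str, column: str = "text") -> int:
    """Write each non-blank line of txt_path as one CSV row under `column`.

    Returns the number of data rows written (the header is not counted).
    """
    lines = Path(txt_path).read_text(encoding="utf-8").splitlines()
    rows = [line.strip() for line in lines if line.strip()]
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([column])                 # header row
        writer.writerows([row] for row in rows)   # one sample per row
    return len(rows)


# Example call (file names are placeholders):
# txt_to_csv("train.txt", "train.csv")
```

The `csv` module quotes lines that contain commas or quotes, so the text survives the round trip unchanged. Blank lines are dropped so that no empty sample reaches the trainer.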

github-actions[bot] commented 2 weeks ago

This issue is stale because it has been open for 30 days with no activity.