huggingface / autotrain-advanced

πŸ€— AutoTrain Advanced
https://huggingface.co/autotrain
Apache License 2.0

Issue with AutoTrain Advanced #786

[Open] Gladys-Toper opened this issue 2 days ago

Gladys-Toper commented 2 days ago

Prerequisites

Backend

Hugging Face Space/Endpoints

Interface Used

UI

CLI Command

No response

UI Screenshots & Parameters

Issue with AutoTrain Advanced:

Project: Fine-tuning LLM (Legacy-Ledger-Fine-Tune-v1)
Base Model: meta-llama/Llama-3.1-8B-Instruct
Dataset: mb7419/legal-advice-reddit (Hugging Face Hub)

Error Description:

Initial error: ValueError regarding unknown split "0.8"
Subsequent error: subprocess.CalledProcessError when launching training

Key Error Details:

Command execution failed with non-zero exit status 2
Unrecognized arguments: repeated "-m autotrain.trainers.clm"
Training process (PID 354) terminated unexpectedly

Additional Notes:

Accelerate configuration warnings present
Issue persists after addressing the initial dataset split problem

The errors suggest a potential bug in the AutoTrain backend, possibly related to command argument handling or training process initialization. This prevents the training from starting successfully.
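The logs below also show the launcher arguments accumulating across retries: the first attempt passes "-m autotrain.trainers.clm --training_config ..." once, the second attempt twice, the third three times. As a minimal sketch (assuming nothing beyond the usage line visible in the logs, "usage: main.py [-h] --training_config TRAINING_CONFIG"), this is why the duplicated tail makes the trainer exit with status 2:

```python
# Minimal reproduction sketch, NOT AutoTrain's actual entrypoint code:
# a parser matching the usage line from the logs.
import argparse

parser = argparse.ArgumentParser(prog="main.py")
parser.add_argument("--training_config", required=True)

# "accelerate launch" consumes the first "-m autotrain.trainers.clm";
# any duplicated copy is forwarded to the trainer as a script argument.
argv = [
    "--training_config", "legacy-ledger-at-v1/training_params.json",
    "-m", "autotrain.trainers.clm",
    "--training_config", "legacy-ledger-at-v1/training_params.json",
]

args, unknown = parser.parse_known_args(argv)
print(unknown)  # ['-m', 'autotrain.trainers.clm']

# parser.parse_args(argv) would instead print
#   main.py: error: unrecognized arguments: -m autotrain.trainers.clm
# and exit with status 2 -- the exact error in the logs below.
```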

[Screenshot: AutoTrain UI parameters, 2024-10-04 at 1:14:44 PM]

Error Logs

INFO | 2024-10-04 20:09:15 | autotrain.app.utils:kill_process_by_pid:52 - Sent SIGTERM to process with PID 354

INFO | 2024-10-04 20:09:15 | autotrain.app.utils:get_running_jobs:26 - Killing PID: 354

subprocess.CalledProcessError: Command '['/app/env/bin/python', '-m', 'autotrain.trainers.clm', '--training_config', 'legacy-ledger-at-v1/training_params.json', '-m', 'autotrain.trainers.clm', '--training_config', 'legacy-ledger-at-v1/training_params.json', '-m', 'autotrain.trainers.clm', '--training_config', 'legacy-ledger-at-v1/training_params.json']' returned non-zero exit status 2.

raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)

File "/app/env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 769, in simple_launcher

simple_launcher(args)

File "/app/env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1174, in launch_command

args.func(args)

File "/app/env/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main

sys.exit(main())

File "/app/env/bin/accelerate", line 8, in

Traceback (most recent call last):

main.py: error: unrecognized arguments: -m autotrain.trainers.clm -m autotrain.trainers.clm

usage: main.py [-h] --training_config TRAINING_CONFIG

To avoid this warning pass in values for each of the problematic parameters or run accelerate config.

--dynamo_backend was set to a value of 'no'

--mixed_precision was set to a value of 'no'

--num_machines was set to a value of 1

--num_processes was set to a value of 0

The following values were not passed to accelerate launch and had defaults used instead:

INFO | 2024-10-04 20:09:10 | autotrain.backends.local:create:13 - Training PID: 354

INFO | 2024-10-04 20:09:10 | autotrain.commands:launch_command:502 - {'model': 'meta-llama/Llama-3.2-3B-Instruct', 'project_name': 'legacy-ledger-at-v1', 'data_path': 'bunny0702/Legal_Research', 'train_split': '0.8', 'valid_split': None, 'add_eos_token': True, 'block_size': 1024, 'model_max_length': 2048, 'padding': 'right', 'trainer': 'sft', 'use_flash_attention_2': False, 'log': 'tensorboard', 'disable_gradient_checkpointing': False, 'logging_steps': -1, 'eval_strategy': 'epoch', 'save_total_limit': 1, 'auto_find_batch_size': False, 'mixed_precision': 'bf16', 'lr': 3e-05, 'epochs': 3, 'batch_size': 8, 'warmup_ratio': 0.1, 'gradient_accumulation': 4, 'optimizer': 'adamw_torch', 'scheduler': 'linear', 'weight_decay': 0.0, 'max_grad_norm': 1.0, 'seed': 42, 'chat_template': 'none', 'quantization': 'int4', 'target_modules': 'all-linear', 'merge_adapter': False, 'peft': True, 'lora_r': 16, 'lora_alpha': 32, 'lora_dropout': 0.05, 'model_ref': None, 'dpo_beta': 0.1, 'max_prompt_length': 128, 'max_completion_length': None, 'prompt_text_column': 'prompt', 'text_column': 'text', 'rejected_text_column': 'rejected_text', 'push_to_hub': True, 'username': 'Gladystoper', 'token': '*****', 'unsloth': False, 'distributed_backend': None}

INFO | 2024-10-04 20:09:10 | autotrain.commands:launch_command:501 - ['accelerate', 'launch', '--cpu', '-m', 'autotrain.trainers.clm', '--training_config', 'legacy-ledger-at-v1/training_params.json', '-m', 'autotrain.trainers.clm', '--training_config', 'legacy-ledger-at-v1/training_params.json', '-m', 'autotrain.trainers.clm', '--training_config', 'legacy-ledger-at-v1/training_params.json']

WARNING | 2024-10-04 20:09:10 | autotrain.commands:get_accelerate_command:52 - No GPU found. Forcing training on CPU. This will be super slow!

INFO | 2024-10-04 20:09:10 | autotrain.backends.local:create:8 - Starting local training...

INFO | 2024-10-04 20:09:10 | autotrain.app.ui_routes:handle_form:500 - hardware: local-ui

INFO | 2024-10-04 20:08:05 | autotrain.app.utils:kill_process_by_pid:52 - Sent SIGTERM to process with PID 352

INFO | 2024-10-04 20:08:05 | autotrain.app.utils:get_running_jobs:26 - Killing PID: 352

subprocess.CalledProcessError: Command '['/app/env/bin/python', '-m', 'autotrain.trainers.clm', '--training_config', 'legacy-ledger-at-v1/training_params.json', '-m', 'autotrain.trainers.clm', '--training_config', 'legacy-ledger-at-v1/training_params.json']' returned non-zero exit status 2.

raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)

File "/app/env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 769, in simple_launcher

simple_launcher(args)

File "/app/env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1174, in launch_command

args.func(args)

File "/app/env/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main

sys.exit(main())

File "/app/env/bin/accelerate", line 8, in

Traceback (most recent call last):

main.py: error: unrecognized arguments: -m autotrain.trainers.clm

usage: main.py [-h] --training_config TRAINING_CONFIG

To avoid this warning pass in values for each of the problematic parameters or run accelerate config.

--dynamo_backend was set to a value of 'no'

--mixed_precision was set to a value of 'no'

--num_machines was set to a value of 1

--num_processes was set to a value of 0

The following values were not passed to accelerate launch and had defaults used instead:

INFO | 2024-10-04 20:07:59 | autotrain.backends.local:create:13 - Training PID: 352

INFO | 2024-10-04 20:07:59 | autotrain.commands:launch_command:502 - {'model': 'meta-llama/Llama-3.2-3B-Instruct', 'project_name': 'legacy-ledger-at-v1', 'data_path': 'bunny0702/Legal_Research', 'train_split': '0.8', 'valid_split': None, 'add_eos_token': True, 'block_size': 1024, 'model_max_length': 2048, 'padding': 'right', 'trainer': 'sft', 'use_flash_attention_2': False, 'log': 'tensorboard', 'disable_gradient_checkpointing': False, 'logging_steps': -1, 'eval_strategy': 'epoch', 'save_total_limit': 1, 'auto_find_batch_size': False, 'mixed_precision': 'fp16', 'lr': 3e-05, 'epochs': 3, 'batch_size': 2, 'warmup_ratio': 0.1, 'gradient_accumulation': 4, 'optimizer': 'adamw_torch', 'scheduler': 'linear', 'weight_decay': 0.0, 'max_grad_norm': 1.0, 'seed': 42, 'chat_template': 'none', 'quantization': 'int4', 'target_modules': 'all-linear', 'merge_adapter': False, 'peft': True, 'lora_r': 16, 'lora_alpha': 32, 'lora_dropout': 0.05, 'model_ref': None, 'dpo_beta': 0.1, 'max_prompt_length': 128, 'max_completion_length': None, 'prompt_text_column': 'prompt', 'text_column': 'text', 'rejected_text_column': 'rejected_text', 'push_to_hub': True, 'username': 'Gladystoper', 'token': '*****', 'unsloth': False, 'distributed_backend': None}

INFO | 2024-10-04 20:07:59 | autotrain.commands:launch_command:501 - ['accelerate', 'launch', '--cpu', '-m', 'autotrain.trainers.clm', '--training_config', 'legacy-ledger-at-v1/training_params.json', '-m', 'autotrain.trainers.clm', '--training_config', 'legacy-ledger-at-v1/training_params.json']

WARNING | 2024-10-04 20:07:59 | autotrain.commands:get_accelerate_command:52 - No GPU found. Forcing training on CPU. This will be super slow!

INFO | 2024-10-04 20:07:59 | autotrain.backends.local:create:8 - Starting local training...

INFO | 2024-10-04 20:07:59 | autotrain.app.ui_routes:handle_form:500 - hardware: local-ui

INFO | 2024-10-04 20:07:29 | autotrain.app.ui_routes:handle_form:500 - hardware: local-ui

INFO | 2024-10-04 20:06:30 | autotrain.app.utils:kill_process_by_pid:52 - Sent SIGTERM to process with PID 343

INFO | 2024-10-04 20:06:30 | autotrain.app.utils:get_running_jobs:26 - Killing PID: 343

ERROR | 2024-10-04 20:06:26 | autotrain.trainers.common:wrapper:121 - Unknown split "0.8". Should be one of ['train'].

ValueError: Unknown split "0.8". Should be one of ['train'].

raise ValueError(f'Unknown split "{split}". Should be one of {list(name2len)}.')

File "/app/env/lib/python3.10/site-packages/datasets/arrow_reader.py", line 480, in _rel_to_abs_instr

return [_rel_to_abs_instr(rel_instr, name2len) for rel_instr in self._relative_instructions]

File "/app/env/lib/python3.10/site-packages/datasets/arrow_reader.py", line 663, in

return [_rel_to_abs_instr(rel_instr, name2len) for rel_instr in self._relative_instructions]

File "/app/env/lib/python3.10/site-packages/datasets/arrow_reader.py", line 663, in to_absolute

absolute_instructions = instruction.to_absolute(name2len)

File "/app/env/lib/python3.10/site-packages/datasets/arrow_reader.py", line 134, in make_file_instructions

file_instructions = make_file_instructions(

File "/app/env/lib/python3.10/site-packages/datasets/arrow_reader.py", line 225, in get_file_instructions

files = self.get_file_instructions(name, instructions, split_infos)

File "/app/env/lib/python3.10/site-packages/datasets/arrow_reader.py", line 252, in read

dataset_kwargs = ArrowReader(cache_dir, self.info).read(

File "/app/env/lib/python3.10/site-packages/datasets/builder.py", line 1370, in _as_dataset

ds = self._as_dataset(

File "/app/env/lib/python3.10/site-packages/datasets/builder.py", line 1296, in _build_single_dataset

mapped = function(data_struct)

File "/app/env/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 484, in map_nested

datasets = map_nested(

File "/app/env/lib/python3.10/site-packages/datasets/builder.py", line 1266, in as_dataset

ds = builder_instance.as_dataset(split=split, verification_mode=verification_mode, in_memory=keep_in_memory)

File "/app/env/lib/python3.10/site-packages/datasets/load.py", line 2621, in load_dataset

train_data = load_dataset(

File "/app/env/lib/python3.10/site-packages/autotrain/trainers/clm/utils.py", line 351, in process_input_data

train_data, valid_data = utils.process_input_data(config)

File "/app/env/lib/python3.10/site-packages/autotrain/trainers/clm/train_clm_sft.py", line 14, in train

train_sft(config)

File "/app/env/lib/python3.10/site-packages/autotrain/trainers/clm/main.py", line 28, in train

return func(*args, **kwargs)

File "/app/env/lib/python3.10/site-packages/autotrain/trainers/common.py", line 117, in wrapper

ERROR | 2024-10-04 20:06:26 | autotrain.trainers.common:wrapper:120 - train has failed due to an exception: Traceback (most recent call last):

Generating train split: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 155703/155703 [00:00<00:00, 3644055.24 examples/s]

Generating train split: 0%| | 0/155703 [00:00<?, ? examples/s]

Downloading data: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 180k/180k [00:00<00:00, 1.35MB/s]

Downloading data: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 180k/180k [00:00<00:00, 1.36MB/s]

Downloading data: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2.83M/2.83M [00:00<00:00, 8.18MB/s]

Downloading data: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2.83M/2.83M [00:00<00:00, 8.23MB/s]

Downloading data: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 5.84k/5.84k [00:00<00:00, 50.1kB/s]

Downloading data: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 5.84k/5.84k [00:00<00:00, 50.2kB/s]

Downloading data: 0%| | 0.00/5.84k [00:00<?, ?B/s]

Downloading data: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 152k/152k [00:00<00:00, 764kB/s]

Downloading data: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 152k/152k [00:00<00:00, 767kB/s]

Downloading data: 0%| | 0.00/152k [00:00<?, ?B/s]

INFO | 2024-10-04 20:06:22 | autotrain.trainers.clm.train_clm_sft:train:11 - Starting SFT training...

To avoid this warning pass in values for each of the problematic parameters or run accelerate config.

--dynamo_backend was set to a value of 'no'

--mixed_precision was set to a value of 'no'

--num_machines was set to a value of 1

--num_processes was set to a value of 0

The following values were not passed to accelerate launch and had defaults used instead:

INFO | 2024-10-04 20:06:17 | autotrain.backends.local:create:13 - Training PID: 343

INFO | 2024-10-04 20:06:17 | autotrain.commands:launch_command:502 - {'model': 'meta-llama/Llama-3.2-3B-Instruct', 'project_name': 'legacy-ledger-at-v1', 'data_path': 'bunny0702/Legal_Research', 'train_split': '0.8', 'valid_split': None, 'add_eos_token': True, 'block_size': 1024, 'model_max_length': 2048, 'padding': 'right', 'trainer': 'sft', 'use_flash_attention_2': False, 'log': 'tensorboard', 'disable_gradient_checkpointing': False, 'logging_steps': -1, 'eval_strategy': 'epoch', 'save_total_limit': 1, 'auto_find_batch_size': False, 'mixed_precision': 'fp16', 'lr': 3e-05, 'epochs': 3, 'batch_size': 2, 'warmup_ratio': 0.1, 'gradient_accumulation': 4, 'optimizer': 'adamw_torch', 'scheduler': 'linear', 'weight_decay': 0.0, 'max_grad_norm': 1.0, 'seed': 42, 'chat_template': 'none', 'quantization': 'int4', 'target_modules': 'all-linear', 'merge_adapter': False, 'peft': True, 'lora_r': 16, 'lora_alpha': 32, 'lora_dropout': 0.05, 'model_ref': None, 'dpo_beta': 0.1, 'max_prompt_length': 128, 'max_completion_length': None, 'prompt_text_column': 'prompt', 'text_column': 'text', 'rejected_text_column': 'rejected_text', 'push_to_hub': True, 'username': 'Gladystoper', 'token': '*****', 'unsloth': False, 'distributed_backend': None}

INFO | 2024-10-04 20:06:17 | autotrain.commands:launch_command:501 - ['accelerate', 'launch', '--cpu', '-m', 'autotrain.trainers.clm', '--training_config', 'legacy-ledger-at-v1/training_params.json']

WARNING | 2024-10-04 20:06:17 | autotrain.commands:get_accelerate_command:52 - No GPU found. Forcing training on CPU. This will be super slow!

INFO | 2024-10-04 20:06:17 | autotrain.backends.local:create:8 - Starting local training...

INFO | 2024-10-04 20:06:17 | autotrain.app.ui_routes:handle_form:500 - hardware: local-ui

INFO | 2024-10-04 20:05:25 | autotrain.app.ui_routes:fetch_params:391 - Param distributed_backend not found in UI_PARAMS

INFO | 2024-10-04 20:05:25 | autotrain.app.ui_routes:fetch_params:381 - Task: llm:sft

INFO: 10.203.0.7:58006 - "GET /?__sign=eyJhbGciOiJFZERTQSJ9.eyJyZWFkIjp0cnVlLCJwZXJtaXNzaW9ucyI6eyJyZXBvLmNvbnRlbnQucmVhZCI6dHJ1ZX0sIm9uQmVoYWxmT2YiOnsia2luZCI6InVzZXIiLCJfaWQiOiI2NTEwNTY4YThmMzIyOGQ4MDdhNzFkNjkiLCJ1c2VyIjoiR2xhZHlzdG9wZXIifSwiaWF0IjoxNzI4MDcyMzIzLCJzdWIiOiIvc3BhY2VzL0dsYWR5c3RvcGVyL2F1dG90cmFpbi1hZHZhbmNlZCIsImV4cCI6MTcyODE1ODcyMywiaXNzIjoiaHR0cHM6Ly9odWdnaW5nZmFjZS5jbyJ9.f01ZyphHQ-H1f0hqNduC6oLd-R1_oqQqrJOE4IGNDVV1ESg_RZRun38N4lLoWEPOn91dfw6jvrJg_Ilg42nlAA HTTP/1.1" 307 Temporary Redirect

INFO: 10.203.4.41:54838 - "GET /?__sign=eyJhbGciOiJFZERTQSJ9.eyJyZWFkIjp0cnVlLCJwZXJtaXNzaW9ucyI6eyJyZXBvLmNvbnRlbnQucmVhZCI6dHJ1ZX0sIm9uQmVoYWxmT2YiOnsia2luZCI6InVzZXIiLCJfaWQiOiI2NTEwNTY4YThmMzIyOGQ4MDdhNzFkNjkiLCJ1c2VyIjoiR2xhZHlzdG9wZXIifSwiaWF0IjoxNzI4MDcyMzIzLCJzdWIiOiIvc3BhY2VzL0dsYWR5c3RvcGVyL2F1dG90cmFpbi1hZHZhbmNlZCIsImV4cCI6MTcyODE1ODcyMywiaXNzIjoiaHR0cHM6Ly9odWdnaW5nZmFjZS5jbyJ9.f01ZyphHQ-H1f0hqNduC6oLd-R1_oqQqrJOE4IGNDVV1ESg_RZRun38N4lLoWEPOn91dfw6jvrJg_Ilg42nlAA HTTP/1.1" 307 Temporary Redirect

INFO | 2024-10-04 20:02:27 | autotrain.app.ui_routes:fetch_params:391 - Param distributed_backend not found in UI_PARAMS

INFO | 2024-10-04 20:02:27 | autotrain.app.ui_routes:fetch_params:381 - Task: llm:sft

INFO: 10.203.0.7:39860 - "GET /?__sign=eyJhbGciOiJFZERTQSJ9.eyJyZWFkIjp0cnVlLCJwZXJtaXNzaW9ucyI6eyJyZXBvLmNvbnRlbnQucmVhZCI6dHJ1ZX0sIm9uQmVoYWxmT2YiOnsia2luZCI6InVzZXIiLCJfaWQiOiI2NTEwNTY4YThmMzIyOGQ4MDdhNzFkNjkiLCJ1c2VyIjoiR2xhZHlzdG9wZXIifSwiaWF0IjoxNzI4MDcyMTQ2LCJzdWIiOiIvc3BhY2VzL0dsYWR5c3RvcGVyL2F1dG90cmFpbi1hZHZhbmNlZCIsImV4cCI6MTcyODE1ODU0NiwiaXNzIjoiaHR0cHM6Ly9odWdnaW5nZmFjZS5jbyJ9.mmzj3_JYvkG4SsyTvlNl9oxBl8jfpuK0wlbQMjZeBfjEK28ijS6sSYk6hPH9PS3oKIfdqJSof2omk9KDpPrCDg HTTP/1.1" 307 Temporary Redirect

INFO: 10.203.4.41:52548 - "GET /?__sign=eyJhbGciOiJFZERTQSJ9.eyJyZWFkIjp0cnVlLCJwZXJtaXNzaW9ucyI6eyJyZXBvLmNvbnRlbnQucmVhZCI6dHJ1ZX0sIm9uQmVoYWxmT2YiOnsia2luZCI6InVzZXIiLCJfaWQiOiI2NTEwNTY4YThmMzIyOGQ4MDdhNzFkNjkiLCJ1c2VyIjoiR2xhZHlzdG9wZXIifSwiaWF0IjoxNzI4MDcyMTQ2LCJzdWIiOiIvc3BhY2VzL0dsYWR5c3RvcGVyL2F1dG90cmFpbi1hZHZhbmNlZCIsImV4cCI6MTcyODE1ODU0NiwiaXNzIjoiaHR0cHM6Ly9odWdnaW5nZmFjZS5jbyJ9.mmzj3_JYvkG4SsyTvlNl9oxBl8jfpuK0wlbQMjZeBfjEK28ijS6sSYk6hPH9PS3oKIfdqJSof2omk9KDpPrCDg HTTP/1.1" 307 Temporary Redirect

INFO | 2024-10-04 20:01:49 | autotrain.app.ui_routes:fetch_params:391 - Param distributed_backend not found in UI_PARAMS

INFO | 2024-10-04 20:01:49 | autotrain.app.ui_routes:fetch_params:381 - Task: llm:sft

INFO: 10.203.0.7:53992 - "GET / HTTP/1.1" 307 Temporary Redirect

INFO: 10.203.0.7:53992 - "GET /auth?code=pCIvjXNYbHDvcDZN&state=rLaqKVN6FpwF0ETcx5Ga0HWfrCqb1t HTTP/1.1" 307 Temporary Redirect

INFO: 10.203.0.7:53992 - "GET /login/huggingface?__sign=eyJhbGciOiJFZERTQSJ9.eyJyZWFkIjp0cnVlLCJwZXJtaXNzaW9ucyI6eyJyZXBvLmNvbnRlbnQucmVhZCI6dHJ1ZX0sIm9uQmVoYWxmT2YiOnsia2luZCI6InVzZXIiLCJfaWQiOiI2NTEwNTY4YThmMzIyOGQ4MDdhNzFkNjkiLCJ1c2VyIjoiR2xhZHlzdG9wZXIifSwiaWF0IjoxNzI4MDcyMTA0LCJzdWIiOiIvc3BhY2VzL0dsYWR5c3RvcGVyL2F1dG90cmFpbi1hZHZhbmNlZCIsImV4cCI6MTcyODE1ODUwNCwiaXNzIjoiaHR0cHM6Ly9odWdnaW5nZmFjZS5jbyJ9.DDPA7r37oUjUrnhV4tKSThbbemp4bIlcRJ1jITIKI4kUp9EnwopGjjP1bqvWjVOQ7DpzZaI94JfFeCIcK-mVBQ HTTP/1.1" 302 Found

ERROR | 2024-10-04 20:01:45 | autotrain.app.ui_routes:load_index:347 - Failed to get user and orgs: object of type '_TemplateResponse' has no len()

INFO: 10.203.0.7:53992 - "GET /?__sign=eyJhbGciOiJFZERTQSJ9.eyJyZWFkIjp0cnVlLCJwZXJtaXNzaW9ucyI6eyJyZXBvLmNvbnRlbnQucmVhZCI6dHJ1ZX0sIm9uQmVoYWxmT2YiOnsia2luZCI6InVzZXIiLCJfaWQiOiI2NTEwNTY4YThmMzIyOGQ4MDdhNzFkNjkiLCJ1c2VyIjoiR2xhZHlzdG9wZXIifSwiaWF0IjoxNzI4MDcyMTA0LCJzdWIiOiIvc3BhY2VzL0dsYWR5c3RvcGVyL2F1dG90cmFpbi1hZHZhbmNlZCIsImV4cCI6MTcyODE1ODUwNCwiaXNzIjoiaHR0cHM6Ly9odWdnaW5nZmFjZS5jbyJ9.DDPA7r37oUjUrnhV4tKSThbbemp4bIlcRJ1jITIKI4kUp9EnwopGjjP1bqvWjVOQ7DpzZaI94JfFeCIcK-mVBQ HTTP/1.1" 307 Temporary Redirect

INFO: 10.203.4.41:36212 - "GET /?__sign=eyJhbGciOiJFZERTQSJ9.eyJyZWFkIjp0cnVlLCJwZXJtaXNzaW9ucyI6eyJyZXBvLmNvbnRlbnQucmVhZCI6dHJ1ZX0sIm9uQmVoYWxmT2YiOnsia2luZCI6InVzZXIiLCJfaWQiOiI2NTEwNTY4YThmMzIyOGQ4MDdhNzFkNjkiLCJ1c2VyIjoiR2xhZHlzdG9wZXIifSwiaWF0IjoxNzI4MDcyMTA0LCJzdWIiOiIvc3BhY2VzL0dsYWR5c3RvcGVyL2F1dG90cmFpbi1hZHZhbmNlZCIsImV4cCI6MTcyODE1ODUwNCwiaXNzIjoiaHR0cHM6Ly9odWdnaW5nZmFjZS5jbyJ9.DDPA7r37oUjUrnhV4tKSThbbemp4bIlcRJ1jITIKI4kUp9EnwopGjjP1bqvWjVOQ7DpzZaI94JfFeCIcK-mVBQ HTTP/1.1" 307 Temporary Redirect

ERROR | 2024-10-04 02:03:05 | autotrain.app.ui_routes:user_authentication:324 - Failed to verify token: Invalid token (/oauth/userinfo). Please login with a write token.

ERROR | 2024-10-04 02:03:05 | autotrain.app.utils:token_verification:84 - Failed to request /oauth/userinfo - 504

ERROR | 2024-10-04 02:03:05 | autotrain.app.ui_routes:user_authentication:324 - Failed to verify token: Invalid token (/oauth/userinfo). Please login with a write token.

ERROR | 2024-10-04 02:03:05 | autotrain.app.utils:token_verification:84 - Failed to request /oauth/userinfo - 504

INFO | 2024-10-04 01:42:36 | autotrain.app.ui_routes:fetch_params:391 - Param distributed_backend not found in UI_PARAMS

INFO | 2024-10-04 01:42:36 | autotrain.app.ui_routes:fetch_params:381 - Task: llm:sft

INFO: 10.203.14.122:50602 - "GET /?logs=build&__sign=eyJhbGciOiJFZERTQSJ9.eyJyZWFkIjp0cnVlLCJwZXJtaXNzaW9ucyI6eyJyZXBvLmNvbnRlbnQucmVhZCI6dHJ1ZX0sIm9uQmVoYWxmT2YiOnsia2luZCI6InVzZXIiLCJfaWQiOiI2NTEwNTY4YThmMzIyOGQ4MDdhNzFkNjkiLCJ1c2VyIjoiR2xhZHlzdG9wZXIifSwiaWF0IjoxNzI4MDA2MTUzLCJzdWIiOiIvc3BhY2VzL0dsYWR5c3RvcGVyL2F1dG90cmFpbi1hZHZhbmNlZCIsImV4cCI6MTcyODA5MjU1MywiaXNzIjoiaHR0cHM6Ly9odWdnaW5nZmFjZS5jbyJ9.dnUKIVelGNJw5CMS1JkW3wgFATQbdXScm-1aELkG_U5msHucvhHlC4R7445LYEB_HWfk1M0LXep7fk1A8mWhDg HTTP/1.1" 307 Temporary Redirect

INFO: 10.203.14.157:50772 - "GET /?logs=build&__sign=eyJhbGciOiJFZERTQSJ9.eyJyZWFkIjp0cnVlLCJwZXJtaXNzaW9ucyI6eyJyZXBvLmNvbnRlbnQucmVhZCI6dHJ1ZX0sIm9uQmVoYWxmT2YiOnsia2luZCI6InVzZXIiLCJfaWQiOiI2NTEwNTY4YThmMzIyOGQ4MDdhNzFkNjkiLCJ1c2VyIjoiR2xhZHlzdG9wZXIifSwiaWF0IjoxNzI4MDA2MTUzLCJzdWIiOiIvc3BhY2VzL0dsYWR5c3RvcGVyL2F1dG90cmFpbi1hZHZhbmNlZCIsImV4cCI6MTcyODA5MjU1MywiaXNzIjoiaHR0cHM6Ly9odWdnaW5nZmFjZS5jbyJ9.dnUKIVelGNJw5CMS1JkW3wgFATQbdXScm-1aELkG_U5msHucvhHlC4R7445LYEB_HWfk1M0LXep7fk1A8mWhDg HTTP/1.1" 307 Temporary Redirect

INFO: Uvicorn running on http://0.0.0.0:7860 (Press CTRL+C to quit)

INFO: Application startup complete.

INFO: Waiting for application startup.

INFO: Started server process [121]

INFO | 2024-10-04 01:42:32 | autotrain.app.app:<module>:24 - AutoTrain started successfully

INFO | 2024-10-04 01:42:32 | autotrain.app.app:<module>:23 - AutoTrain version: 0.8.21

INFO | 2024-10-04 01:42:32 | autotrain.app.app:<module>:13 - Starting AutoTrain...

INFO | 2024-10-04 01:42:32 | autotrain.app.ui_routes:<module>:298 - AutoTrain started successfully

INFO | 2024-10-04 01:42:27 | autotrain.app.ui_routes:<module>:32 - Starting AutoTrain...

Additional Information

No response

wangzizhe commented 1 day ago

This is confusing because Hugging Face provides no instructions for the "train split" / "valid split" fields. They do not take a number or fraction, as one might expect; they take split names.

If you look at the dataset https://huggingface.co/datasets/mb7419/legal-advice-reddit, it has three splits. So in the user interface, under "train split" you should type "train", and under "valid split" you should type "validation". Then the training set and validation set will be loaded.
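For illustration, here is how the split value behaves at the datasets level; per the traceback in the logs, AutoTrain passes train_split straight to load_dataset, so only split names (or datasets' slicing syntax) are valid. The "validation" split name is taken from the dataset mentioned above:

```python
from datasets import load_dataset

# Fails exactly as in the logs: "0.8" is not a split name.
# load_dataset("mb7419/legal-advice-reddit", split="0.8")
#   -> ValueError: Unknown split "0.8". Should be one of [...].

# Works: name the splits the dataset actually defines.
train_data = load_dataset("mb7419/legal-advice-reddit", split="train")
valid_data = load_dataset("mb7419/legal-advice-reddit", split="validation")

# If a dataset ships only a "train" split, an 80/20 split can still be
# made outside the UI, e.g. with slicing syntax or train_test_split:
subset = load_dataset("mb7419/legal-advice-reddit", split="train[:80%]")
parts = train_data.train_test_split(test_size=0.2, seed=42)
train_data, valid_data = parts["train"], parts["test"]
```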

abhishekkrthakur commented 1 day ago

That's correct. Here are the docs: https://hf.co/docs/autotrain, where it's clearly mentioned :)