huggingface / autotrain-advanced

🤗 AutoTrain Advanced
https://huggingface.co/autotrain
Apache License 2.0

[BUG] Training error on NER task #598

Closed Jerado10 closed 2 months ago

Jerado10 commented 2 months ago

Prerequisites

Backend

Hugging Face Space/Endpoints

Interface Used

UI

CLI Command

No response

UI Screenshots & Parameters

Screenshot 2024-04-25 at 10 48 15

Error Logs

Device 0: Tesla T4 - 434.4MiB/15360MiB

INFO | 2024-04-25 00:41:37 | autotrain.app_utils:get_running_jobs:28 - Killing PID: 57
subprocess.CalledProcessError: Command '['/app/env/bin/python', '-m', 'autotrain.trainers.token_classification', '--training_config', 'autotrain-qxa3o-7b25f/training_params.json']' returned non-zero exit status 1.
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
File "/app/env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 681, in simple_launcher
simple_launcher(args)
File "/app/env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1075, in launch_command
args.func(args)
File "/app/env/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 46, in main
sys.exit(main())
File "/app/env/bin/accelerate", line 8, in <module>
Traceback (most recent call last):
ValueError: Loading seqeval requires you to execute the dataset script in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option trust_remote_code=True to remove this error.
raise ValueError(
File "/app/env/lib/python3.10/site-packages/datasets/load.py", line 811, in get_module
).get_module()
File "/app/env/lib/python3.10/site-packages/datasets/load.py", line 2016, in metric_module_factory
raise e1 from None
File "/app/env/lib/python3.10/site-packages/datasets/load.py", line 2022, in metric_module_factory
return deprecated_function(*args, **kwargs)
File "/app/env/lib/python3.10/site-packages/datasets/utils/deprecation_utils.py", line 46, in wrapper
metric_module = metric_module_factory(
File "/app/env/lib/python3.10/site-packages/datasets/load.py", line 2104, in load_metric
return deprecated_function(*args, **kwargs)
File "/app/env/lib/python3.10/site-packages/datasets/utils/deprecation_utils.py", line 46, in wrapper
_METRICS = load_metric("seqeval")
File "/app/env/lib/python3.10/site-packages/autotrain/trainers/token_classification/utils.py", line 5, in <module>
from autotrain.trainers.token_classification import utils
File "/app/env/lib/python3.10/site-packages/autotrain/trainers/token_classification/__main__.py", line 29, in <module>
exec(code, run_globals)
File "/app/env/lib/python3.10/runpy.py", line 86, in _run_code
return _run_code(code, main_globals, None,
File "/app/env/lib/python3.10/runpy.py", line 196, in _run_module_as_main
Traceback (most recent call last):
Downloading builder script: 6.33kB [00:00, 16.2MB/s]
Downloading builder script: 0%| | 0.00/2.47k [00:00<?, ?B/s]
_METRICS = load_metric("seqeval")
/app/env/lib/python3.10/site-packages/autotrain/trainers/token_classification/utils.py:5: FutureWarning: load_metric is deprecated and will be removed in the next major version of datasets. Use 'evaluate.load' instead, from the new library 🤗 Evaluate: https://huggingface.co/docs/evaluate
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
--dynamo_backend was set to a value of 'no'
The following values were not passed to accelerate launch and had defaults used instead:
INFO | 2024-04-25 00:41:24 | autotrain.backend:create:297 - Training PID: 57
INFO | 2024-04-25 00:41:24 | autotrain.commands:launch_command:339 - {'data_path': 'autotrain-qxa3o-7b25f/autotrain-data', 'model': 'Babelscape/wikineural-multilingual-ner', 'lr': 5e-05, 'epochs': 3, 'max_seq_length': 128, 'batch_size': 8, 'warmup_ratio': 0.1, 'gradient_accumulation': 1, 'optimizer': 'adamw_torch', 'scheduler': 'linear', 'weight_decay': 0.0, 'max_grad_norm': 1.0, 'seed': 42, 'train_split': 'train', 'valid_split': 'validation', 'tokens_column': 'autotrain_text', 'tags_column': 'autotrain_label', 'logging_steps': -1, 'project_name': 'autotrain-qxa3o-7b25f', 'auto_find_batch_size': False, 'mixed_precision': 'fp16', 'save_total_limit': 1, 'save_strategy': 'epoch', 'token': '', 'push_to_hub': True, 'evaluation_strategy': 'epoch', 'username': 'Jerado', 'log': 'tensorboard'}
INFO | 2024-04-25 00:41:24 | autotrain.commands:launch_command:338 - ['accelerate', 'launch', '--num_machines', '1', '--num_processes', '1', '--mixed_precision', 'fp16', '-m', 'autotrain.trainers.token_classification', '--training_config', 'autotrain-qxa3o-7b25f/training_params.json']
INFO | 2024-04-25 00:41:24 | autotrain.backend:create:292 - Starting local training...
WARNING | 2024-04-25 00:41:24 | autotrain.trainers.common:__init__:174 - Parameters not supplied by user and set to default: weight_decay, auto_find_batch_size, train_split, seed, warmup_ratio, evaluation_strategy, max_grad_norm, save_total_limit, logging_steps, save_strategy
Saving the dataset (1/1 shards): 100%|██████████| 4/4 [00:00<00:00, 1353.44 examples/s]
Saving the dataset (1/1 shards): 100%|██████████| 4/4 [00:00<00:00, 1415.80 examples/s]
Saving the dataset (0/1 shards): 0%| | 0/4 [00:00<?, ? examples/s]
Saving the dataset (1/1 shards): 100%|██████████| 16/16 [00:00<00:00, 5265.92 examples/s]
Saving the dataset (1/1 shards): 100%|██████████| 16/16 [00:00<00:00, 5517.46 examples/s]
Saving the dataset (0/1 shards): 0%| | 0/16 [00:00<?, ? examples/s]
Casting the dataset: 100%|██████████| 4/4 [00:00<00:00, 1030.10 examples/s]
Casting the dataset: 0%| | 0/4 [00:00<?, ? examples/s]
Casting the dataset: 100%|██████████| 16/16 [00:00<00:00, 1220.65 examples/s]
Casting the dataset: 0%| | 0/16 [00:00<?, ? examples/s]
INFO | 2024-04-25 00:41:24 | autotrain.app:handle_form:543 - Column mapping: {'text': 'tokens', 'label': 'tags'}
INFO | 2024-04-25 00:41:24 | autotrain.app:handle_form:542 - Task: text_token_classification
INFO | 2024-04-25 00:41:24 | autotrain.app:handle_form:453 - hardware: Local
INFO | 2024-04-25 00:40:55 | autotrain.app:fetch_params:211 - Task: token-classification
INFO | 2024-04-25 00:40:52 | autotrain.app:fetch_params:211 - Task: llm:sft
INFO | 2024-04-25 00:40:46 | autotrain.app:<module>:153 - AutoTrain started successfully
WARNING | 2024-04-25 00:40:45 | autotrain.trainers.common:__init__:174 - Parameters not supplied by user and set to default: train_split, seed, epochs, lr, save_total_limit, project_name, max_seq_length, tags_column, scheduler, gradient_accumulation, batch_size, weight_decay, tokens_column, auto_find_batch_size, model, username, warmup_ratio, optimizer, evaluation_strategy, token, max_grad_norm, push_to_hub, valid_split, logging_steps, data_path, save_strategy
WARNING | 2024-04-25 00:40:45 | autotrain.trainers.common:__init__:174 - Parameters not supplied by user and set to default: class_prompt, adam_beta2, prior_generation_precision, adam_weight_decay, seed, text_encoder_use_attention_mask, image_path, project_name, sample_batch_size, local_rank, prior_loss_weight, tokenizer_max_length, scheduler, checkpoints_total_limit, model, num_cycles, adam_beta1, checkpointing_steps, username, validation_epochs, max_grad_norm, epochs, num_class_images, logging, class_labels_conditioning, adam_epsilon, prior_preservation, scale_lr, center_crop, xl, num_validation_images, pre_compute_text_embeddings, warmup_steps, tokenizer, rank, validation_images, allow_tf32, token, revision, push_to_hub, lr_power, validation_prompt, resume_from_checkpoint, class_image_path, dataloader_num_workers
WARNING | 2024-04-25 00:40:45 | autotrain.trainers.common:__init__:174 - Parameters not supplied by user and set to default: train_split, seed, time_limit, categorical_columns, project_name, task, id_column, num_trials, model, target_columns, username, token, push_to_hub, numerical_columns, valid_split, data_path
WARNING | 2024-04-25 00:40:45 | autotrain.trainers.common:__init__:174 - Parameters not supplied by user and set to default: lora_alpha, train_split, seed, lr, save_total_limit, project_name, lora_dropout, max_seq_length, scheduler, gradient_accumulation, weight_decay, model, username, evaluation_strategy, peft, lora_r, max_grad_norm, epochs, logging_steps, text_column, quantization, target_column, batch_size, auto_find_batch_size, warmup_ratio, optimizer, token, push_to_hub, valid_split, max_target_length, data_path
WARNING | 2024-04-25 00:40:45 | autotrain.trainers.common:__init__:174 - Parameters not supplied by user and set to default: train_split, seed, epochs, lr, save_total_limit, target_column, project_name, batch_size, scheduler, gradient_accumulation, weight_decay, auto_find_batch_size, model, image_column, username, warmup_ratio, optimizer, evaluation_strategy, token, max_grad_norm, push_to_hub, valid_split, logging_steps, data_path, save_strategy
WARNING | 2024-04-25 00:40:45 | autotrain.trainers.common:__init__:174 - Parameters not supplied by user and set to default: text_column, train_split, seed, epochs, lr, save_total_limit, target_column, project_name, max_seq_length, batch_size, scheduler, gradient_accumulation, weight_decay, auto_find_batch_size, model, username, warmup_ratio, optimizer, evaluation_strategy, token, max_grad_norm, push_to_hub, valid_split, logging_steps, data_path, save_strategy
WARNING | 2024-04-25 00:40:45 | autotrain.trainers.common:__init__:174 - Parameters not supplied by user and set to default: lora_alpha, dpo_beta, train_split, seed, disable_gradient_checkpointing, lr, trainer, save_total_limit, lora_dropout, project_name, use_flash_attention_2, scheduler, prompt_text_column, gradient_accumulation, weight_decay, max_prompt_length, model, username, add_eos_token, evaluation_strategy, lora_r, max_grad_norm, logging_steps, text_column, rejected_text_column, batch_size, auto_find_batch_size, model_ref, merge_adapter, warmup_ratio, optimizer, token, push_to_hub, model_max_length, valid_split, data_path
INFO | 2024-04-25 00:40:45 | autotrain.app:<module>:31 - Starting AutoTrain...
Your installed package nvidia-ml-py is corrupted. Skip patch functions nvmlDeviceGetMemoryInfo. You may get incorrect or incomplete results. Please consider reinstall package nvidia-ml-py via pip3 install --force-reinstall nvidia-ml-py nvitop.
Your installed package nvidia-ml-py is corrupted. Skip patch functions nvmlDeviceGet{Compute,Graphics,MPSCompute}RunningProcesses. You may get incorrect or incomplete results. Please consider reinstall package nvidia-ml-py via pip3 install --force-reinstall nvidia-ml-py nvitop.

Additional Information

I could reproduce this error with 3 different datasets, including the example data test_data.csv.

abhishekkrthakur commented 2 months ago

Removed the datasets load_metric call; the trainer now uses seqeval directly. Apologies for the inconvenience. Updating the library was a late-night decision.
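For readers wondering what "using seqeval directly" can look like, here is a minimal sketch, not the actual AutoTrain patch. It assumes token-level predictions and labels arrive as integer ids with -100 marking positions to skip (the usual convention for sub-word padding). The helper names `align_labels` and `seqeval_scores` are hypothetical; the `seqeval.metrics` functions they call are part of the seqeval package, which must be installed before `seqeval_scores` is called.

```python
def align_labels(pred_ids, label_ids, label_list, ignore_index=-100):
    """Map integer ids to tag strings, dropping positions labelled ignore_index."""
    true_tags, pred_tags = [], []
    for pred_row, label_row in zip(pred_ids, label_ids):
        kept = [(p, l) for p, l in zip(pred_row, label_row) if l != ignore_index]
        pred_tags.append([label_list[p] for p, _ in kept])
        true_tags.append([label_list[l] for _, l in kept])
    return true_tags, pred_tags


def seqeval_scores(true_tags, pred_tags):
    """Entity-level scores from seqeval, with no datasets.load_metric involved."""
    # Imported lazily so align_labels stays usable without seqeval installed.
    from seqeval.metrics import accuracy_score, f1_score, precision_score, recall_score

    return {
        "precision": precision_score(true_tags, pred_tags),
        "recall": recall_score(true_tags, pred_tags),
        "f1": f1_score(true_tags, pred_tags),
        "accuracy": accuracy_score(true_tags, pred_tags),
    }


# Toy example: one sentence, first and last positions are ignored.
label_list = ["O", "B-PER", "I-PER"]
label_ids = [[-100, 1, 2, 0, -100]]
pred_ids = [[0, 1, 2, 0, 0]]

true_tags, pred_tags = align_labels(pred_ids, label_ids, label_list)
print(true_tags)
print(pred_tags)
```

With seqeval installed, `seqeval_scores(true_tags, pred_tags)` returns the metric dictionary. Because nothing is downloaded from the Hub at import time, the `trust_remote_code` check that raised the ValueError in the log is never reached.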

abhishekkrthakur commented 2 months ago

Fixed in 0.7.70+. Factory rebuild your Space or create a new AutoTrain Space for the fix to take effect.
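To confirm that an environment actually picked up the fix, one option is to compare the installed version against 0.7.70. The sketch below assumes the package is installed under the PyPI name `autotrain-advanced`; the helpers `version_tuple` and `has_fix` are hypothetical, and the version parsing is deliberately simplistic (it only reads the first three numeric components).

```python
import re
from importlib.metadata import PackageNotFoundError, version


def version_tuple(v):
    """Return the first three numeric components of a version string as ints."""
    return tuple(int(part) for part in re.findall(r"\d+", v)[:3])


def has_fix(installed, minimum="0.7.70"):
    """True if the installed version is at least the release that carries the fix."""
    return version_tuple(installed) >= version_tuple(minimum)


try:
    installed = version("autotrain-advanced")  # assumed distribution name
    status = "includes" if has_fix(installed) else "predates"
    print(f"autotrain-advanced {installed} {status} the seqeval fix")
except PackageNotFoundError:
    print("autotrain-advanced is not installed in this environment")
```

If the reported version predates 0.7.70 inside a Space, a factory rebuild (or a fresh Space) is what pulls in the newer release.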