huggingface / autotrain-advanced

🤗 AutoTrain Advanced
https://huggingface.co/autotrain
Apache License 2.0

Classification task - ValueError: Classification metrics can't handle a mix of multiclass and continuous-multioutput targets #430

Closed. mikeagz closed this issue 10 months ago.

mikeagz commented 10 months ago

Prerequisites

Backend

Local

Interface Used

UI

CLI Command

autotrain app

UI Screenshots & Parameters

(screenshot: UI parameters)

Error Logs

(env_autotrain)  migue@DESKTOP-KACT01N   ~    autotrain app
⚠️ WARNING | 2023-12-21 12:34:45 | autotrain.cli.run_dreambooth:<module>:14 - ❌ Some DreamBooth components are missing! Please run `autotrain setup` to install it. Ignore this warning if you are not using DreamBooth or running `autotrain setup` already.
> INFO    Authenticating user...
> WARNING Parameters not supplied by user and set to default: repo_id, save_strategy, lora_alpha, seed, gradient_accumulation, model, warmup_ratio, lr, rejected_text_column, model_max_length, dpo_beta, valid_split, project_name, logging_steps, username, batch_size, weight_decay, lora_dropout, save_total_limit, scheduler, lora_r, trainer, use_flash_attention_2, auto_find_batch_size, token, push_to_hub, evaluation_strategy, train_split, optimizer, data_path, disable_gradient_checkpointing, max_grad_norm, model_ref, add_eos_token, text_column, merge_adapter, prompt_text_column
> WARNING Parameters not supplied by user and set to default: scheduler, repo_id, save_strategy, seed, epochs, gradient_accumulation, model, warmup_ratio, auto_find_batch_size, token, log, push_to_hub, lr, evaluation_strategy, train_split, optimizer, valid_split, logging_steps, project_name, data_path, username, batch_size, weight_decay, max_grad_norm, max_seq_length, text_column, save_total_limit, target_column
> WARNING Parameters not supplied by user and set to default: scheduler, repo_id, save_strategy, seed, epochs, gradient_accumulation, model, warmup_ratio, auto_find_batch_size, token, image_column, log, push_to_hub, lr, evaluation_strategy, train_split, optimizer, valid_split, logging_steps, project_name, data_path, batch_size, weight_decay, max_grad_norm, save_total_limit, target_column
> WARNING Parameters not supplied by user and set to default: repo_id, save_strategy, lora_alpha, seed, epochs, gradient_accumulation, model, warmup_ratio, lr, peft, valid_split, project_name, logging_steps, target_modules, username, batch_size, weight_decay, lora_dropout, max_seq_length, save_total_limit, scheduler, lora_r, auto_find_batch_size, token, quantization, max_target_length, push_to_hub, evaluation_strategy, train_split, optimizer, data_path, max_grad_norm, text_column, target_column
> WARNING Parameters not supplied by user and set to default: target_columns, repo_id, seed, model, task, token, numerical_columns, push_to_hub, time_limit, train_split, valid_split, project_name, data_path, num_trials, username, categorical_columns, id_column
> WARNING Parameters not supplied by user and set to default: prior_generation_precision, adam_epsilon, repo_id, seed, tokenizer, epochs, tokenizer_max_length, model, adam_weight_decay, warmup_steps, resume_from_checkpoint, class_prompt, class_labels_conditioning, num_cycles, scale_lr, checkpoints_total_limit, prior_loss_weight, bf16, project_name, num_class_images, username, xl, checkpointing_steps, adam_beta1, scheduler, dataloader_num_workers, sample_batch_size, logging, token, center_crop, num_validation_images, validation_prompt, pre_compute_text_embeddings, push_to_hub, adam_beta2, revision, text_encoder_use_attention_mask, allow_tf32, validation_epochs, validation_images, lr_power, max_grad_norm, class_image_path, image_path, local_rank, prior_preservation, rank
INFO:     Started server process [8036]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:7860 (Press CTRL+C to quit)
INFO:     127.0.0.1:52548 - "GET / HTTP/1.1" 200 OK
> INFO    Task: llm:sft
INFO:     127.0.0.1:52548 - "GET /params/llm%3Asft HTTP/1.1" 200 OK
INFO:     127.0.0.1:52549 - "GET /model_choices/llm%3Asft HTTP/1.1" 200 OK
> INFO    Task: tabular:classification
INFO:     127.0.0.1:52549 - "GET /params/tabular%3Aclassification HTTP/1.1" 200 OK
INFO:     127.0.0.1:52548 - "GET /model_choices/tabular%3Aclassification HTTP/1.1" 200 OK
> INFO    hardware: Local
> INFO    Running jobs: []
> INFO    Dataset: rjav-s2zq-sief (tabular_multi_class_classification)
Train data: [<tempfile.SpooledTemporaryFile object at 0x00000118300A0A30>]
Valid data: []
Column mapping: {'id': 'id', 'label': ['target']}

Saving the dataset (1/1 shards): 100%|██████████████████████████████████| 3289/3289 [00:00<00:00, 199198.10 examples/s]
Saving the dataset (1/1 shards): 100%|█████████████████████████████████████| 823/823 [00:00<00:00, 50999.66 examples/s]
> INFO    [{"seed":42,"categorical_columns":null,"numerical_columns":null,"num_trials":10,"time_limit":600,"categorical_imputer":"most_frequent","numerical_imputer":"median","numeric_scaler":"robust","model_choice":"random_forest","param_choice":"manual","backend":"Local"}]
> WARNING Parameters not supplied by user and set to default: model, train_split
> WARNING Parameters supplied but not used: param_choice, model_choice, backend
> INFO    Creating Space for job: 0
> INFO    Using params: {'data_path': 'autotrain-data-rjav-s2zq-sief', 'model': 'xgboost', 'username': 'MexicanVanGogh', 'seed': 42, 'train_split': 'train', 'valid_split': 'validation', 'project_name': 'rjav-s2zq-sief-0', 'token': '*****', 'push_to_hub': True, 'id_column': 'autotrain_id', 'target_columns': ['autotrain_label'], 'repo_id': 'MexicanVanGogh/rjav-s2zq-sief-0', 'categorical_columns': None, 'numerical_columns': None, 'task': 'classification', 'num_trials': 10, 'time_limit': 600, 'categorical_imputer': 'most_frequent', 'numerical_imputer': 'median', 'numeric_scaler': 'robust'}
> INFO    Starting server
> INFO    {"data_path":"autotrain-data-rjav-s2zq-sief","model":"xgboost","username":"MexicanVanGogh","seed":42,"train_split":"train","valid_split":"validation","project_name":"rjav-s2zq-sief-0","token":"*****","push_to_hub":true,"id_column":"autotrain_id","target_columns":["autotrain_label"],"repo_id":"MexicanVanGogh/rjav-s2zq-sief-0","categorical_columns":null,"numerical_columns":null,"task":"classification","num_trials":10,"time_limit":600,"categorical_imputer":"most_frequent","numerical_imputer":"median","numeric_scaler":"robust"}
> INFO    ['python', '-m', 'autotrain.trainers.tabular', '--training_config', 'output\\rjav-s2zq-sief-0\\training_params.json']
> INFO    Space created with id: 6760
INFO:     127.0.0.1:52554 - "POST /create_project HTTP/1.1" 200 OK
🚀 INFO   | 2023-12-21 12:35:34 | __main__:train:136 - Starting training...
🚀 INFO   | 2023-12-21 12:35:34 | __main__:train:137 - Training config: {'data_path': 'autotrain-data-rjav-s2zq-sief', 'model': 'xgboost', 'username': 'MexicanVanGogh', 'seed': 42, 'train_split': 'train', 'valid_split': 'validation', 'project_name': 'output\\rjav-s2zq-sief-0', 'token': '*****', 'push_to_hub': True, 'id_column': 'autotrain_id', 'target_columns': ['autotrain_label'], 'repo_id': 'MexicanVanGogh/rjav-s2zq-sief-0', 'categorical_columns': None, 'numerical_columns': None, 'task': 'classification', 'num_trials': 10, 'time_limit': 600, 'categorical_imputer': 'most_frequent', 'numerical_imputer': 'median', 'numeric_scaler': 'robust'}
🚀 INFO   | 2023-12-21 12:35:34 | __main__:train:147 - loading dataset from disk
🚀 INFO   | 2023-12-21 12:35:34 | __main__:train:164 - loading dataset from disk
🚀 INFO   | 2023-12-21 12:35:34 | __main__:train:191 - Categorical columns: ['GENERO', 'ECIVIL', 'FPAGO', 'HIPOTECA']
🚀 INFO   | 2023-12-21 12:35:34 | __main__:train:192 - Numerical columns: ['EDAD', 'INGRE', 'HIJOS', 'NUMTDC', 'TIPCRED', 'CREDITOS']
🚀 INFO   | 2023-12-21 12:35:34 | __main__:train:199 - Useful columns: ['GENERO', 'ECIVIL', 'FPAGO', 'HIPOTECA', 'EDAD', 'INGRE', 'HIJOS', 'NUMTDC', 'TIPCRED', 'CREDITOS']
🚀 INFO   | 2023-12-21 12:35:34 | __main__:train:261 - Preprocessor: ColumnTransformer(n_jobs=-1,
                  transformers=[('numeric',
                                 Pipeline(steps=[('num_imputer',
                                                  SimpleImputer(strategy='median')),
                                                 ('num_scaler',
                                                  RobustScaler())]),
                                 ['EDAD', 'INGRE', 'HIJOS', 'NUMTDC', 'TIPCRED',
                                  'CREDITOS']),
                                ('categorical',
                                 Pipeline(steps=[('cat_imputer',
                                                  SimpleImputer(strategy='most_frequent')),
                                                 ('cat_encoder',
                                                  OrdinalEncoder(handle_unknown='use_encoded_value',
                                                                 unknown_value=nan))]),
                                 ['GENERO', 'ECIVIL', 'FPAGO', 'HIPOTECA'])],
                  verbose=True)
[I 2023-12-21 12:35:34,858] A new study created in memory with name: AutoTrain
[ColumnTransformer] ....... (1 of 2) Processing numeric, total=   0.0s
[ColumnTransformer] ... (2 of 2) Processing categorical, total=   0.0s
[W 2023-12-21 12:35:42,597] Trial 0 failed with parameters: {'learning_rate': 0.03499283246204768, 'reg_lambda': 1.799663602464377e-08, 'reg_alpha': 6.9106727871709355, 'subsample': 0.14825488183470453, 'colsample_bytree': 0.9629982430083197, 'max_depth': 3, 'early_stopping_rounds': 417, 'n_estimators': 7000} because of the following error: ValueError("Classification metrics can't handle a mix of multiclass and continuous-multioutput targets").
Traceback (most recent call last):
  File "C:\Users\migue\AppData\Local\miniconda3\envs\env_autotrain\lib\site-packages\optuna\study\_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
  File "C:\Users\migue\AppData\Local\miniconda3\envs\env_autotrain\lib\site-packages\autotrain\trainers\tabular\__main__.py", line 116, in optimize
    metric_dict = metrics.calculate(yvalid, ypred)
  File "C:\Users\migue\AppData\Local\miniconda3\envs\env_autotrain\lib\site-packages\autotrain\trainers\tabular\utils.py", line 165, in calculate
    metrics[metric_name] = metric_func(y_true, y_pred)
  File "C:\Users\migue\AppData\Local\miniconda3\envs\env_autotrain\lib\site-packages\sklearn\utils\_param_validation.py", line 211, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\migue\AppData\Local\miniconda3\envs\env_autotrain\lib\site-packages\sklearn\metrics\_classification.py", line 220, in accuracy_score
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
  File "C:\Users\migue\AppData\Local\miniconda3\envs\env_autotrain\lib\site-packages\sklearn\metrics\_classification.py", line 93, in _check_targets
    raise ValueError(
ValueError: Classification metrics can't handle a mix of multiclass and continuous-multioutput targets
[W 2023-12-21 12:35:42,691] Trial 0 failed with value None.
❌ ERROR  | 2023-12-21 12:35:42 | autotrain.trainers.common:wrapper:79 - train has failed due to an exception: Traceback (most recent call last):
  File "C:\Users\migue\AppData\Local\miniconda3\envs\env_autotrain\lib\site-packages\autotrain\trainers\common.py", line 76, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\migue\AppData\Local\miniconda3\envs\env_autotrain\lib\site-packages\autotrain\trainers\tabular\__main__.py", line 299, in train
    study.optimize(optimize_func, n_trials=config.num_trials, timeout=config.time_limit)
  File "C:\Users\migue\AppData\Local\miniconda3\envs\env_autotrain\lib\site-packages\optuna\study\study.py", line 442, in optimize
    _optimize(
  File "C:\Users\migue\AppData\Local\miniconda3\envs\env_autotrain\lib\site-packages\optuna\study\_optimize.py", line 66, in _optimize
    _optimize_sequential(
  File "C:\Users\migue\AppData\Local\miniconda3\envs\env_autotrain\lib\site-packages\optuna\study\_optimize.py", line 163, in _optimize_sequential
    frozen_trial = _run_trial(study, func, catch)
  File "C:\Users\migue\AppData\Local\miniconda3\envs\env_autotrain\lib\site-packages\optuna\study\_optimize.py", line 251, in _run_trial
    raise func_err
  File "C:\Users\migue\AppData\Local\miniconda3\envs\env_autotrain\lib\site-packages\optuna\study\_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
  File "C:\Users\migue\AppData\Local\miniconda3\envs\env_autotrain\lib\site-packages\autotrain\trainers\tabular\__main__.py", line 116, in optimize
    metric_dict = metrics.calculate(yvalid, ypred)
  File "C:\Users\migue\AppData\Local\miniconda3\envs\env_autotrain\lib\site-packages\autotrain\trainers\tabular\utils.py", line 165, in calculate
    metrics[metric_name] = metric_func(y_true, y_pred)
  File "C:\Users\migue\AppData\Local\miniconda3\envs\env_autotrain\lib\site-packages\sklearn\utils\_param_validation.py", line 211, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\migue\AppData\Local\miniconda3\envs\env_autotrain\lib\site-packages\sklearn\metrics\_classification.py", line 220, in accuracy_score
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
  File "C:\Users\migue\AppData\Local\miniconda3\envs\env_autotrain\lib\site-packages\sklearn\metrics\_classification.py", line 93, in _check_targets
    raise ValueError(
ValueError: Classification metrics can't handle a mix of multiclass and continuous-multioutput targets

❌ ERROR  | 2023-12-21 12:35:42 | autotrain.trainers.common:wrapper:80 - Classification metrics can't handle a mix of multiclass and continuous-multioutput targets
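For reference, the preprocessor printed in the log above can be reconstructed as a short runnable sketch. Column names and transformer settings are taken directly from the log output; this is an illustration of the printed pipeline, not AutoTrain's source code:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder, RobustScaler

# Columns as detected in the log ("Categorical columns" / "Numerical columns")
numeric_cols = ["EDAD", "INGRE", "HIJOS", "NUMTDC", "TIPCRED", "CREDITOS"]
categorical_cols = ["GENERO", "ECIVIL", "FPAGO", "HIPOTECA"]

preprocessor = ColumnTransformer(
    n_jobs=-1,
    verbose=True,
    transformers=[
        # Median imputation + robust scaling for numeric features
        ("numeric", Pipeline(steps=[
            ("num_imputer", SimpleImputer(strategy="median")),
            ("num_scaler", RobustScaler()),
        ]), numeric_cols),
        # Most-frequent imputation + ordinal encoding for categoricals;
        # unseen categories are mapped to NaN instead of raising
        ("categorical", Pipeline(steps=[
            ("cat_imputer", SimpleImputer(strategy="most_frequent")),
            ("cat_encoder", OrdinalEncoder(handle_unknown="use_encoded_value",
                                           unknown_value=np.nan)),
        ]), categorical_cols),
    ],
)
```

This preprocessing step completed successfully (both ColumnTransformer stages finished in the log); the failure happened later, in metric calculation.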

Additional Information

Sample of my data:

id,EDAD,INGRE,GENERO,ECIVIL,HIJOS,NUMTDC,FPAGO,HIPOTECA,TIPCRED,CREDITOS,target
0,44.0,59944.0,m,Casado,1,2,Mensual,s,1,0,2

The target takes values in {0, 1, 2}.
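The error message itself points at the likely mechanism: sklearn's classification metrics received continuous per-class scores (e.g. `predict_proba` output) where they expect discrete labels. A minimal sketch reproducing and fixing the error, independent of AutoTrain (synthetic data, hypothetical variable names):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Synthetic 3-class problem, mirroring a target in {0, 1, 2}
X, y = make_classification(n_samples=200, n_classes=3,
                           n_informative=6, random_state=42)
clf = RandomForestClassifier(random_state=42).fit(X, y)

proba = clf.predict_proba(X)  # shape (200, 3): continuous-multioutput
try:
    # Multiclass y_true vs. probability matrix y_pred -> the reported error
    accuracy_score(y, proba)
except ValueError as e:
    print(e)  # Classification metrics can't handle a mix of multiclass and continuous-multioutput targets

# Fix: reduce probabilities to predicted class labels before scoring
acc = accuracy_score(y, proba.argmax(axis=1))
print(acc)
```

This is only an illustration of the error class; the actual bug in AutoTrain was a typo in the trainer (see the maintainer's reply below in the thread).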

abhishekkrthakur commented 10 months ago

Is the target column single-valued or a list?

mikeagz commented 10 months ago

It is single-valued:

(screenshot: single-valued target column)

abhishekkrthakur commented 10 months ago

Thanks for reporting this. It was a simple typo bug, now fixed. Please update your autotrain installation :)