sberbank-ai-lab / LightAutoML

LAMA - automatic model creation framework

RMSLE metric issue #94

Closed PhySci closed 2 years ago

PhySci commented 2 years ago

I came across a side effect of the RMSLE metric: it cannot be calculated for negative values, and as a result the whole training pipeline fails.

The issue can be reproduced on the "Used Car Price" dataset with the script below. I'm pretty sure the target does not contain any negative values, so the error must come from negative predictions of the linear models (RMSLE is computed on log-transformed values, so negative inputs are rejected). It would be nice to catch this case and handle it safely.

import pandas as pd
from lightautoml.automl.presets.tabular_presets import TabularAutoML
from lightautoml.tasks import Task

def main():
    df = pd.read_csv('../data/car-price-train.csv')

    roles = {'target': 'Price',
             'drop': ['Year']
             }

    task = Task('reg', metric="rmsle")

    automl = TabularAutoML(task=task, gpu_ids='', timeout=10000000000000)

    automl.fit_predict(df, roles=roles, verbose=5)

if __name__ == '__main__':
    main()

The stack trace is:

[11:54:51] Stdout logging level is DEBUG.
[11:54:51] Copying TaskTimer may affect the parent PipelineTimer, so copy will create new unlimited TaskTimer
[11:54:51] Task: reg

[11:54:51] Start automl preset with listed constraints:
[11:54:51] - time: 10000000000000.00 seconds
[11:54:51] - CPU: 4 cores
[11:54:51] - memory: 16 GB

[11:54:51] Train data shape: (6019, 14)

[11:54:57] Feats was rejected during automatic roles guess: []
[11:54:57] Layer 1 train process start. Time left 9999999999993.55 secs
[11:54:58] Start fitting Lvl_0_Pipe_0_Mod_0_LinearL2 ...
[11:54:58] Training params: {'tol': 1e-06, 'max_iter': 100, 'cs': [1e-05, 5e-05, 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50, 100, 500, 1000, 5000, 10000, 50000, 100000], 'early_stopping': 2, 'categorical_idx': [20, 21, 22], 'embed_sizes': array([11, 12,  3], dtype=int32), 'data_size': 23}
[11:54:58] ===== Start working with fold 0 for Lvl_0_Pipe_0_Mod_0_LinearL2 =====
[11:54:58] Linear model: C = 1e-05 score = -0.5098838547322487
[11:54:58] Linear model: C = 5e-05 score = -0.41608116270993006
[11:54:58] Model Lvl_0_Pipe_0_Mod_0_LinearL2 failed during ml_algo.fit_predict call.

Mean Squared Logarithmic Error cannot be used when targets contain negative values.
Traceback (most recent call last):
  File "/Users/user/projects/LAML_dev/work/RMSLE_issue.py", line 21, in <module>
    main()
  File "/Users/user/projects/LAML_dev/work/RMSLE_issue.py", line 17, in main
    automl.fit_predict(df, roles=roles, verbose=5)
  File "/Users/user/projects/LAML_dev/LightAutoML/lightautoml/automl/presets/tabular_presets.py", line 525, in fit_predict
    train, roles=roles, cv_iter=cv_iter, valid_data=valid_data, verbose=verbose
  File "/Users/user/projects/LAML_dev/LightAutoML/lightautoml/automl/presets/base.py", line 211, in fit_predict
    verbose,
  File "/Users/user/projects/LAML_dev/LightAutoML/lightautoml/automl/base.py", line 225, in fit_predict
    pipe_pred = ml_pipe.fit_predict(train_valid)
  File "/Users/user/projects/LAML_dev/LightAutoML/lightautoml/pipelines/ml/base.py", line 150, in fit_predict
    ), "Pipeline finished with 0 models for some reason.\nProbably one or more models failed"
AssertionError: Pipeline finished with 0 models for some reason.
Probably one or more models failed

Process finished with exit code 1
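
For reference, the metric failure can be reproduced in isolation. This is a minimal sketch assuming the 'rmsle' metric delegates to sklearn's mean_squared_log_error (the error message in the trace above matches sklearn's):

import numpy as np
from sklearn.metrics import mean_squared_log_error

# Positive targets, as in the car price data.
y_true = np.array([1000.0, 2000.0, 3000.0])
# A single negative prediction, e.g. from an unconstrained linear model,
# is enough to make the metric undefined.
y_pred = np.array([900.0, -50.0, 3100.0])

try:
    mean_squared_log_error(y_true, y_pred)
except ValueError as err:
    # "Mean Squared Logarithmic Error cannot be used when targets contain negative values."
    print(err)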
alexmryzhkov commented 2 years ago

To fix the problem, you can just change the loss to the appropriate 'rmsle' instead of the default 'mse'. If it still doesn't work, please re-open the issue.
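
A minimal sketch of that change, assuming 'rmsle' is accepted as a loss name for the 'reg' task as the reply suggests; only the Task construction differs from the original script:

from lightautoml.tasks import Task

# Use the RMSLE loss for training as well, not only as the evaluation metric,
# instead of relying on the default 'mse' loss. The rest of the reproduction
# script above stays unchanged.
task = Task('reg', loss='rmsle', metric='rmsle')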