I came across a side effect of the RMSLE metric: it cannot be calculated for negative values, and as a result the whole training pipeline fails.
The issue can be reproduced on the "Used Card Price" dataset with the script below. I'm pretty sure the target has no negative values, so the error must come from negative predictions of the linear models.
It would be nice to catch this problem and handle it safely.
[11:54:51] Stdout logging level is DEBUG.
[11:54:51] Copying TaskTimer may affect the parent PipelineTimer, so copy will create new unlimited TaskTimer
[11:54:51] Task: reg
[11:54:51] Start automl preset with listed constraints:
[11:54:51] - time: 10000000000000.00 seconds
[11:54:51] - CPU: 4 cores
[11:54:51] - memory: 16 GB
[11:54:51] Train data shape: (6019, 14)
[11:54:57] Feats was rejected during automatic roles guess: []
[11:54:57] Layer 1 train process start. Time left 9999999999993.55 secs
[11:54:58] Start fitting Lvl_0_Pipe_0_Mod_0_LinearL2 ...
[11:54:58] Training params: {'tol': 1e-06, 'max_iter': 100, 'cs': [1e-05, 5e-05, 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50, 100, 500, 1000, 5000, 10000, 50000, 100000], 'early_stopping': 2, 'categorical_idx': [20, 21, 22], 'embed_sizes': array([11, 12, 3], dtype=int32), 'data_size': 23}
[11:54:58] ===== Start working with fold 0 for Lvl_0_Pipe_0_Mod_0_LinearL2 =====
[11:54:58] Linear model: C = 1e-05 score = -0.5098838547322487
[11:54:58] Linear model: C = 5e-05 score = -0.41608116270993006
[11:54:58] Model Lvl_0_Pipe_0_Mod_0_LinearL2 failed during ml_algo.fit_predict call.
Mean Squared Logarithmic Error cannot be used when targets contain negative values.
Traceback (most recent call last):
File "/Users/user/projects/LAML_dev/work/RMSLE_issue.py", line 21, in <module>
main()
File "/Users/user/projects/LAML_dev/work/RMSLE_issue.py", line 17, in main
automl.fit_predict(df, roles=roles, verbose=5)
File "/Users/user/projects/LAML_dev/LightAutoML/lightautoml/automl/presets/tabular_presets.py", line 525, in fit_predict
train, roles=roles, cv_iter=cv_iter, valid_data=valid_data, verbose=verbose
File "/Users/user/projects/LAML_dev/LightAutoML/lightautoml/automl/presets/base.py", line 211, in fit_predict
verbose,
File "/Users/user/projects/LAML_dev/LightAutoML/lightautoml/automl/base.py", line 225, in fit_predict
pipe_pred = ml_pipe.fit_predict(train_valid)
File "/Users/user/projects/LAML_dev/LightAutoML/lightautoml/pipelines/ml/base.py", line 150, in fit_predict
), "Pipeline finished with 0 models for some reason.\nProbably one or more models failed"
AssertionError: Pipeline finished with 0 models for some reason.
Probably one or more models failed
Process finished with exit code 1
To fix the problem, change the loss from the default 'mse' to 'rmsle', which matches the metric. If it still doesn't work, please re-open the issue.
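For anyone who hits the same error, the failure mode can be sketched in plain Python. This is only an illustration under assumptions: `rmsle` and `rmsle_clipped` are hypothetical helpers written for this example, not LightAutoML functions, and clipping predictions at zero is just one possible guard, not what the library does.

```python
import math


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error; rejects negative inputs."""
    if any(v < 0 for v in y_true) or any(v < 0 for v in y_pred):
        raise ValueError("RMSLE is undefined for negative values")
    squared = [
        (math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared) / len(squared))


def rmsle_clipped(y_true, y_pred, floor=0.0):
    """Hypothetical guard: clip predictions to `floor` before scoring."""
    clipped = [max(p, floor) for p in y_pred]
    return rmsle(y_true, clipped)


# Made-up values: the targets are non-negative, but a linear model
# trained with plain MSE is free to predict below zero.
y_true = [3.5, 1.2, 8.0]
y_pred = [3.0, -0.4, 7.5]

try:
    rmsle(y_true, y_pred)
except ValueError as err:
    print("plain metric failed:", err)

print("clipped metric:", rmsle_clipped(y_true, y_pred))
```

Clipping hides the symptom at scoring time, while training with a loss that matches the metric, as suggested above, addresses the mismatch itself.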