sb-ai-lab / LightAutoML

Fast and customizable framework for automatic ML model creation (AutoML)
https://developers.sber.ru/portal/products/lightautoml
Apache License 2.0

Input contains NaN error when doing linear_l2 model #75

Closed by RishatZagidullin 1 year ago

RishatZagidullin commented 1 year ago

🐛 Bug

On some multiclass tasks, the linear model (linear_l2) raises the following error:

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

The error is raised from site-packages/sklearn/utils/validation.py.
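To narrow down which kind of value triggers this check, it can help to inspect the array that reaches the model. The snippet below is only a minimal sketch using NumPy: `describe_non_finite` is a hypothetical helper written for illustration and is not part of LightAutoML or scikit-learn, and the sample array is made up.

```python
import numpy as np


def describe_non_finite(arr):
    """Count NaN, infinite, and float32-overflowing values in an array.

    Hypothetical diagnostic helper, not a LightAutoML or scikit-learn API.
    """
    a = np.asarray(arr, dtype=np.float64)
    f32_max = np.finfo(np.float32).max
    finite = np.isfinite(a)
    return {
        "nan": int(np.isnan(a).sum()),
        "inf": int(np.isinf(a).sum()),
        # Finite in float64, but would overflow to inf when cast to float32.
        "too_large_for_float32": int((finite & (np.abs(a) > f32_max)).sum()),
    }


# Made-up sample data containing one value of each problematic kind.
sample = np.array([0.5, np.nan, np.inf, 1e39, -2.0])
report = describe_non_finite(sample)
print(report)
```

Running such a check on the features or predictions of the failing fold would show whether the input already contains missing values or whether the values blow up during training.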

To Reproduce

Steps to reproduce the behavior:

  1. Extract the issue.zip archive;
  2. Place the extracted issue folder in the LightAutoML directory and cd into it;
  3. Run python ./lama_cpu.py -p ./data/ -k sf-crime -f 2 -n 4 -s 42 -c ./lama_cpu.yml -t 7200;
  4. The error should appear while fold 2 is being computed;
  5. For comparison, run python ./lama_cpu.py -p ./data/ -k otto -f 2 -n 4 -s 42 -c ./lama_cpu.yml -t 7200. On this dataset the program should terminate normally.

Expected behavior

I expect the run on the sf-crime dataset to finish successfully, just as it does on otto.

Additional context

The error disappears if the learning rate is changed from 0.1 to 0.05. But is that a good solution, or does it only hide the underlying problem?
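One possible reason a smaller learning rate hides the error is that the optimization diverges at the larger step size and overflows float32. The sketch below shows this effect on a toy quadratic loss with plain gradient descent. It is not LightAutoML's optimizer, and the `curvature` value is an arbitrary assumption chosen so that 0.1 diverges while 0.05 converges.

```python
import numpy as np


def run_gd(lr, curvature=25.0, steps=500):
    """Plain gradient descent on f(w) = 0.5 * curvature * w**2 in float32.

    Toy illustration only; `curvature` is an arbitrary assumed value.
    """
    w = np.float32(1.0)
    lr = np.float32(lr)
    curvature = np.float32(curvature)
    # Silence overflow/invalid warnings so the divergence shows up in the result.
    with np.errstate(over="ignore", invalid="ignore"):
        for _ in range(steps):
            grad = curvature * w   # derivative of the quadratic loss
            w = w - lr * grad      # each step multiplies w by (1 - lr * curvature)
    return w


for lr in (0.1, 0.05):
    w_final = run_gd(lr)
    print(lr, w_final, bool(np.isfinite(w_final)))
```

With `lr=0.1` each step multiplies the weight by -1.5, so it grows until it overflows and becomes non-finite. With `lr=0.05` the factor is -0.25 and the weight shrinks toward zero. If something similar happens in the real model, lowering the learning rate would only move the threshold, so checking the feature scaling of the failing dataset may be worthwhile.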

issue.zip