sb-ai-lab / LightAutoML

Fast and customizable framework for automatic ML model creation (AutoML)
https://developers.sber.ru/portal/products/lightautoml
Apache License 2.0

from lightautoml.automl.presets.tabular_presets import TabularAutoML, TabularUtilizedAutoML returns error #147

Open numomcmc opened 6 months ago

numomcmc commented 6 months ago

🐛 Bug

To Reproduce

Steps to reproduce the behavior:

  1. In Kaggle notebook,
  2. Run !pip install -U LightAutoML
  3. Minor installation error msg.

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
yellowbrick 1.3.post1 requires numpy<1.20,>=1.16.0, but you have numpy 1.21.6 which is incompatible.
tensorflow 2.4.1 requires numpy~=1.19.2, but you have numpy 1.21.6 which is incompatible.
pdpbox 0.2.1 requires matplotlib==3.1.1, but you have matplotlib 3.4.2 which is incompatible.
matrixprofile 1.1.10 requires protobuf==3.11.2, but you have protobuf 3.17.3 which is incompatible.
kornia 0.5.5 requires numpy<=1.19, but you have numpy 1.21.6 which is incompatible.
imbalanced-learn 0.8.0 requires scikit-learn>=0.24, but you have scikit-learn 0.23.2 which is incompatible.
Successfully installed CairoSVG-2.7.1 LightAutoML-0.3.7.3 Pyphen-0.14.0 StrEnum-0.4.15 alabaster-0.7.13 autowoe-1.3.2 cairocffi-1.6.1 catboost-1.2.2 cssselect2-0.7.0 dask-2022.2.0 distributed-2022.2.0 efficientnet-pytorch-0.7.1 featuretools-1.11.1 holidays-0.27.1 imagesize-1.4.1 importlib-metadata-1.7.0 json2html-1.3.0 numpy-1.21.6 opencv-python-4.5.2.52 pandas-1.3.5 poetry-core-1.6.1 snowballstemmer-2.2.0 sphinx-4.3.2 sphinxcontrib-applehelp-1.0.2 sphinxcontrib-devhelp-1.0.2 sphinxcontrib-htmlhelp-2.0.0 sphinxcontrib-jsmath-1.0.1 sphinxcontrib-qthelp-1.0.3 sphinxcontrib-serializinghtml-1.1.5 tinycss2-1.2.1 tqdm-4.66.1 weasyprint-52.5 woodwork-0.16.4
WARNING: Running pip as root will break packages and permissions. You should install packages reliably by using venv: https://pip.pypa.io/warnings/venv

  4. Run `from lightautoml.automl.presets.tabular_presets import TabularAutoML, TabularUtilizedAutoML`

TypeError Traceback (most recent call last)

in
     11
     12 # LightAutoML presets, task and report generation
---> 13 from lightautoml.automl.presets.tabular_presets import TabularAutoML, TabularUtilizedAutoML
     14 from lightautoml.tasks import Task
     15 from lightautoml.report.report_deco import ReportDeco

/opt/conda/lib/python3.7/site-packages/lightautoml/automl/presets/tabular_presets.py in
     21 from tqdm import tqdm
     22
---> 23 from ...addons.utilization import TimeUtilization
     24 from ...dataset.np_pd_dataset import NumpyDataset
     25 from ...ml_algo.boost_cb import BoostCB

/opt/conda/lib/python3.7/site-packages/lightautoml/addons/utilization/__init__.py in
      1 """Tools to configure resources utilization."""
----> 2 from .utilization import TimeUtilization
      3
      4
      5 __all__ = ["TimeUtilization"]

/opt/conda/lib/python3.7/site-packages/lightautoml/addons/utilization/utilization.py in
     11 from typing import Union
     12
---> 13 from ...automl.base import AutoML
     14 from ...automl.blend import BestModelSelector
     15 from ...automl.blend import Blender

/opt/conda/lib/python3.7/site-packages/lightautoml/automl/base.py in
     10 from typing import Sequence
     11
---> 12 from ..dataset.base import LAMLDataset
     13 from ..dataset.utils import concatenate
     14 from ..pipelines.ml.base import MLPipeline

/opt/conda/lib/python3.7/site-packages/lightautoml/dataset/base.py in
     10 from typing import Union
     11
---> 12 from ..tasks.base import Task
     13 from .roles import ColumnRole
     14

/opt/conda/lib/python3.7/site-packages/lightautoml/tasks/__init__.py in
      1 """Define the task to solve its loss, metric."""
      2
----> 3 from .base import Task
      4
      5

/opt/conda/lib/python3.7/site-packages/lightautoml/tasks/base.py in
     16 from .common_metric import _valid_metric_args
     17 from .common_metric import _valid_str_metric_names
---> 18 from .losses import CBLoss
     19 from .losses import LGBLoss
     20 from .losses import SKLoss

/opt/conda/lib/python3.7/site-packages/lightautoml/tasks/losses/__init__.py in
      3 from .base import _valid_str_metric_names
      4 from .cb import CBLoss
----> 5 from .lgb import LGBLoss
      6 from .sklearn import SKLoss
      7 from .torch import TORCHLoss

/opt/conda/lib/python3.7/site-packages/lightautoml/tasks/losses/lgb.py in
     10 from typing import Union
     11
---> 12 import lightgbm as lgb
     13 import numpy as np
     14

/opt/conda/lib/python3.7/site-packages/lightgbm/__init__.py in
      6 import os
      7
----> 8 from .basic import Booster, Dataset, register_logger
      9 from .callback import early_stopping, print_evaluation, record_evaluation, reset_parameter
     10 from .engine import CVBooster, cv, train

/opt/conda/lib/python3.7/site-packages/lightgbm/basic.py in
     15 import scipy.sparse
     16
---> 17 from .compat import PANDAS_INSTALLED, concat, dt_DataTable, is_dtype_sparse, pd_DataFrame, pd_Series
     18 from .libpath import find_lib_path
     19

/opt/conda/lib/python3.7/site-packages/lightgbm/compat.py in
    113 from dask import delayed
    114 from dask.array import Array as dask_Array
--> 115 from dask.dataframe import DataFrame as dask_DataFrame
    116 from dask.dataframe import Series as dask_Series
    117 from dask.distributed import Client, default_client, wait

/opt/conda/lib/python3.7/site-packages/dask/dataframe/__init__.py in
      1 try:
      2 from ..base import compute
----> 3 from . import backends, dispatch, rolling
      4 from .core import (
      5 DataFrame,

/opt/conda/lib/python3.7/site-packages/dask/dataframe/backends.py in
     21
     22 from ..utils import is_arraylike, typename
---> 23 from .core import DataFrame, Index, Scalar, Series, _Frame
     24 from .dispatch import (
     25 categorical_dtype_dispatch,

/opt/conda/lib/python3.7/site-packages/dask/dataframe/core.py in
     79 no_default = "__no_default__"
     80
---> 81 pd.set_option("compute.use_numexpr", False)
     82
     83

/opt/conda/lib/python3.7/site-packages/pandas/_config/config.py in __call__(self, *args, **kwds)
    231 # class below which wraps functions inside a callable, and converts
    232 # __doc__ into a property function. The doctsrings below are templates
--> 233 # using the py2.6+ advanced formatting syntax to plug in a concise list
    234 # of options, and option descriptions.
    235

/opt/conda/lib/python3.7/site-packages/pandas/_config/config.py in _set_option(*args, **kwargs)
    139 if o and o.validator:
    140 o.validator(v)
--> 141
    142 # walk the nested dict
    143 root, k = _get_root(key)

/opt/conda/lib/python3.7/site-packages/pandas/core/config_init.py in use_numexpr_cb(key)
     48
     49
---> 50 def use_numexpr_cb(key):
     51 from pandas.core.computation import expressions
     52

/opt/conda/lib/python3.7/site-packages/pandas/core/computation/expressions.py in
     17 from pandas._typing import FuncType
     18
---> 19 from pandas.core.computation.check import NUMEXPR_INSTALLED
     20 from pandas.core.ops import roperator
     21

/opt/conda/lib/python3.7/site-packages/pandas/core/computation/check.py in
      1 from pandas.compat._optional import import_optional_dependency
      2
----> 3 ne = import_optional_dependency("numexpr", errors="warn")
      4 NUMEXPR_INSTALLED = ne is not None
      5 if NUMEXPR_INSTALLED:

TypeError: import_optional_dependency() got an unexpected keyword argument 'errors'

You can duplicate this behavior by running the notebook [here](https://www.kaggle.com/code/studiocardo/aug21-lightautoml-starter-9b727c/edit)

### Expected behavior

Compatibility issues when installing via pip are not unusual. Maybe introduce a conda installation procedure? `from lightautoml.automl.presets.tabular_presets import TabularAutoML, TabularUtilizedAutoML` shouldn't return errors...

### Additional context

### Checklist

- [ ] bug description
- [ ] steps to reproduce
- [ ] expected behavior
- [ ] code sample / screenshots

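One possible reading of this traceback, which nothing in the thread confirms: the `errors` keyword of `import_optional_dependency` only exists in newer pandas releases, and several arrows in the traceback point at comments or blank lines, which is typical when files on disk were replaced by `pip install` after the running kernel had already imported an older copy. Under that assumption, a minimal Python sketch for spotting packages whose in-memory version differs from the installed one could look like the following. The helper name `find_stale_imports` is hypothetical, it assumes the import name equals the distribution name, and `importlib.metadata` needs Python 3.8+ (on 3.7 the `importlib_metadata` backport offers the same call).

```python
import sys
from importlib import metadata


def find_stale_imports(packages, loaded=None, installed_version=metadata.version):
    """Return {name: (in_memory_version, on_disk_version)} for mismatches.

    `loaded` defaults to sys.modules; `installed_version` defaults to
    importlib.metadata.version. Both are injectable to ease testing.
    """
    loaded = sys.modules if loaded is None else loaded
    stale = {}
    for name in packages:
        module = loaded.get(name)
        if module is None:
            continue  # not imported yet, so it cannot be stale
        in_memory = getattr(module, "__version__", None)
        try:
            on_disk = installed_version(name)
        except metadata.PackageNotFoundError:
            continue  # no installed distribution under this name
        if in_memory is not None and in_memory != on_disk:
            stale[name] = (in_memory, on_disk)
    return stale


if __name__ == "__main__":
    # An empty dict means no mismatch was detected for these names.
    print(find_stale_imports(["pandas", "numpy", "dask"]))
```

If the result is non-empty, restarting the notebook kernel after the `pip install` step and importing again is the usual remedy.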
BELONOVSKII commented 5 months ago

Hello @numomcmc! The problem is that pip installs an old version of LightAutoML (<0.3.0). By default, Kaggle uses Python 3.10, and the latest stable versions of LightAutoML are not visible to pip on Python 3.10. We are working on this.

A temporary solution is to install LightAutoML from the lightautoml-0.3.8-py3-none-any.whl file (https://www.kaggle.com/datasets/alexryzhkov/lightautoml-v0-3-8) provided by @alexmryzhkov. See the notebook for an example.

numomcmc commented 5 months ago

Hello Peter

Thank you for your reply. I am having all kinds of problems using LAMA. The same call that works with TabularAutoML results in the error message below with TabularUtilizedAutoML, so I'd like to do a clean reinstall to make sure everything is current. The link to the notebook returns "No results matched your search". Could you check that? I am unfortunately running Conda on Windows 11, so I hope the new version and the installation play nice together.

Thank you and I look forward to hearing from you.

Stefan

AttributeError: 'TabularUtilizedAutoML' object has no attribute 'reader'


BELONOVSKII commented 5 months ago

Stefan, hello once again! I cannot figure out why the URL is not working, so I will provide the code snippet here.

BELONOVSKII commented 5 months ago

Update

Now you can install LightAutoML on Python 3.10 by directly specifying the version 0.3.8b1, i.e. `pip install -U lightautoml==0.3.8b1`
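In a notebook it can be unclear which interpreter a bare `pip` belongs to. A small sketch, assuming you want the pinned install to target the same Python that runs the kernel; the helper name `pip_install_command` is made up for illustration and only builds the command.

```python
import sys


def pip_install_command(requirement, upgrade=True):
    """Build a pip command bound to the interpreter running this code."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        cmd.append("-U")
    cmd.append(requirement)
    return cmd


if __name__ == "__main__":
    # Print the command rather than running it; to execute it, pass the
    # list to subprocess.check_call(...).
    print(" ".join(pip_install_command("lightautoml==0.3.8b1")))
```

Restarting the kernel after the install keeps already-imported modules from shadowing the new versions.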

numomcmc commented 5 months ago

Hi Peter

The new LAMA doesn't play nice with Python 3.9.x, so I ran my notebook on Kaggle with 3.10, and it was a success. Thank you for your swift response!

In the meantime, a question about GPU support. Both LightGBM and CatBoost can utilize a GPU via CUDA. However, LAMA seems to have its own versions of CatBoost and LightGBM packaged together, and the GitHub page indicates that GPU support is only conditionally available. That suggests to me that I cannot run LightGBM on GPU via parameters the usual way (same for CatBoost). Is that the correct understanding?

Thank you for your support.

S

BELONOVSKII commented 5 months ago

Hello, Stephen. LightGBM and CatBoost automatically switch to GPU, if available.

Moreover, there is an explicit GPU implementation of LightAutoML. Check LightAutoML_GPU (https://github.com/sb-ai-lab/LightAutoML_GPU).
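The thread does not show how device selection is configured, so the following is only a sketch under stated assumptions: that `TabularAutoML` accepts `lgb_params` and `cb_params` dictionaries whose `default_params` entry is forwarded to the underlying library, and that the installed LightGBM and CatBoost builds include GPU support. `device_type` (LightGBM) and `task_type` (CatBoost) are the libraries' own parameter names; the helper `gpu_override_params` is hypothetical.

```python
def gpu_override_params(use_gpu):
    """Return keyword arguments that request GPU training explicitly.

    Assumption: the preset forwards `default_params` to LightGBM/CatBoost.
    """
    if not use_gpu:
        return {}
    return {
        # LightGBM's device selector
        "lgb_params": {"default_params": {"device_type": "gpu"}},
        # CatBoost's device selector
        "cb_params": {"default_params": {"task_type": "GPU"}},
    }


if __name__ == "__main__":
    print(gpu_override_params(True))
    # Hypothetical usage, not executed here:
    # automl = TabularAutoML(task=Task("binary"), **gpu_override_params(True))
```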

numomcmc commented 5 months ago

Hi Peter

A few follow up questions.

  1. I would like to give LightAutoML_GPU a try. Is the GPU version in sync with the regular LAMA? Meaning, is it now based on 0.3.8b1?

  2. Also, I would like to learn more about how to use LAMA to tune hyperparameters. The YouTube video talked about how LAMA tunes hyperparameters, but there is no mention of how to actually do it. There is hardly anything about hyperparameter tuning in the documentation either. Could you provide more information?

  3. In the Kaggle Home Price regression example, as well as in the documentation (https://lightautoml.readthedocs.io/en/latest/_modules/lightautoml/automl/presets/tabular_presets.html#), there are mentions of "lgb_tuned" and "cb_tuned". So I assume that allows users to specify hyperparameter tuning as a part of the training. But how does LAMA actually tune them? Which parameters are tuned, and over what range of values? Are the user-supplied parameter values used as the starting values? What are the final resulting hyperparameter values? How do the "optimized" validation results compare to the non-optimized ones, and where can I find the comparison? Are they available in the report?

Many thanks for your help.

Stefan
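For readers with the same question, here is a hedged sketch of how the "lgb_tuned" and "cb_tuned" names mentioned above are typically requested. It assumes `TabularAutoML` accepts a `general_params["use_algos"]` list and a `tuning_params` dictionary with `max_tuning_iter` and `max_tuning_time` keys, as the linked preset source suggests; the helper `tuned_preset_kwargs` is hypothetical. It does not answer which parameters are searched or over what ranges.

```python
def tuned_preset_kwargs(max_trials=50, max_seconds=300):
    """Build keyword arguments that request tuned and untuned boosters.

    Assumption: these keys match the preset's configuration schema.
    """
    return {
        "general_params": {
            # one inner list per level; untuned and tuned variants side by side
            "use_algos": [["lgb", "lgb_tuned", "cb", "cb_tuned"]],
        },
        "tuning_params": {
            "max_tuning_iter": max_trials,   # cap on tuning trials
            "max_tuning_time": max_seconds,  # cap on tuning time, in seconds
        },
    }


if __name__ == "__main__":
    print(tuned_preset_kwargs())
    # Hypothetical usage, not executed here:
    # automl = TabularAutoML(task=Task("reg"), timeout=3600, **tuned_preset_kwargs())
```

Keeping the untuned and tuned variants in the same level makes it possible to compare their validation scores in the training log.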

On Fri, Jan 12, 2024 at 7:44 AM PeterBel @.***> wrote:

Hello, Stephen. Yes, this LAMA is developed only for CPU computations.

However, there is an explicit GPU implementation of LightAutoML that could be run on a GPU with CUDA. Check LightAutoML_GPU https://github.com/sb-ai-lab/LightAutoML_GPU.
