pycaret / pycaret

An open-source, low-code machine learning library in Python
https://www.pycaret.org
MIT License

[INSTALL]: Proposal to resolve the cuML integration problems #3075

Closed beckernick closed 1 year ago

beckernick commented 2 years ago

Installation check

Platform

Linux-5.8.0-38-generic-x86_64-with-glibc2.31

Installation Method

Built from source

pycaret Version

Source build

Python Version

3.9.13

Description

https://github.com/pycaret/pycaret/issues/2710, https://github.com/pycaret/pycaret/issues/2914, and https://github.com/pycaret/pycaret/issues/2987 are open issues that illustrate the challenges of using recent releases of cuML with PyCaret.

I investigated these issues and believe the following summary captures the current state:

I believe that the following changes allow using the current cuML (22.10) or higher with PyCaret smoothly and do not cause any additional test failures.

diff --git a/pycaret/internal/pycaret_experiment/tabular_experiment.py b/pycaret/internal/pycaret_experiment/tabular_experiment.py
index d070e916..cb76b08e 100644
--- a/pycaret/internal/pycaret_experiment/tabular_experiment.py
+++ b/pycaret/internal/pycaret_experiment/tabular_experiment.py
@@ -346,11 +346,8 @@ class _TabularExperiment(_PyCaretExperiment):
                 cuml_version = __version__
                 self.logger.info(f"cuml=={cuml_version}")

-                cuml_version = cuml_version.split(".")
-                cuml_version = (int(cuml_version[0]), int(cuml_version[1]))
-
             if cuml_version is None or not version.parse(cuml_version) >= version.parse(
-                "0.15"
+                "22.10"
             ):
                 message = f"cuML is outdated or not found. Required version is >=0.15, got {__version__}"
                 if use_gpu == "force":
diff --git a/requirements.txt b/requirements.txt
index aec84845..da543a87 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -12,7 +12,7 @@ pyod>=0.9.8
 imbalanced-learn>=0.8.1
 category-encoders>=2.4.0
 lightgbm>=3.0.0
-numba~=0.55.0
+numba>=0.55.0
 requests>=2.27.1  # Required by pycaret.datasets
 psutil>=5.9.0
 markupsafe>=2.0.1  # Fixes Google Colab issue
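
To illustrate the version gate in the first hunk, here is a minimal sketch. It assumes the `version` name used in `tabular_experiment.py` is `packaging.version` (the diff does not show the import), and the helper name `cuml_is_supported` is hypothetical, not part of PyCaret. The point is that the raw version string goes straight to `version.parse`, which is what removing the tuple conversion achieves.

```python
from packaging import version

MIN_CUML = "22.10"  # minimum proposed in the patch above


def cuml_is_supported(installed, minimum=MIN_CUML):
    """Return True if the installed cuML version string meets the minimum.

    `installed` is None when cuML could not be imported.
    """
    if installed is None:
        return False
    # Compare parsed versions rather than raw strings: as plain strings,
    # "9.0" would sort after "22.10".
    return version.parse(installed) >= version.parse(minimum)


print(cuml_is_supported("22.10.00"))
print(cuml_is_supported("0.15"))
print(cuml_is_supported(None))
```
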

With PyCaret installed from source and updated like above in the following conda environment, things work as expected:

mamba create -n rapids-22.10-pycaret -c rapidsai -c nvidia -c conda-forge rapids=22.10 python=3.9 cudatoolkit=11.5 jupyterlab strings_udf
conda activate rapids-22.10-pycaret
git clone https://github.com/pycaret/pycaret.git
cd pycaret
python -m pip install .
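
After the install, a quick way to confirm what ended up in the environment is to query the installed distributions. This is only a sketch using the standard library; the distribution names `pycaret`, `cuml`, and `numba` are assumptions about how the packages are registered in this environment, and `installed_version` is a hypothetical helper.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string for a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Report the packages that matter for the cuML integration.
for name in ("pycaret", "cuml", "numba"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```
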

Testing

I ran the test suite locally with pytest, with the patch above applied, in an environment that included the full set of dependencies from requirements-test.txt and requirements-optional.txt, to see whether anything failed. I saw several failures, so I repeated the run in a clean environment with a fresh, unmodified PyCaret source build. The same 8 tests failed in both environments, which suggests that this change does not introduce any new test failures:

Test failures with standard PyCaret built from source:

===================================================================== short test summary info =====================================================================
FAILED tests/test_check_fairness.py::test_check_fairness_multiclass_classification - TypeError: object of type 'bool' has no len()
FAILED tests/test_classification_plots.py::test_plot - _tkinter.TclError: invalid command name ".!navigationtoolbar2tk.!button2"
FAILED tests/test_nlp.py::test_nlp - OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory.
FAILED tests/test_nlp.py::TestNLPExperimentCustomTags::test_nlp_setup_fails_with_experiment_custom_tags - OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory.
FAILED tests/test_nlp.py::TestNLPExperimentCustomTags::test_nlp_create_model_fails_with_experiment_custom_tags - OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory.
FAILED tests/test_nlp.py::TestNLPExperimentCustomTags::test_nlp_setup_with_experiment_custom_tags - OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory.
FAILED tests/test_nlp.py::TestNLPExperimentCustomTags::test_nlp_create_models_with_experiment_custom_tags - OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory.
FAILED tests/test_regression_plots.py::test_plot - _tkinter.TclError: invalid command name ".!navigationtoolbar2tk.!button2"
============================================== 8 failed, 549 passed, 3 skipped, 4823 warnings in 3515.72s (0:58:35) ===============================================

Identical test failures with PyCaret built from source with the above patch:

===================================================================== short test summary info =====================================================================
FAILED tests/test_check_fairness.py::test_check_fairness_multiclass_classification - TypeError: object of type 'bool' has no len()
FAILED tests/test_classification_plots.py::test_plot - _tkinter.TclError: invalid command name ".!navigationtoolbar2tk.!button2"
FAILED tests/test_nlp.py::test_nlp - OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory.
FAILED tests/test_nlp.py::TestNLPExperimentCustomTags::test_nlp_setup_fails_with_experiment_custom_tags - OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory.
FAILED tests/test_nlp.py::TestNLPExperimentCustomTags::test_nlp_create_model_fails_with_experiment_custom_tags - OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory.
FAILED tests/test_nlp.py::TestNLPExperimentCustomTags::test_nlp_setup_with_experiment_custom_tags - OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory.
FAILED tests/test_nlp.py::TestNLPExperimentCustomTags::test_nlp_create_models_with_experiment_custom_tags - OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory.
FAILED tests/test_regression_plots.py::test_plot - _tkinter.TclError: invalid command name ".!navigationtoolbar2tk.!button2"
============================================== 8 failed, 549 passed, 3 skipped, 4689 warnings in 3481.17s (0:58:01) ===============================================
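
The comparison of the two summaries can also be done mechanically. The sketch below is not part of the original test run; the helper names are hypothetical, and it assumes each failure line has the shape `FAILED <test id> - <error>`, as in the output above.

```python
def failed_test_ids(summary_text):
    """Extract test IDs from 'FAILED <id> - <error>' lines of a pytest summary."""
    ids = set()
    for line in summary_text.splitlines():
        if line.startswith("FAILED "):
            # Keep only the test ID, dropping the error message after " - ".
            ids.add(line[len("FAILED "):].split(" - ", 1)[0])
    return ids


def new_failures(baseline_summary, patched_summary):
    """Return test IDs that fail with the patch but not on the baseline."""
    return failed_test_ids(patched_summary) - failed_test_ids(baseline_summary)


# Two lines copied from the summaries above, used as a small example.
baseline = (
    "FAILED tests/test_nlp.py::test_nlp - OSError: [E050] Can't find model\n"
    "FAILED tests/test_regression_plots.py::test_plot - _tkinter.TclError\n"
)
patched = baseline  # the patched run reported the same failures
print(new_failures(baseline, patched))
```

An empty set means the patched build fails only tests that already fail on the unmodified build.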

Given the above, @ngupta23 @Yard1, would you be open to accepting a PR that unblocks using the current cuML release with the current PyCaret?

cc @dantegd @wphicks (awareness)

moezali1 commented 2 years ago

@Yard1 What are your thoughts on this?

ngupta23 commented 1 year ago

The changes to numba pinning (i.e. relaxing it) look to be ok from my perspective.
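
For reference, the difference between the old and new numba pins can be checked with `packaging.specifiers`. This is a sketch only; the candidate versions are arbitrary examples, not versions that any particular cuML release requires.

```python
from packaging.specifiers import SpecifierSet

old_pin = SpecifierSet("~=0.55.0")  # compatible release: >=0.55.0, ==0.55.*
new_pin = SpecifierSet(">=0.55.0")  # any release from 0.55.0 upward

# Show which example versions each pin accepts.
for candidate in ("0.55.2", "0.56.2"):
    print(candidate, old_pin.contains(candidate), new_pin.contains(candidate))
```

The old pin rejects anything outside the 0.55 series, while the relaxed pin only sets a lower bound.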

ngupta23 commented 1 year ago

@beckernick Feel free to submit the PR. If it passes on GitHub, we can accept it. I think you are missing the vocabulary dictionary locally which is why the local tests are failing.

beckernick commented 1 year ago

Yeah, you're right. Looks like 5/8 are from not downloading the spaCy model ahead of time.

And sounds good, thanks! I'll open a PR (it may take a few more days because I need some internal approvals to contribute to a new project).
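
A small pre-flight check for the missing model could look like the sketch below. It assumes the model is installed as an importable package named `en_core_web_sm`, which is what the `OSError` messages above suggest; `model_is_installed` is a hypothetical helper.

```python
import importlib.util

MODEL = "en_core_web_sm"  # model named in the OSError messages above


def model_is_installed(name=MODEL):
    """Return True if the model is importable as a top-level package."""
    return importlib.util.find_spec(name) is not None


if not model_is_installed():
    print(f"{MODEL} is missing; try: python -m spacy download {MODEL}")
```
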

ngupta23 commented 1 year ago

Sounds good. If you get the latest master, some of the remaining 3 tests should be fixed as well.