mljar / mljar-supervised

Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation
https://mljar.com
MIT License

UserWarning in test: tests/tests_automl/test_targets.py::AutoMLTargetsTest::test_multi_class_abcd_missing_target #753

Closed: a-szulc closed this issue 2 weeks ago

a-szulc commented 3 weeks ago

============================= test session starts ==============================
platform linux -- Python 3.12.3, pytest-8.3.2, pluggy-1.5.0 -- /home/adas/mljar/mljar-supervised/venv/bin/python3
cachedir: .pytest_cache
rootdir: /home/adas/mljar/mljar-supervised
configfile: pytest.ini
plugins: cov-5.0.0
collecting ... collected 1 item

tests/tests_automl/test_targets.py::AutoMLTargetsTest::test_multi_class_abcd_missing_target FAILED

=================================== FAILURES ===================================
____________ AutoMLTargetsTest.test_multi_class_abcd_missing_target ____________

self = <tests.tests_automl.test_targets.AutoMLTargetsTest testMethod=test_multi_class_abcd_missing_target>

    def test_multi_class_abcd_missing_target(self):
        X = np.random.rand(self.rows * 4, 3)
        X = pd.DataFrame(X, columns=[f"f{i}" for i in range(3)])
        y = pd.Series(
            np.random.permutation(["a", "B", "CC", "d"] * self.rows), name="target"
        )

        y.iloc[0] = None
        y.iloc[1] = None
        automl = AutoML(
            results_path=self.automl_dir,
            total_time_limit=1,
            algorithms=["Xgboost"],
            train_ensemble=False,
            explain_level=0,
            start_random_models=1,
        )
>       automl.fit(X, y)

tests/tests_automl/test_targets.py:262: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
supervised/automl.py:432: in fit
    return self._fit(X, y, sample_weight, cv, sensitive_features)
supervised/base_automl.py:967: in _fit
    X, y, sample_weight, sensitive_features = self._build_dataframe(
supervised/base_automl.py:789: in _build_dataframe
    X, y, sample_weight, sensitive_features = ExcludeRowsMissingTarget.transform(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

X =            f0        f1        f2
0    0.659600  0.507231  0.164356
1    0.748163  0.352224  0.219499
2    0.884379  0....373825  0.768711  0.140064
198  0.265013  0.473400  0.790041
199  0.492648  0.557452  0.144032

[200 rows x 3 columns]
y = 0      None
1      None
2         B
3         a
4         B
       ... 
195       a
196      CC
197       a
198       B
199       a
Name: target, Length: 200, dtype: object
sample_weight = None, sensitive_features = None, warn = True

    @staticmethod
    def transform(
        X=None, y=None, sample_weight=None, sensitive_features=None, warn=False
    ):
        if y is None:
            return X, y, sample_weight, sensitive_features
        y_missing = pd.isnull(y)
        if np.sum(np.array(y_missing)) == 0:
            return X, y, sample_weight, sensitive_features
        logger.debug("Exclude rows with missing target values")
        if warn:
>           warnings.warn(
                "There are samples with missing target values in the data which will be excluded for further analysis"
            )
E           UserWarning: There are samples with missing target values in the data which will be excluded for further analysis

supervised/preprocessing/exclude_missing_target.py:25: UserWarning
=========================== short test summary info ============================
FAILED tests/tests_automl/test_targets.py::AutoMLTargetsTest::test_multi_class_abcd_missing_target
============================== 1 failed in 1.94s ===============================

a-szulc commented 2 weeks ago

Fixed in #768.
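
For readers hitting the same failure: the log reports the `UserWarning` itself as the test error, which is consistent with warnings being escalated to errors during the test run. The log names `pytest.ini` as the config file but does not show its contents, so that is an assumption. The sketch below is not the change made in #768. It uses a hypothetical, stdlib-only stand-in for `ExcludeRowsMissingTarget.transform` (the names `exclude_missing_target`, `run_with_warnings_as_errors`, and `run_expecting_warning` are invented for illustration) to show how the same warning either fails a run or is captured and asserted on.

```python
import warnings

# Message copied from the traceback above.
MESSAGE = (
    "There are samples with missing target values in the data "
    "which will be excluded for further analysis"
)


def exclude_missing_target(y, warn=False):
    """Hypothetical stand-in: drop None targets, optionally warn."""
    kept = [value for value in y if value is not None]
    if len(kept) != len(y) and warn:
        warnings.warn(MESSAGE)  # default category is UserWarning
    return kept


def run_with_warnings_as_errors(y):
    """Mimic a run in which every warning is turned into an exception."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        try:
            exclude_missing_target(y, warn=True)
        except UserWarning as exc:
            return f"raised: {type(exc).__name__}"
    return "ok"


def run_expecting_warning(y):
    """Record the warning so a test can assert on it instead of failing."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        kept = exclude_missing_target(y, warn=True)
    return kept, [str(w.message) for w in caught]
```

In a `unittest.TestCase` such as `AutoMLTargetsTest`, the standard `self.assertWarns(UserWarning)` context manager around the call that triggers the warning plays the same role as `run_expecting_warning`; whether that matches the actual fix is not stated in this issue.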