mljar / mljar-supervised

Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation
https://mljar.com
MIT License

UserWarning raised in test: tests/tests_validation/test_validator_split.py::SplitValidatorTest::test_disable_repeats_when_disabled_shuffle #762

Closed. a-szulc closed this issue 2 weeks ago

a-szulc commented 3 weeks ago
============================= test session starts ==============================
platform linux -- Python 3.12.3, pytest-8.3.2, pluggy-1.5.0 -- /home/adas/mljar/mljar-supervised/venv/bin/python3
cachedir: .pytest_cache
rootdir: /home/adas/mljar/mljar-supervised
configfile: pytest.ini
plugins: cov-5.0.0
collecting ... collected 1 item

tests/tests_validation/test_validator_split.py::SplitValidatorTest::test_disable_repeats_when_disabled_shuffle FAILED

=================================== FAILURES ===================================
________ SplitValidatorTest.test_disable_repeats_when_disabled_shuffle _________

self = <tests.tests_validation.test_validator_split.SplitValidatorTest testMethod=test_disable_repeats_when_disabled_shuffle>

    def test_disable_repeats_when_disabled_shuffle(self):
        with tempfile.TemporaryDirectory() as results_path:
            data = {
                "X": pd.DataFrame(
                    np.array(
                        [[0, 0], [0, 1], [1, 0], [0, 1], [1, 0], [0, 1], [1, 0], [1, 1]]
                    ),
                    columns=["a", "b"],
                ),
                "y": pd.DataFrame(
                    np.array([0, 0, 1, 0, 1, 0, 1, 1]), columns=["target"]
                ),
            }

            X_path = os.path.join(results_path, "X.data")
            y_path = os.path.join(results_path, "y.data")

            dump_data(X_path, data["X"])
            dump_data(y_path, data["y"])

            params = {
                "shuffle": False,
                "stratify": False,
                "train_ratio": 0.5,
                "results_path": results_path,
                "X_path": X_path,
                "y_path": y_path,
                "repeats": 3,
            }
>           vl = SplitValidator(params)

tests/tests_validation/test_validator_split.py:217: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <supervised.validation.validator_split.SplitValidator object at 0x76ec9b316180>
params = {'X_path': '/tmp/tmp8tlw6fki/X.data', 'repeats': 3, 'results_path': '/tmp/tmp8tlw6fki', 'shuffle': False, ...}

    def __init__(self, params):
        BaseValidator.__init__(self, params)

        self.train_ratio = self.params.get("train_ratio", 0.8)
        self.shuffle = self.params.get("shuffle", True)
        self.stratify = self.params.get("stratify", False)
        self.random_seed = self.params.get("random_seed", 1234)
        self.repeats = self.params.get("repeats", 1)

        if not self.shuffle and self.repeats > 1:
>           warnings.warn("Disable repeats in validation because shuffle is disabled")
E           UserWarning: Disable repeats in validation because shuffle is disabled

supervised/validation/validator_split.py:27: UserWarning
=========================== short test summary info ============================
FAILED tests/tests_validation/test_validator_split.py::SplitValidatorTest::test_disable_repeats_when_disabled_shuffle
============================== 1 failed in 1.89s ===============================

a-szulc commented 2 weeks ago

Fixed in #768.
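
For context, the traceback shows `warnings.warn` itself raising `UserWarning`, which only happens when the warning filter is set to `error` (for example through a `filterwarnings = error` entry in `pytest.ini` or `-W error`; which of these applies here is an assumption). One common way to make such a test pass is to assert the warning explicitly. The sketch below is not the change made in #768. `SplitValidatorStandIn` is a hypothetical stand-in that copies only the constructor logic visible in the log, and resetting `repeats` to 1 is an assumption based on the test name.

```python
import unittest
import warnings


class SplitValidatorStandIn:
    """Hypothetical stand-in; mirrors only the constructor lines shown in the traceback."""

    def __init__(self, params):
        self.shuffle = params.get("shuffle", True)
        self.repeats = params.get("repeats", 1)
        if not self.shuffle and self.repeats > 1:
            warnings.warn("Disable repeats in validation because shuffle is disabled")
            # Assumption: repeats are reset after the warning (not visible in the log).
            self.repeats = 1


def warning_as_error_message():
    """Reproduce the failure mode: with the 'error' filter, warn() raises."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        try:
            SplitValidatorStandIn({"shuffle": False, "repeats": 3})
        except UserWarning as exc:
            return str(exc)
    return None


class StandInTest(unittest.TestCase):
    def test_disable_repeats_when_disabled_shuffle(self):
        # assertWarns installs its own "always" filter, so the warning is
        # captured instead of raised, even if the session uses -W error.
        with self.assertWarns(UserWarning):
            vl = SplitValidatorStandIn({"shuffle": False, "repeats": 3})
        self.assertEqual(vl.repeats, 1)
```

In a pytest-style test, `with pytest.warns(UserWarning):` serves the same purpose. Either form documents that the warning is expected behaviour rather than silencing it globally.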