rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

[TRACKER] Hypothesis failures #5503

Open csadorf opened 1 year ago

csadorf commented 1 year ago

Issue to track hypothesis test failures.

### Tasks
- [ ] [test_logistic_regression_unscaled](https://github.com/rapidsai/cuml/actions/runs/5541030128/jobs/10113904057)
- [ ] [test_logistic_regression_model_digits](https://github.com/rapidsai/cuml/actions/runs/5554023930/jobs/10143320221)
- [ ] [test_logistic_regression_predict_proba](https://github.com/rapidsai/cuml/actions/runs/5584582744/job/15123636858)

To report new issues, simply comment with a link to the corresponding failed CI run.

bdice commented 4 months ago

The log links above have expired (logs are not kept indefinitely). However, some of these failures have recurred:

test_logistic_regression_unscaled https://github.com/rapidsai/cuml/actions/runs/8261345241/job/22598475935#step:7:2300

```
=========================== short test summary info ============================
FAILED test_linear_model.py::test_logistic_regression_unscaled - hypothesis.errors.Flaky: Hypothesis test_logistic_regression_unscaled(dtype=dtype('>f8'), penalty='none', l1_ratio=0.20501583639686288) produces unreliable results: Falsified on the first call but did not on a subsequent one
Falsifying example: test_logistic_regression_unscaled(
    dtype=dtype('>f8'),
    penalty='none',
    l1_ratio=0.20501583639686288,
)
Failed to reproduce exception. Expected:
dtype = dtype('>f8'), penalty = 'none', l1_ratio = 0.20501583639686288

    @given(
        dtype=floating_dtypes(sizes=(32, 64)),
        penalty=st.sampled_from(("none", "l1", "l2", "elasticnet")),
        l1_ratio=st.one_of(st.none(), st.floats(min_value=0.0, max_value=1.0)),
    )
    def test_logistic_regression_unscaled(dtype, penalty, l1_ratio):
        if penalty == "elasticnet":
            assume(l1_ratio is not None)

        # Test logistic regression on the breast cancer dataset. We do not scale
        # the dataset which could lead to numerical problems (fixed in PR #2543).
        X, y = load_breast_cancer(return_X_y=True)
        X = X.astype(dtype)
        y = y.astype(dtype)
        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

        params = {
            "penalty": penalty,
            "C": 1,
            "tol": 1e-4,
            "fit_intercept": True,
            "max_iter": 5000,
            "l1_ratio": l1_ratio,
        }
        culog = cuLog(**params)
        culog.fit(X_train, y_train)

        score_train = culog.score(X_train, y_train)
        score_test = culog.score(X_test, y_test)

        target(1 / score_train, label="inverse train score")
        target(1 / score_test, label="inverse test score")

        # TODO: Use a more rigorous approach to determine expected minimal scores
        # here. The values here are selected empirically and passed during test
        # development.
        assert score_train >= 0.94
>       assert score_test >= 0.94
E       assert 0.9370629191398621 >= 0.94

test_linear_model.py:604: AssertionError

You can reproduce this example by temporarily adding @reproduce_failure('6.99.5', b'AAEBAAEXBwIArZNHvq+HySoAwCSpJ8Pq+8U=') as a decorator on your test case
Highest target scores:
    1.04412  (label='inverse train score')
    1.06716  (label='inverse test score')

= 1 failed, 13352 passed, 6456 skipped, 634 xfailed, 54 xpassed, 10821 warnings in 1798.64s (0:29:58) =
```
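
To put the size of the miss in perspective, here is a rough back-of-the-envelope sketch of how many test samples separate the observed score from the `0.94` threshold. It rests on two assumptions that are not stated in the log: that the breast cancer dataset has 569 samples, and that `train_test_split` uses its default 25% test split (the test passes no `test_size`), rounded up. The variable names are illustrative only.

```python
import math

# Assumptions (not taken from the log itself):
N_SAMPLES = 569        # assumed size of scikit-learn's breast cancer dataset
TEST_FRACTION = 0.25   # assumed train_test_split default when test_size is unset

# Values taken from the log above:
THRESHOLD = 0.94
OBSERVED_SCORE = 0.9370629191398621

# Number of held-out samples under the assumed default split (rounded up).
n_test = math.ceil(N_SAMPLES * TEST_FRACTION)

# Correct predictions implied by the observed accuracy.
n_correct = round(OBSERVED_SCORE * n_test)

# Minimum correct predictions needed to satisfy `score_test >= 0.94`.
n_required = math.ceil(THRESHOLD * n_test)

shortfall = n_required - n_correct
print(f"test samples: {n_test}")
print(f"correct: {n_correct}, required: {n_required}, shortfall: {shortfall}")
```

If those assumptions hold, the observed score corresponds to 134 of 143 test samples classified correctly, while the threshold requires 135. A single prediction flipping between runs would then be enough to move the test from pass to fail, which would be consistent with the `Flaky` report.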