Open ShreeshaM07 opened 4 months ago
I am encountering Singular Matrix errors in CI checks for other PRs and am wondering whether this is related. These are the tests that are failing in #370:
FAILED skpro/tests/test_all_estimators.py::TestAllEstimators::test_fit_does_not_overwrite_hyper_params[RandomizedSearchCV-2-ProbaRegressorSurvival] - numpy.linalg.LinAlgError: Singular matrix
FAILED skpro/tests/test_all_estimators.py::TestAllEstimators::test_fit_updates_state[GridSearchCV-2-ProbaRegressorSurvival] - numpy.linalg.LinAlgError: Singular matrix
FAILED skpro/tests/test_all_estimators.py::TestAllEstimators::test_fit_returns_self[RandomizedSearchCV-2-ProbaRegressorSurvival] - numpy.linalg.LinAlgError: Singular matrix
FAILED skpro/tests/test_all_estimators.py::TestAllEstimators::test_fit_does_not_overwrite_hyper_params[GridSearchCV-2-ProbaRegressorSurvival] - numpy.linalg.LinAlgError: Singular matrix
FAILED skpro/tests/test_all_estimators.py::TestAllEstimators::test_fit_updates_state[RandomizedSearchCV-2-ProbaRegressorSurvival] - numpy.linalg.LinAlgError: Singular matrix
FAILED skpro/tests/test_all_estimators.py::TestAllEstimators::test_fit_returns_self[GridSearchCV-2-ProbaRegressorSurvival] - numpy.linalg.LinAlgError: Singular matrix
Hm, I think this is due to the CoxPH used in parameter set 2, which is not robust when used on a small dataset.
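For context, the message in the failing tests comes from NumPy's linear algebra routines, which raise it when asked to invert an exactly singular matrix. The sketch below is not the CoxPH code path; it only illustrates, with made-up toy data, how a tiny dataset with collinear features can produce a singular matrix. The array X and the helper try_invert are hypothetical names used for illustration only.

```python
import numpy as np

# Toy design matrix (hypothetical data): 3 samples, 2 features,
# where the second column is an exact copy of the first.
X = np.array([[1.0, 1.0],
              [2.0, 2.0],
              [3.0, 3.0]])

# The Gram matrix X^T X has rank 1 here, hence it is singular.
gram = X.T @ X


def try_invert(matrix):
    """Return the inverse, or the error message if inversion fails."""
    try:
        return np.linalg.inv(matrix)
    except np.linalg.LinAlgError as err:
        return str(err)


result = try_invert(gram)
print(result)  # prints "Singular matrix"
```

With only a handful of rows, as in the small fixtures used by estimator test suites, such degenerate configurations are much more likely than on realistic datasets.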
We could:
skpro without soft dependencies. Currently, the only such models are composites, using, say, ConditionUncensored wrapping ResidualDouble or EnbPI.
Do you have a particular preference? I'm not too familiar with survival models, so recommendations would be helpful here.
Summarizing the earlier discussion today: any survival model without soft deps that is numerically stable on small data should do for the purpose of smooth testing, e.g. ResidualDouble with LinearRegression or similar.
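As a rough illustration of why a least-squares based model is a safe choice for small test data: to my understanding, scikit-learn's LinearRegression is fitted with a least-squares solver, which returns a minimum-norm solution for rank-deficient data instead of raising. The sketch below uses NumPy's lstsq directly on hypothetical toy data; it does not involve skpro or ResidualDouble, and the variable names are assumptions for illustration.

```python
import numpy as np

# Hypothetical toy data: duplicated feature column, so X is rank-deficient.
X = np.array([[1.0, 1.0],
              [2.0, 2.0],
              [3.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])

# lstsq does not invert X^T X explicitly; for rank-deficient X it
# returns the minimum-norm least-squares solution instead of failing.
coef, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)

# Fitted values from the recovered coefficients.
y_fit = X @ coef
print(rank)
print(np.allclose(y_fit, y))
```

The same degenerate data that breaks a direct matrix inversion is handled without an exception here, which is the property the test parameter sets need.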
Describe the bug
In gradient_boosting, which interfaces the NGBRegressor in skpro as NGBoostRegressor, the TDistribution seems to be failing to run as expected. It raises errors such as Singular Matrix.
To Reproduce
Using sklearn's diabetes dataset and the breast_cancer dataset produces the same Singular Matrix error. To reproduce
Expected behavior
The expected output must look something like this
Environment
Python 3.11.8 ngboost 0.5.1
Additional context
The goal is to find out whether the problem lies in the interfacing, i.e. the skpro API, or is genuinely a bug in the ngboost TDistribution itself.