sktime / skpro

A unified framework for tabular probabilistic regression and probability distributions in python
https://skpro.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License

[BUG] cyclic boosting - sporadic test failures due to convergence failure #188

Open fkiraly opened 8 months ago

fkiraly commented 8 months ago

The recently added cyclic boosting estimator sporadically fails tests due to failed convergence of the loss, e.g.,:

FAILED skpro/regression/tests/test_cyclic_boosting.py::test_cyclic_boosting_with_manual_paramaters - cyclic_boosting.utils.ConvergenceError: Your cyclic boosting training seems to be diverging. In the 9. iteration the current loss: 52.52700124396056, is greater than the trivial loss with just mean predictions: 20.816666666666666.

FYI @setoguchi-naoki, @felixwick

FelixWick commented 8 months ago

I would suggest fixing the random_state when calling sklearn's train_test_split to get a reproducible result. Best to pick one for which our test case converges ;).
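For illustration, something along these lines (the data loading is just a placeholder, the actual test setup may differ):

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

# illustrative data; the actual test may load a different data set
X, y = load_diabetes(return_X_y=True, as_frame=True)

# fixing random_state makes the split, and hence the test, reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
```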

fkiraly commented 8 months ago

That fixes the failing test, though it might be worth checking whether there is a user-facing problem here.

FelixWick commented 8 months ago

I don't think so. Usually, Cyclic Boosting is quite robust and convergence issues are rare. The data set used in the test might not be the best choice for quantile regression.

fkiraly commented 8 months ago

The data set used in the test might not be the best choice for quantile regression.

Should we then replace it with a better choice?

FelixWick commented 8 months ago

Maybe California housing?
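If the test were switched to that data set, the loading would look roughly like this (just a sketch; note that fetch_california_housing downloads the data on first use, and a subsample would be needed to keep the test fast):

```python
from sklearn.datasets import fetch_california_housing

# ~20k rows, continuous positive target (median house value)
X, y = fetch_california_housing(return_X_y=True, as_frame=True)

# a small subsample keeps test runtime manageable
X, y = X.iloc[:500], y.iloc[:500]
```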

fkiraly commented 8 months ago

Hm, there are also still timeouts appearing in the tests - in roughly 10% of cases or so, it runs forever. Example:

FAILED skpro/tests/test_all_estimators.py::TestAllEstimators::test_fit_does_not_overwrite_hyper_params[CyclicBoosting-ProbaRegressorBasic] - Failed: Timeout >600.0s

It would be great if you could diagnose what is going on and make sure the algorithm always terminates. You can run the tests locally using skpro.utils.check_estimator.
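For reference, a local check run could look like this (sketch; the CyclicBoosting import path below is an assumption):

```python
from skpro.regression.cyclic_boosting import CyclicBoosting  # import path assumed
from skpro.utils import check_estimator

# runs the skpro test suite for this estimator locally and collects the results
results = check_estimator(CyclicBoosting)
```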

Not a big issue for us, as we can manage with the tests - but it is odd, since the data sets used in testing are fairly small and similar to the toy datasets a typical user might start playing with (which would scare them off if it takes ages to run, even if it's an accident).

FelixWick commented 8 months ago

Hm, that's weird, because we have a hard stop at 10 iterations (if I'm not missing anything).

fkiraly commented 8 months ago

Maybe they are just very long iterations then? 600 sec is 10min!

FelixWick commented 8 months ago

That should not happen with small data sets. Maybe something is messed up. Let me have a look in the next few days.

fkiraly commented 8 months ago

That should not happen with small data sets. Maybe something is messed up. Let me have a look in the next few days.

I suppose it's good then that we have stringent tests. FYI, I think we also just found a bug in sklearn: https://github.com/sktime/skpro/pull/192 - they do not seem to be testing their probabilistic interfaces systematically!

fwick-panasonic commented 8 months ago

I couldn't get it to work locally in a hurry. Can you point me to the place in the Cyclic Boosting code where it gets stuck? Then I might be able to get an idea of what is happening.

fkiraly commented 8 months ago

Can you point me to the place in the Cyclic Boosting code where it gets stuck? Then I might be able to get an idea of what is happening.

The CI unfortunately does not say - there is no traceback if it keeps running forever. I would suggest you run the code manually in a loop and kill it at the first loop run that takes too long; that will give you a traceback. E.g., print the loop index and kill the process when the next number does not appear for 10 min.
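A minimal diagnostic loop in that spirit could look like this (a sketch only; the import path and data set are assumptions, not the exact test setup):

```python
import time

from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

from skpro.regression.cyclic_boosting import CyclicBoosting  # import path assumed

X, y = load_diabetes(return_X_y=True, as_frame=True)
y = y.to_frame()  # pass the target as a DataFrame

for i in range(50):
    # vary the split each run to mimic the sporadic CI conditions
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=i)
    print(f"run {i} ...", flush=True)  # if this stays the last printed line, run i hangs
    start = time.perf_counter()
    reg = CyclicBoosting()
    reg.fit(X_train, y_train)
    reg.predict_proba(X_test)
    print(f"run {i} done in {time.perf_counter() - start:.1f}s", flush=True)
```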

The culprits are several different tests, for instance test_cyclic_boosting_with_manual_paramaters (the misspelling "paramaters" is how it is written in the code base).

FelixWick commented 8 months ago

Can you try with multiplicative instead of additive mode in test_cyclic_boosting_with_manual_paramaters? The target range of this data set and the chosen QPD bound call for it.

fkiraly commented 8 months ago

sure: https://github.com/sktime/skpro/pull/200

in your estimation, is this more of a wild guess or a certain fix?

fkiraly commented 8 months ago

Hm, so far so good, I'll restart it a couple times for good measure.

The runtimes seem very long though, still:

407.99s call     skpro/regression/tests/test_cyclic_boosting.py::test_cyclic_boosting_simple_use
292.31s call     skpro/regression/tests/test_cyclic_boosting.py::test_cyclic_boosting_with_manual_paramaters
242.42s call     skpro/regression/tests/test_all_regressors.py::TestAllRegressors::test_input_output_contract[CyclicBoosting]
213.67s call     skpro/tests/test_all_estimators.py::TestAllEstimators::test_fit_returns_self[CyclicBoosting-ProbaRegressorSurvival]
212.68s call     skpro/tests/test_all_estimators.py::TestAllEstimators::test_fit_updates_state[CyclicBoosting-ProbaRegressorBasic]
210.71s call     skpro/tests/test_all_estimators.py::TestAllEstimators::test_fit_returns_self[CyclicBoosting-ProbaRegressorXcolMixIxYnp]

FelixWick commented 8 months ago

I think the main reason for this is that we have an individual QPD for each sample. That's a good thing, of course, but a bit expensive.

Another thing is that the quantile regression mode in Cyclic Boosting is not yet optimized in terms of runtime. (Its optimization is done differently from the older modes, which are very fast.) But I have opened an issue to speed it up. I'm sure it can be made a lot faster with a bit of tuning.

fkiraly commented 8 months ago

That's a good thing, of course, but a bit expensive.

What exactly is the bottleneck? Is it computations in the distribution, the fitting, prediction, etc? Perhaps I can help speed it up.

FelixWick commented 8 months ago

The Cyclic Boosting quantile regression and the subsequent QPD "construction" are two independent steps. In fact, the QPD step can work with any quantile prediction, no matter how it was produced (e.g., HistGradientBoostingRegressor in sklearn is a good alternative). We might even consider moving it into an independent package, if there is enough interest.
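For instance, a symmetric quantile triplet could be produced with sklearn like this (sketch; assumes sklearn >= 1.1 for the quantile loss):

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import HistGradientBoostingRegressor

X, y = load_diabetes(return_X_y=True)

# one model per quantile level; together the columns form the symmetric triplet
levels = [0.1, 0.5, 0.9]
triplet = np.column_stack(
    [
        HistGradientBoostingRegressor(loss="quantile", quantile=q).fit(X, y).predict(X)
        for q in levels
    ]
)
# triplet has shape (n_samples, 3) and could feed the QPD construction step
```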

For Cyclic Boosting quantile regression, the main issue is the iterative optimization in training, which is not optimized for speed at all so far. I just made it work.

For QPD, I think most of the time is lost in the overhead of the for loop over the data set during inference (there is no training or fitting here). Maybe there is a way to "vectorize" this?
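As a generic illustration of the kind of vectorization meant here (not the actual cyclic_boosting code), a per-row loop evaluating a quantile function can often be replaced by a single array call:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu = rng.normal(size=10_000)                   # per-sample location parameters
sigma = np.abs(rng.normal(size=10_000)) + 0.1  # per-sample scale parameters
alpha = 0.9

# per-sample loop: one ppf call per row, dominated by Python overhead
q_loop = np.array([norm.ppf(alpha, loc=m, scale=s) for m, s in zip(mu, sigma)])

# vectorized: one ppf call over the full parameter arrays
q_vec = norm.ppf(alpha, loc=mu, scale=sigma)

assert np.allclose(q_loop, q_vec)
```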

Your help is more than welcome!

fkiraly commented 8 months ago

In fact, the QPD step can work with any quantile prediction, no matter how it was done

Interesting - this could be a feature to add to the multiple quantile regressor. What is the abstract type of it? Converting a sequence of quantile predictions into a distribution?

For QPD, I think most of the time is lost in the overhead of the for loop over the data set during inference (there is no training or fitting here). Maybe there is a way to "vectorize" this?

Can you link to it?

fkiraly commented 8 months ago

PS: sporadic test failures due to not terminating are still there, even with the "multiplicative" setting: https://github.com/sktime/skpro/actions/runs/7733735315/job/21087278609?pr=200

FelixWick commented 8 months ago

Yes, symmetric quantile triplet in, distribution out.
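As a toy illustration of that type signature, one could match, say, a normal distribution to the triplet; the actual QPD construction is more flexible than this sketch:

```python
import numpy as np
from scipy.stats import norm

def normal_from_triplet(q_low, q_med, q_high, alpha=0.1):
    """Toy 'triplet in, distribution out': match a normal distribution to the
    quantiles at levels alpha, 0.5, 1 - alpha (ignores any asymmetry)."""
    scale = (q_high - q_low) / (norm.ppf(1 - alpha) - norm.ppf(alpha))
    return norm(loc=q_med, scale=scale)

dist = normal_from_triplet(q_low=8.0, q_med=10.0, q_high=12.0)
print(dist.ppf([0.1, 0.5, 0.9]))  # recovers the input triplet [8, 10, 12]
```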

fkiraly commented 8 months ago

The suggested fix did not work - any other suggestions?

FelixWick commented 8 months ago

Let me see if I can speed up the QPD loop. That should solve the issue.

FelixWick commented 8 months ago

@fkiraly I have found an upstream solution by allowing QPD to take full arrays rather than individual quantile values. I'm confident that will fix this here. Give me some days to work it out and build a new Cyclic Boosting release.

fkiraly commented 8 months ago

ok, thanks - then there's nothing to do in skpro, just wait for the new release, right?

I'll move towards the 2.2.0 release then.

FelixWick commented 8 months ago

Slight adaptations will be needed: getting rid of the loop over QPD calls. But we can do that when I'm done in Cyclic Boosting.

FelixWick commented 8 months ago

Ok, Cyclic Boosting 1.4.0 is out. @setoguchi-naoki will go ahead and make the relevant changes here (he already knows what needs to be done). In short:

fkiraly commented 8 months ago

Nice!

Related question: I was thinking of implementing a more general quantile-parameterized distribution, i.e., not just with three anchors, but an arbitrary number. This is also done implicitly as part of MultipleQuantileRegressor - would this be of interest to you? It might be a better default for predict_proba, when predict_quantiles is implemented, than the current normal approximation.
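A minimal sketch of the idea, assuming the anchors are simply interpolated (the eventual skpro distribution would need proper tail handling and the usual distribution interface):

```python
import numpy as np

class InterpolatedQuantileDistribution:
    """Toy distribution given by quantile anchors, with linear interpolation of
    the quantile function between them (no extrapolation beyond the anchors)."""

    def __init__(self, levels, quantiles):
        self.levels = np.asarray(levels)        # e.g., [0.05, 0.25, 0.5, 0.75, 0.95]
        self.quantiles = np.asarray(quantiles)  # predicted values at those levels

    def ppf(self, p):
        return np.interp(p, self.levels, self.quantiles)

    def cdf(self, x):
        return np.interp(x, self.quantiles, self.levels)

d = InterpolatedQuantileDistribution(
    levels=[0.05, 0.25, 0.5, 0.75, 0.95],
    quantiles=[1.0, 2.5, 3.0, 3.8, 6.0],
)
print(d.ppf(0.5), d.cdf(3.8))  # 3.0 0.75
```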

FelixWick commented 8 months ago

Don't forget: The more quantiles you need, the more expensive it gets ... at least with the usual pinball loss approaches.

fkiraly commented 8 months ago

Don't forget: The more quantiles you need, the more expensive it gets ... at least with the usual pinball loss approaches.

I'm considering a situation where you already have them - they do not necessarily need to be fitted independently.