sktime / skpro

A unified framework for tabular probabilistic regression, time-to-event prediction, and probability distributions in python
https://skpro.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License
250 stars 46 forks source link

[ENH] Multiclass classification reduction using Histograms #410

Closed ShreeshaM07 closed 4 months ago

ShreeshaM07 commented 5 months ago

Reference Issues/PRs

fixes #378

What does this implement/fix? Explain your changes.

It adds an adapter to sklearn multiclass classifiers and expresses them as a Histogram probability distribution.

Does your contribution introduce a new dependency? If yes, which one?

Did you add any tests for the change?

No

PR checklist

For all contributions
For new estimators
ShreeshaM07 commented 5 months ago

@fkiraly , I have completed the classifier and all checks have passed. Let me know if you want any modifications.

ShreeshaM07 commented 5 months ago

I have made some changes to the name and the folder structure please let me know if this is suitable.

ShreeshaM07 commented 5 months ago

blocking: the 1D array case is not covered by tests, can be done in get_test_params or separately

Is it possible to check for 1D case in get_test_params itself? as I will not be able to modify any inputs for predict and predict_proba other than the classifier instance and number of bins.

fkiraly commented 5 months ago

yes, but we know that the inputs for y are numbers, so we should be able to come up with a reasonable range.

We should also think of the casewhere the user specifies bins and some y fall outside - in this case, I think the estimator should not break but do sth sensible (move to closest bin, or discard - perhaps even as an option on what to do)

ShreeshaM07 commented 4 months ago

yes, but we know that the inputs for y are numbers, so we should be able to come up with a reasonable range.

Sorry I didn't quite get how this would help in testing for 1D case in get_test_params. Just to be clear are we talking about the single Histogram DIstribution case produced when input X for predict and predict_proba is of shape(1,X.cols) or this by any chance?

        elif len(y_binned.shape) > 1 and y_binned.shape[1] == 1:
            y_binned = y_binned[:, 0]

We should also think of the casewhere the user specifies bins and some y fall outside - in this case, I think the estimator should not break but do sth sensible (move to closest bin, or discard - perhaps even as an option on what to do)

I have handled this and moved y values outside bin range to the closest bin. Also placed a note in the docstring to let the user know of this.

fkiraly commented 4 months ago

Sorry I didn't quite get how this would help in testing for 1D case in get_test_params.

I meant, there should be a test case in get_test_params, where the bins passed ar an 1D array, currently there isn't one.

ShreeshaM07 commented 4 months ago

Sorry I didn't quite get how this would help in testing for 1D case in get_test_params.

I meant, there should be a test case in get_test_params, where the bins passed ar an 1D array, currently there isn't one.

Ahh I see. Now that the y values outside bin range are also handled it makes all the more sense to include the array case for bins. I hadn't included earlier as I did not know the range of y and that might break the code when the bins are pre-meditated. Thanks for noticing.

ShreeshaM07 commented 4 months ago

Kind reminder are we waiting on something for this merge ?

fkiraly commented 4 months ago

no, not in particular - just in case any other devs want to give it a review