scikit-learn-contrib / DESlib

A Python library for dynamic classifier and ensemble selection
BSD 3-Clause "New" or "Revised" License

Vectorized diversity calculations #272

Closed derekahuang closed 1 year ago

derekahuang commented 1 year ago

Hi there,

I am using DESlib for large-scale diversity calculations and am running into performance issues with the for loop in _process_predictions. Have you considered vectorizing this function for bulk calculations? I have reimplemented the diversity functions in a vectorized form that accepts the parameters

    y : array of shape (n_samples,)
    y_pred1 : array of shape (n_samples,)
    y_pred2 : array of shape (n_classifiers, n_samples)

but it is less general, as y_pred1 and y_pred2 are no longer interchangeable. It returns an array of shape (n_classifiers,) containing the measure computed between y_pred1 and every row of y_pred2. If the maintainers are interested, let me know; I am more than happy to help integrate this into the library!
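To illustrate the interface described above, here is a minimal NumPy sketch. It is not necessarily the code in the eventual PR. The function names `process_predictions_batched` and `double_fault_batched` are hypothetical, and the sketch assumes the pairwise statistics are the fractions of samples on which both, only one, or neither classifier is correct.

```python
import numpy as np


def process_predictions_batched(y, y_pred1, y_pred2):
    """Hypothetical batched pairwise statistics (illustrative only).

    y       : array of shape (n_samples,)
    y_pred1 : array of shape (n_samples,)
    y_pred2 : array of shape (n_classifiers, n_samples)

    Returns four arrays of shape (n_classifiers,): the fraction of samples
    where neither (n00), only the first (n10), only the second (n01),
    or both (n11) classifiers are correct.
    """
    y = np.asarray(y)
    y_pred1 = np.asarray(y_pred1)
    y_pred2 = np.asarray(y_pred2)
    if y.ndim != 1 or y_pred1.shape != y.shape:
        raise ValueError("y and y_pred1 must both have shape (n_samples,)")
    if y_pred2.ndim != 2 or y_pred2.shape[1] != y.shape[0]:
        raise ValueError("y_pred2 must have shape (n_classifiers, n_samples)")

    correct1 = y_pred1 == y  # shape (n_samples,)
    correct2 = y_pred2 == y  # broadcasts to (n_classifiers, n_samples)

    # Boolean masks are averaged over the sample axis, replacing the loop.
    n11 = np.mean(correct1 & correct2, axis=1)
    n10 = np.mean(correct1 & ~correct2, axis=1)
    n01 = np.mean(~correct1 & correct2, axis=1)
    n00 = np.mean(~correct1 & ~correct2, axis=1)
    return n00, n10, n01, n11


def double_fault_batched(y, y_pred1, y_pred2):
    """Fraction of samples misclassified by both classifiers, per row of y_pred2."""
    n00, _, _, _ = process_predictions_batched(y, y_pred1, y_pred2)
    return n00
```

Because the comparison with `y` broadcasts across the rows of `y_pred2`, one call covers every classifier in the batch, which is also why `y_pred1` and `y_pred2` can no longer be swapped.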

Menelau commented 1 year ago

@derekahuang hello,

Yes, we are interested in having this function vectorized. In fact, it is one of the open issues in the repository (issue #71).

A PR adding this functionality to the library would be much appreciated. The diversity-based methods are much slower than the other DS techniques because this loop is not vectorized yet. If some of its functionality cannot be trivially vectorized, another solution would be to write that part of the code in Cython.

derekahuang commented 1 year ago

Are you OK with this interface, since we are only batching one classifier? Would I create a new file, like diversity_batched.py?

derekahuang commented 1 year ago

@Menelau I've opened a PR: https://github.com/scikit-learn-contrib/DESlib/pull/273

Menelau commented 1 year ago

@derekahuang great, I will check it ASAP and get back to you with comments and/or merge it.

Thanks for contributing this feature!

Menelau commented 1 year ago

Fixed with PR #273