[FEA] Add option to accept sample weight vectors to fit methods

beckernick commented 5 years ago

Is your feature request related to a problem? Please describe.

In sklearn, estimator.fit can (almost always?) accept a sample_weight parameter (defaulting to None). It lets users pass in a weights vector, with length equal to the number of samples, that determines how much weight each sample should receive.

This would be a useful feature for cuML estimators, too. As an example, see the sklearn KMeans documentation:

sample_weight : array-like, shape (n_samples,), optional
The weights for each observation in X. If None, all observations are assigned equal weight (default: None)
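
To make the semantics concrete, here is a minimal pure-Python sketch of how per-sample weights change a centroid computation, which is the kind of step a weighted KMeans update would perform. The helper name `weighted_centroid` is hypothetical and not part of sklearn or cuML; it only illustrates that integer weights behave like repeating a sample that many times.

```python
def weighted_centroid(points, sample_weight=None):
    """Return the weighted mean of equal-length points (hypothetical helper)."""
    # Mirror the sklearn convention: None means every sample counts equally.
    if sample_weight is None:
        sample_weight = [1.0] * len(points)
    if len(sample_weight) != len(points):
        raise ValueError("sample_weight must have one entry per sample")

    total = float(sum(sample_weight))
    n_dims = len(points[0])
    # Each coordinate is the weight-scaled sum divided by the total weight.
    return [
        sum(w * p[d] for p, w in zip(points, sample_weight)) / total
        for d in range(n_dims)
    ]


points = [[0.0, 0.0], [4.0, 2.0]]

unweighted = weighted_centroid(points)        # equal weights
weighted = weighted_centroid(points, [1, 3])  # second sample counts 3x

# Repeating the second sample three times gives the same centroid.
duplicated = weighted_centroid([[0.0, 0.0]] + [[4.0, 2.0]] * 3)
```

With weights `[1, 3]` the centroid moves toward the second sample, exactly as if that sample appeared three times in the data.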

JohnZed commented 5 years ago

Agreed, this will be useful for most estimators. Adding it will be an estimator-by-estimator process, but we could start with linear models and get some commonality there. It's not going to make it into 0.9 given the current load, but we'll keep it for a near-future release.

JohnZed commented 5 years ago

Priority is for KMeans based on requests

Denisevi4 commented 5 years ago

Linear models pretty please?

JohnZed commented 5 years ago

Sorry, this didn't make it to the current release, but we'll add it to the list for an upcoming release.

JohnZed commented 4 years ago

Removing from 0.13 as we've added the k-means-specific issue: https://github.com/rapidsai/cuml/issues/1625

beckernick commented 3 years ago

I think it may be worth re-opening this issue for tracking purposes.

A variety of issues request the ability to specify observation-level weights for various estimators and primitives. As the implementation may need to vary across estimators, it may make sense to keep these issues separate but linked together, like an epic. Perhaps this issue can serve as that link, as it's the broadest and the oldest.

Estimators

Logistic Regression (#3006)
KMeans (https://github.com/rapidsai/cuml/issues/1625) (done single GPU)
SVM (https://github.com/rapidsai/cuml/issues/2222) (done single GPU)
KNN Classifier (https://github.com/rapidsai/cuml/issues/3006#issuecomment-731334288)

Primitives

contingency_matrix (https://github.com/rapidsai/cuml/issues/2142)

Additionally, as these are implemented, it will unblock using the respective estimators inside the sklearn AdaBoostClassifier meta-estimator API (https://github.com/rapidsai/cuml/issues/2401#issuecomment-663259086).
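
For context on why this is a blocker: AdaBoost reweights the training samples each round and passes those weights to the base estimator's fit, so the base estimator has to accept sample_weight. A rough way to check an estimator is to inspect its fit signature. The sketch below uses only the standard library; `accepts_sample_weight` and both toy classes are hypothetical stand-ins, not cuML or sklearn code.

```python
import inspect


def accepts_sample_weight(estimator):
    """Return True if estimator.fit names a sample_weight parameter.

    Hypothetical helper: it only looks at the declared signature, so a
    fit(**kwargs) that forwards sample_weight would not be detected.
    """
    params = inspect.signature(estimator.fit).parameters
    return "sample_weight" in params


class WeightedToy:
    """Toy estimator whose fit accepts per-sample weights."""

    def fit(self, X, y, sample_weight=None):
        return self


class UnweightedToy:
    """Toy estimator whose fit has no sample_weight parameter."""

    def fit(self, X, y):
        return self


weighted_ok = accepts_sample_weight(WeightedToy())
unweighted_ok = accepts_sample_weight(UnweightedToy())
```

An estimator like `UnweightedToy` is the case a reweighting meta-estimator cannot use as-is, which is what adding sample_weight to the estimators listed above would address.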

JohnZed commented 3 years ago

Long term definitely viable. We will evaluate in more detail whether it can make it into 0.19 and mark it as P1 or P0 if so.

[FEA] Add option to accept sample weight vectors to fit methods #669