kmedved closed this issue 4 years ago
Great suggestion. I just implemented this in https://github.com/stanfordmlgroup/ngboost/commit/f108d672e62633ca5cea90720c794953b77044b5. Feel free to try it out and let me know if it works for you! You should be able to use it like:
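```python
import numpy as np
from ngboost import NGBClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.2)
weight = np.random.random(Y_train.shape)  # random weights, for testing only
ngb = NGBClassifier()
ngb.fit(X_train, Y_train, sample_weight=weight)
```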
You can even use weights for the validation set, and in combination with early stopping:
```python
import numpy as np
from ngboost import NGBClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.2)
weight = np.random.random(Y_train.shape)
val_weight = np.random.random(Y_test.shape)
ngb = NGBClassifier()
# Early stopping halts the fit if the validation loss has not dropped below
# its previous minimum for more than K (K=10 here) iterations.
ngb.fit(X_train, Y_train, X_val=X_test, Y_val=Y_test, sample_weight=weight,
        val_sample_weight=val_weight, early_stopping_rounds=10)
```
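For anyone wondering what the early-stopping rule in the comment above amounts to, here is a rough sketch in plain Python. It is only an illustration of the "more than K iterations without a new minimum" rule as described, not ngboost's actual code; the function name `early_stopping_iteration` and the `patience` argument are made up for this example.

```python
def early_stopping_iteration(val_losses, patience):
    """Return the iteration index at which fitting would stop, or None.

    Fitting stops once more than `patience` consecutive iterations have
    passed without the validation loss dropping below the best value so far.
    (Hypothetical helper for illustration; not part of ngboost.)
    """
    best = float("inf")
    since_best = 0  # iterations elapsed since the last new minimum
    for i, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            since_best = 0
        else:
            since_best += 1
            if since_best > patience:
                return i
    return None


# The loss bottoms out at index 2, then fails to improve three times in a row.
stop_at = early_stopping_iteration([1.0, 0.9, 0.8, 0.85, 0.9, 0.95], patience=2)
```

With `patience=2` the loop tolerates two non-improving iterations and stops on the third; a loss sequence that keeps improving never triggers the stop and returns `None`.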
These are classification examples, but it should work the same for regression.
I should note that the model initialization does not use the sample weights, since scipy.stats distributions unfortunately don't accept sample weights in their fit() methods. The workaround would be to override them with our own versions that do, but I'm not sure it's worth the effort. Initialization is somewhat arbitrary anyway.
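To give a sense of what such a workaround might look like for one distribution, here is a minimal sketch of a weighted maximum-likelihood fit for a Normal. This is an assumption about how it could be done, not something that exists in ngboost or scipy; `weighted_normal_fit` is a hypothetical name.

```python
import numpy as np


def weighted_normal_fit(y, sample_weight=None):
    """Weighted maximum-likelihood estimate of a Normal's (loc, scale).

    Hypothetical helper for illustration; not part of ngboost or scipy.
    With no weights it reduces to the sample mean and the population
    standard deviation (ddof=0).
    """
    y = np.asarray(y, dtype=float)
    if sample_weight is None:
        sample_weight = np.ones_like(y)
    w = np.asarray(sample_weight, dtype=float)
    loc = np.average(y, weights=w)  # weighted mean
    scale = np.sqrt(np.average((y - loc) ** 2, weights=w))  # weighted std
    return loc, scale


# An integer weight of 3 acts like repeating that observation three times.
loc_w, scale_w = weighted_normal_fit([0.0, 10.0], sample_weight=[3.0, 1.0])
loc_r, scale_r = weighted_normal_fit([0.0, 0.0, 0.0, 10.0])
```

Other distributions would each need their own weighted estimator, which is where the effort mentioned above comes in.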
Unweighted initialization is fine, I think. From early testing, this seems to work great. I will do further testing over the next week to confirm the results.
Thanks!
This is an amazing project and I have high hopes for using ngboost in my work. I don't currently see any sample_weight functionality. Are there any plans to add this? (I apologize, as I lack the technical expertise to do it myself).