scikit-learn-contrib / polylearn

A library for factorization machines and polynomial networks for classification and regression in Python.
http://contrib.scikit-learn.org/polylearn/
BSD 2-Clause "Simplified" License

Learning intercepts? #10

Open sergeyf opened 7 years ago

sergeyf commented 7 years ago

First, thanks for the hard work on this package. It looks like a great way to get higher-order interactions to potentially improve on the standard FM models/packages.

It looks like the constant offsets/intercepts are not learned. Is this a to-do item, or is it something that's easy to fix by, for example, doing a global demean of the training outputs y_train in the case of regression? What about classification? Does it matter at all in that case?

vene commented 7 years ago

Hi, thanks a lot!

First of all, I agree this is a feature that should be implemented, and it should not be too difficult. Would you be interested in contributing it? I am a bit caught up for the following month, but I can look into it afterwards.

Regarding workarounds:

I think in the case of regression it's simply a matter of subtracting the mean of y_train before fitting and adding it back to the predictions at the end, as you say. For classification there is no such simple shift, but using sample weights can deal with imbalanced classes quite well.
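
Roughly like this (a sketch, with X_train, y_train, X_test standing in for your data):

import numpy as np
from polylearn import FactorizationMachineRegressor
# center the targets so the model no longer needs an intercept
y_mean = np.mean(y_train)
fm = FactorizationMachineRegressor(degree=2)
fm.fit(X_train, y_train - y_mean)
# add the offset back at prediction time
y_pred = fm.predict(X_test) + y_mean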

A simple way around this is to add a dummy column, as performed by the fit_lower='augment' option.

If you're training a third-order FM, then as long as you use fit_lower='augment', fit_linear=True, a dummy column is added, so an intercept is effectively learned.

Otherwise, you can do this in user code easily by using add_dummy_feature.

Of course, this workaround leads to a regularized intercept which might not be ideal.

HTH!

sergeyf commented 7 years ago

That is very helpful and answers my question, thanks.

I might have time to contribute this feature, depending on the complexity. What would be involved?

vene commented 7 years ago

The first step should be figuring out what objective function we want, so we can work out the intercept updates. Then, writing some failing unit tests.
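
For instance, a first failing test could look roughly like this (fit_intercept and intercept_ below are hypothetical names for the option and attribute to be added; they don't exist yet, which is the point of a failing test):

import numpy as np
from polylearn import FactorizationMachineRegressor
def test_intercept_is_learned():
    rng = np.random.RandomState(0)
    X = rng.randn(50, 5)
    y = 10.0 + 0.01 * rng.randn(50)  # targets dominated by a constant offset
    fm = FactorizationMachineRegressor(degree=2, fit_intercept=True)  # hypothetical option
    fm.fit(X, y)
    assert abs(fm.intercept_ - 10.0) < 1.0  # hypothetical attribute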

sergeyf commented 7 years ago

Sure, sounds fun. I imagine we can just take the current objective functions and stick a + b into them?
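
For degree 2, if I have the notation right, that would be something like

y_hat(x) = b + \sum_j w_j x_j + \sum_{j < j'} <p_j, p_j'> x_j x_j'

where p_j is the j-th column of P, with the loss and regularizers unchanged and b presumably left unregularized.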

vonjackustc commented 5 years ago

If I'm training a second-order FM, how can I fit the intercept? I found that using fit_lower='augment', fit_linear=True does not give me an intercept. Thank you!

vene commented 5 years ago

If I'm not mistaken, if you use fit_lower='augment', fit_linear=True, you will (indirectly) be learning an intercept; check the dimensionality of the learned weight vectors and matrices: their width should be greater by one than the number of input features. The first entry should correspond to the intercept.

vonjackustc commented 5 years ago

I set the parameters as follows: loss='logistic', fit_lower='augment', fit_linear=True, degree=2, n_components=2. My feature count is 29, and fm.P_.shape == (1, 2, 29), so the last dimension is still 29 rather than 30. Is there anything I've done wrong?

vene commented 5 years ago

Thanks for pointing that out; you are not doing anything wrong. Indeed, fit_lower='augment' was designed with lower degrees in mind, not with linear terms. If you set fit_linear=False, fit_lower='augment', you will indeed get fm.P_ of width 30, but there will be no linear term fm.w_.
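
Concretely (a sketch, with X and y as in your setup):

from polylearn import FactorizationMachineClassifier
# degree-2 FM; linear effects get absorbed into the augmented factors
fm = FactorizationMachineClassifier(degree=2, loss='logistic',
                                    fit_lower='augment', fit_linear=False,
                                    n_components=2)
fm.fit(X, y)
# fm.P_.shape[-1] == X.shape[1] + 1, and no separate fm.w_ is learned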

This is kind of by design of the API, and I realize it is not ideal. We could change the API with a deprecation cycle, but I would prefer a PR that actually learns the bias by coordinate descent explicitly.
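
For squared loss, for instance, the explicit bias update is just a mean-residual shift; roughly (my sketch, not the actual polylearn internals):

import numpy as np
def bias_update(y, y_pred, b):
    # optimal unregularized intercept step for squared loss:
    # shift the bias and the predictions by the mean residual
    delta = np.mean(y - y_pred)
    return b + delta, y_pred + delta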

For your use case, I recommend that you just add the dummy feature (a column of all ones) explicitly:

from polylearn import FactorizationMachineClassifier
from sklearn.preprocessing import add_dummy_feature
# prepend a column of ones; its weight then acts as the intercept
X_aug = add_dummy_feature(X, value=1.0)
fm = FactorizationMachineClassifier(degree=2, loss='logistic',
                                    fit_lower=None, fit_linear=True)
fm.fit(X_aug, y)
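
After fitting, the intercept is available as fm.w_[0], since add_dummy_feature prepends the column of ones; just keep in mind it gets regularized along with the other linear weights.
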
vene commented 5 years ago

(I had some typos in the comment above. If viewing this by e-mail, please visit the updated comment on github)

vonjackustc commented 5 years ago

Thank you for replying! I modified the _cd_direct_ho routine: when calling _cd_linear_epoch, it now augments X with a dummy feature so that the intercept is fit as _w[0].