Open sergeyf opened 7 years ago
Hi, thanks a lot!
First of all, I agree this is a feature that should be implemented, and it should not be too difficult. Would you be interested in contributing it? I am a bit caught up for the following month, but I can look into it afterwards.
Regarding workarounds:
I think in the case of regression it's simply a matter of subtracting the mean of `y_train` and adding it back at the end, as you say. For classification this workaround does not apply, but using sample weights can deal with imbalanced classes quite well.
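To make the regression workaround concrete, here is a minimal sketch (the names `X_train`, `y_train`, `X_test`, and `reg` are placeholders; `reg` stands for whichever polylearn regressor you are fitting):

```python
# Demeaning workaround: fit on centered targets, add the mean back at prediction time.
y_mean = y_train.mean()
reg.fit(X_train, y_train - y_mean)       # train on demeaned targets
y_pred = reg.predict(X_test) + y_mean    # restore the offset afterwards
```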
A simple way around this is to add a dummy column, as performed by the `fit_lower='augment'` option.
If you're training a third order FM, as long as you use `fit_lower='augment', fit_linear=True`, a dummy column is added, so an intercept is effectively learned.
Otherwise, you can do this in user code easily by using `add_dummy_feature`.
Of course, this workaround leads to a regularized intercept which might not be ideal.
HTH!
That is very helpful and answers my question, thanks.
I might have time to contribute this feature, depending on the complexity. What would be involved?
The first step should be figuring out what objective function we want, so we can work out the intercept updates. Then, writing some failing unit tests.
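For illustration, a failing test along those lines might look like the sketch below; `fit_intercept` and `intercept_` are hypothetical names for the future API, not anything polylearn exposes today:

```python
import numpy as np
from polylearn import FactorizationMachine

def test_intercept_is_learned():
    # Pairwise-interaction signal plus a large constant offset.
    rng = np.random.RandomState(0)
    X = rng.randn(200, 5)
    y = X[:, 0] * X[:, 1] + 10.0
    # fit_intercept / intercept_ do not exist yet; the test is meant to fail
    # until the feature is implemented.
    fm = FactorizationMachine(degree=2, fit_intercept=True)
    fm.fit(X, y)
    assert abs(fm.intercept_ - 10.0) < 1.0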
Sure, sounds fun. I imagine we can just take the current objective functions and stick a `+ b` into them?
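For reference, here is a rough sketch of where the `+ b` would go in a degree-2 prediction, using the usual FM identity for the pairwise term (names and shapes are illustrative, with `P` of shape `(n_components, n_features)`):

```python
import numpy as np

def fm_predict(X, b, w, P):
    # y_hat = b + X @ w + pairwise interactions.
    # Pairwise term via the standard FM trick:
    # 0.5 * sum_s [ (X @ P[s])**2 - (X**2) @ (P[s]**2) ]
    linear = X @ w
    pairwise = 0.5 * sum((X @ p) ** 2 - (X ** 2) @ (p ** 2) for p in P)
    return b + linear + pairwise
```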
If I'm training a second-order FM, how can I fit the intercept? I see that `fit_lower='augment', fit_linear=True` does not give me an intercept. Thank you!
If I'm not mistaken, if you use `fit_lower='augment', fit_linear=True`, you will (indirectly) be learning an intercept; check the dimensionality of the learned weight vectors and matrices: they will be greater by 1 than the number of input features. The first entry should correspond to the intercept.
I set the parameters as follows: `loss='logistic', fit_lower='augment', fit_linear=1, degree=2, n_components=2`. My feature number is 29, and `len(fm.P) == 29` and `fm.P_.shape == (1, 2, 29)`. Is there anything I've done wrong?
Thanks for pointing that out, you are not doing anything wrong. Indeed, `fit_lower='augment'` was designed with lower degrees in mind, not with linear terms in mind. If you set `fit_linear=False, fit_lower='augment'`, you will indeed get `fm.P_` to be of width 30, but there will be no linear term `fm.w_`.
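A quick sketch of the shape check described above (assuming `X` and `y` as in your setup, with 29 features):

```python
fm = FactorizationMachine(loss='logistic', degree=2, n_components=2,
                          fit_lower='augment', fit_linear=False)
fm.fit(X, y)
print(fm.P_.shape)  # expected (1, 2, 30): the extra column plays the role of a bias
```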
This is kind of by design of the API, and I realize it is not ideal. We could change the API with a deprecation cycle, but I would prefer a PR that actually learns the bias explicitly by coordinate descent.
For your use case, I recommend that you just add the dummy feature (a column of all ones) explicitly:
```python
from polylearn import FactorizationMachine
from sklearn.preprocessing import add_dummy_feature

X_aug = add_dummy_feature(X, value=1)
fm = FactorizationMachine(degree=2, fit_lower=None, fit_linear=True)
fm.fit(X_aug, y)
```
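With this setup, the dummy column becomes the first feature, so its coefficient, the first entry of `fm.w_`, plays the role of the intercept, though, as noted above, it is regularized along with the other linear weights.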
(I had some typos in the comment above. If viewing this by e-mail, please visit the updated comment on GitHub.)
Thank you for replying! I modified the `_cd_direct_ho` routine: when calling `_cd_linear_epoch`, it now modifies X by adding a dummy feature to fit the intercept (`_w[0]`).
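For what it's worth, an explicit, unregularized intercept update under squared loss could look roughly like the sketch below; this is only an illustration of the idea, not the actual polylearn internals, and the names are made up:

```python
def intercept_update(y, y_pred, b):
    # For squared loss, with all other parameters fixed, the minimizing intercept
    # shift is the mean residual (y_pred is assumed to include the current b).
    residual = y_pred - y
    return b - residual.mean()
```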
First, thanks for the hard work on this package. It looks like a great way to get higher-order interactions to potentially improve on the standard FM models/packages.
It looks like the constant offsets/intercepts are not learned. Is this a to-do item, or is it something that's easy to fix by, for example, doing a global demean of the training outputs `y_train` in the case of regression? What about classification? Does it matter at all in that case?