ibayer / fastFM

fastFM: A Library for Factorization Machines
http://ibayer.github.io/fastFM

Modifying the usage to use weighted train data, instead of individual train data points #109

Open ekta1007 opened 7 years ago

ekta1007 commented 7 years ago

Say you have data points in train such as:

X1, X2, X2 -> Y1
X1, X2, X2 -> Y1
X1, X2, X2 -> Y1
X1, X2, X3 -> Y1'
X1, X2, X3 -> Y1'

Training on these gives fm.w0_, fm.w_, and fm.V_ as the learnt model parameters.

Instead of treating them as 5 individual points (which increases the size of the training set), is it possible to use weights, such that we still train on the full sample, but on aggregated data points with the number of occurrences as the weight? That is, instead of the 5 data points above:

X1, X2, X2 -> Y1  (weight 3)
X1, X2, X3 -> Y1' (weight 2)

so that training still gives the same fm.w0_, fm.w_, and fm.V_ as if it had been trained on the 5 samples above.
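The aggregation step itself can be done outside the library; here is a minimal numpy sketch of collapsing duplicated (X, y) rows into unique rows plus occurrence counts that could then serve as sample weights (illustrative only, not fastFM API):

```python
# Collapse duplicated (X, y) training rows into unique rows + counts.
# Illustrative sketch only; fastFM itself is not involved here.
import numpy as np

X = np.array([
    [1, 2, 2],
    [1, 2, 2],
    [1, 2, 2],
    [1, 2, 3],
    [1, 2, 3],
])
y = np.array([1.0, 1.0, 1.0, 2.0, 2.0])

# Stack features and target so duplicates are detected on the full pair.
Xy = np.hstack([X, y[:, None]])
unique_rows, counts = np.unique(Xy, axis=0, return_counts=True)
X_agg, y_agg = unique_rows[:, :-1], unique_rows[:, -1]

print(X_agg)   # 2 unique feature rows instead of 5
print(counts)  # occurrence counts, usable as sample weights: [3 2]
```

The counts could then be passed as weights once a solver supports them.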

ibayer commented 7 years ago

Sample weights are not yet supported but I plan to add this feature with the next major release.

zeeraktalat commented 5 years ago

Is there an update on this? I imagine it could be handled by conforming to the sklearn sample_weight parameter, which can be provided when fitting a model (see https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression.fit)
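For reference, the sklearn sample_weight contract being suggested is that fitting on duplicated rows should match fitting on unique rows with integer weights, up to solver tolerance. A small sketch using sklearn's own LogisticRegression (sklearn only, fastFM not involved):

```python
# Demonstrate the sklearn sample_weight semantics: duplicated rows vs.
# unique rows with integer weights give (numerically) the same fit.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_unique = rng.normal(size=(20, 3))
y_unique = (X_unique[:, 0] + 0.5 * rng.normal(size=20) > 0).astype(int)
weights = rng.integers(1, 4, size=20)  # how often each row "occurred"

# Expanded dataset: row i repeated weights[i] times.
X_dup = np.repeat(X_unique, weights, axis=0)
y_dup = np.repeat(y_unique, weights)

clf_dup = LogisticRegression().fit(X_dup, y_dup)
clf_w = LogisticRegression().fit(X_unique, y_unique, sample_weight=weights)

# Both optimize the identical objective, so the coefficients agree.
print(np.allclose(clf_dup.coef_, clf_w.coef_, atol=1e-3))
```

This is exactly the behaviour the original question asks for, just provided by sklearn rather than fastFM.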

ibayer commented 5 years ago

@ZeerakW The problem is that the current solver doesn't support sample weights yet. A complete rewrite which does support sample weights is close to completion, but I can't give you a release date yet (think months, not days).

A sample_weight parameter will be made available with the new release.

zeeraktalat commented 5 years ago

@ibayer Oh that sounds amazing! Thanks for your efforts!

jwasserman2 commented 3 years ago

Hi @ibayer, thank you for your work on this. Do you have an updated estimate of when it will be available?

ibayer commented 3 years ago

@jwasserman2

I can give an update but not an (estimated) release date.

For regression we have already released C++ code supporting sample weights, but the Python interface doesn't support it yet. The current plan is to add more solvers (classification is probably next) before making sample weights available in Python.

However, feel free to open a feature request on https://github.com/palaimon/fastfm2 to help us with prioritization.

jwasserman2 commented 3 years ago

@ibayer Congrats on the new package and updating the c++ code! I was specifically thinking about using weights for classification using the SGD solver. Are you still planning on having the 3 solvers ALS, SGD, and MCMC?

ibayer commented 3 years ago

@jwasserman2

Are you still planning on having the 3 solvers ALS, SGD, and MCMC?

Yes, BUT imo sgd is the least interesting solver and is implemented more for completeness. ALS / coordinate descent is in general both faster and easier to use for FMs.

What is your motivation to prefer sgd?

jwasserman2 commented 3 years ago

While testing the different solvers, I was running into a data input error (if I remember correctly) using als that I was not seeing when using sgd. I mainly just knew that I didn't want to use MCMC for speed purposes, but between the two I did not have priors on which would be better for my use case.

If it would be helpful for me to recreate the als error for v2 of your package let me know!

ibayer commented 3 years ago

@jwasserman2

This makes sense. fastfm uses probit regression (same as libfm) for als classification, which is less stable than the sigmoid transform used for the sgd classification.
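For context, the two link functions being contrasted can be compared directly; a small scipy sketch (illustrative only, not fastFM code):

```python
# Probit link (Gaussian CDF, used by fastFM/libFM for als classification)
# vs. logistic sigmoid (used for sgd classification).
import numpy as np
from scipy.stats import norm
from scipy.special import expit

scores = np.linspace(-3, 3, 7)
probit = norm.cdf(scores)   # Phi(s)
sigmoid = expit(scores)     # 1 / (1 + exp(-s))

# Both map R monotonically onto (0, 1) and equal 0.5 at s = 0,
# but the probit tails decay much faster (like a Gaussian), which
# can make it numerically touchier for large |s|.
print(probit[3], sigmoid[3])
```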

If it would be helpful for me to recreate the als error for v2 of your package let me know!

Thanks for the offer. I hope it's not needed since v2 is a complete rewrite and uses iteratively reweighted least squares for als classification. The new approach is expected to be more stable and shouldn't have the issue you observed.
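For readers unfamiliar with the method, here is a generic IRLS (iteratively reweighted least squares) sketch for plain logistic regression, as an illustration of the solver family only, not fastFM v2's implementation:

```python
# Generic IRLS / Newton iteration for logistic regression:
#   w <- w + (X^T R X)^{-1} X^T (y - p),  R = diag(p * (1 - p))
# Illustrative only; fastFM v2's als classification solver is separate.
import numpy as np

def irls_logistic(X, y, n_iter=25):
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # current predicted probabilities
        R = p * (1.0 - p)                  # working weights
        H = X.T @ (R[:, None] * X)         # Hessian of the log-likelihood
        w = w + np.linalg.solve(H, X.T @ (y - p))
    return w

rng = np.random.default_rng(1)
X = np.c_[np.ones(200), rng.normal(size=(200, 2))]  # intercept + 2 features
true_w = np.array([0.5, 1.0, -2.0])
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-X @ true_w))).astype(float)

w_hat = irls_logistic(X, y)
print(np.round(w_hat, 2))  # should land near [0.5, 1.0, -2.0]
```

Each iteration solves a weighted least-squares problem, which is what makes the approach a natural fit for an ALS-style solver.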

edit: I recommend starring https://github.com/palaimon/fastfm2 and opening an issue requesting sample weight support. That way you'll get a notification as soon as we add the feature.

jwasserman2 commented 3 years ago

Awesome, will do. Thank you again for taking the time to add this functionality, looking forward to its release!