jfloff / pywFM

pywFM is a Python wrapper for Steffen Rendle's factorization machines library libFM
https://pypi.python.org/pypi/pywFM
MIT License

Predict for new data. #20

Closed vi3k6i5 closed 7 years ago

vi3k6i5 commented 7 years ago

Say I trained the model with

fm.run(train_x, train_y, val_x, val_y)

How do I run predictions on another dataset?

pred_y = fm.run(test_x)

The run method expects y_test as input, which doesn't make sense at all:

run(self, x_train, y_train, x_test, y_test, x_validation_set=None, y_validation_set=None, meta=None)

jfloff commented 7 years ago

1) Regarding y_test as input: libFM uses the test targets to compute some statistics about its predictions. They are not used when training the model. If I'm not mistaken, you could actually set them to dummy values and just collect the predictions, disregarding the prediction statistics since those will be wrong (see the sketch below). For more info, check the libFM manual.

2) Regarding running against a new dataset: at this moment you can't. It's a limitation of libFM itself, so you have to train again. See issue #7.
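For point 1, something like this should work (a minimal sketch based on the README example; it assumes train_x, train_y, and test_x are already prepared as numpy/scipy matrices and that you're doing regression):

```python
import numpy as np
import pywFM

fm = pywFM.FM(task='regression', num_iter=100)

# libFM wants a test target vector, but it only uses it to compute the
# reported prediction statistics -- pass dummy zeros and ignore those stats
dummy_y = np.zeros(test_x.shape[0])
model = fm.run(train_x, train_y, test_x, dummy_y)

# predictions for test_x; the accompanying statistics are meaningless
# because the targets were dummies
pred_y = model.predictions
```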

Hope it helps,

vi3k6i5 commented 7 years ago

Thanks :) Maybe add this to the README? It might help others.

jfloff commented 7 years ago

I'm so busy atm that I can't even breathe! Feel free to PR that change ;)

vi3k6i5 commented 7 years ago

Done. https://github.com/jfloff/pywFM/pull/21

I'll also try to make changes for https://github.com/jfloff/pywFM/issues/7. Any tips on how I should approach that problem?

I was thinking of saving the trained model to a file, keeping a reference to that file inside the model object, and adding a predict method that runs predictions against the saved model.

Doesn't seem clean, but it's a quick hack. Let me know.
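Roughly what I have in mind (an untested sketch; none of this exists in pywFM yet, the predict helper is made up, and it relies on libFM's -save_model/-load_model flags, which only apply to SGD and ALS):

```python
import os
import subprocess
import tempfile

from sklearn.datasets import dump_svmlight_file


def predict(libfm_bin, model_path, x_new, task='r', method='sgd'):
    """Score new data with a model file previously written via -save_model.

    Untested sketch: assumes the model was trained with SGD or ALS and that
    task/method here match whatever was used at training time.
    """
    # TemporaryDirectory removes the (possibly large) files afterwards
    with tempfile.TemporaryDirectory() as tmp:
        test_file = os.path.join(tmp, 'test.libfm')
        out_file = os.path.join(tmp, 'preds.txt')
        # libFM ignores the targets at prediction time, so write zeros
        dump_svmlight_file(x_new, [0] * x_new.shape[0], test_file)
        subprocess.check_call([
            libfm_bin,
            '-task', task,
            '-method', method,
            '-train', test_file,      # libFM still insists on a train file
            '-test', test_file,
            '-load_model', model_path,
            '-iter', '0',             # idea: no extra training, just predict
            '-out', out_file,
        ])
        with open(out_file) as fh:
            return [float(line) for line in fh]
```

The run method would then pass -save_model to libFM and keep that path on the returned model object so predict can find it.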

jfloff commented 7 years ago

The better approach would be something supported by the original libFM repo, but I don't think that's going to happen... they even removed support for saving/loading models with the MCMC method.

I guess we could try your approach. Just remember to properly clean up the temporary files; those can get quite big when dealing with large datasets.

Kudos for tackling this!