jfloff / pywFM

pywFM is a Python wrapper for Steffen Rendle's factorization machines library libFM
https://pypi.python.org/pypi/pywFM
MIT License

SGDA Requires Validation Set #2

Closed BrianMiner closed 7 years ago

BrianMiner commented 8 years ago

Can a validation set be used? It is a requirement for SGDA, but it doesn't appear to be an option.

jfloff commented 8 years ago

I don't have much experience with SGDA - actually never used it - since I've only used MCMC. Hence, I haven't implemented the validation set option.

Could you give me some info on the structure of the validation set? I could then quickly implement that option.

BrianMiner commented 8 years ago

I believe it is simply a data set of the same structure as train and test. If you look at the libFM manual under SGDA, you will see a validation set mentioned as a requirement. If memory serves, SGDA uses the validation set to "adapt" the learning rate of SGD.
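As a rough illustration of "same structure as train and test": libFM reads its text input in the libSVM-style sparse format (`target index:value index:value ...`), so a validation file would be written the same way as the other two. The helper name `to_libsvm_lines` and the toy rows below are assumptions made up for this sketch; they are not part of pywFM or libFM.

```python
def to_libsvm_lines(rows, targets):
    """Format sparse rows as libSVM-style lines: 'target idx:val idx:val ...'.

    `rows` is a list of dicts mapping a feature index to its value.
    This is a hypothetical helper for illustration only.
    """
    lines = []
    for features, target in zip(rows, targets):
        # Emit features in ascending index order for readability.
        pairs = " ".join("%d:%g" % (idx, val) for idx, val in sorted(features.items()))
        lines.append(("%g %s" % (target, pairs)).rstrip())
    return lines


# Train and validation data share the same feature space and the same layout.
train = to_libsvm_lines([{0: 1, 3: 0.5}, {1: 1, 2: 2.0}], [5, 3])
validation = to_libsvm_lines([{0: 1, 2: 1}], [4])
```

The only difference between the sets is which rows they contain; the line format is identical.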

jfloff commented 8 years ago

I've uploaded a new version with this change (also updated README).

Could you please test this change? (Make sure it's version 0.8.1. Use pip --no-cache-dir install pywFM)

I wasn't sure whether the validation set should be an FM class parameter or an FM.run function parameter. My intuition said that it was best to include it in FM.run, since the validation set changes according to the train/test data. Feel free to correct me.

Thank you,

PS: if you feel like it, adding a simple-sgda example would be awesome!

jfloff commented 8 years ago

@BrianMiner Have you been able to test this change?

Thank you,

BrianMiner commented 8 years ago

I have not yet, I will try asap!

dylanjf commented 7 years ago

Hey @jfloff, thanks for the wrapper. I've noticed that some parameters were being ignored for the sgd / sgda methods, which resulted in the algorithms not converging.

The fix is within the init: some decimal values are cast to integers with %d, so if I pass r2_regularization=0.01, it's translated as 0. There's also a spot where you build the args for the subprocess where the learning rate could likewise be reduced to an integer.

Switching those formats to something like %.5f fixes this, and the call to libfm runs as expected.
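To make the truncation concrete, here is a small standalone sketch of what Python's `%d` and `%.5f` do to these values. The variable names are picked for illustration and are not copied from the pywFM source.

```python
r2_regularization = 0.01
learn_rate = 0.05

# %d truncates a float toward zero, so small decimals silently become 0.
truncated_regular = "%d,%d,%d" % (0, 0, r2_regularization)
truncated_rate = "%d" % learn_rate

# %.5f keeps five decimal places, so the value survives the formatting.
preserved_regular = "%.5f,%.5f,%.5f" % (0, 0, r2_regularization)
preserved_rate = "%.5f" % learn_rate

print(truncated_regular, truncated_rate)
print(preserved_regular, preserved_rate)
```

With `%d`, the regularization and learning rate both reach the command line as 0, which would explain the failure to converge.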

Edit: also, adding the -regular argument for sgda is unnecessary, since it's trying to optimize those values after each iteration.
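A minimal sketch of how the argument list could be assembled with both points in mind: floats formatted with `%.5f`, `-regular` skipped for sgda, and `-validation` required for sgda. The function `build_libfm_args` and its parameters are hypothetical and do not mirror pywFM's actual code; only the flag names (`-method`, `-train`, `-test`, `-learn_rate`, `-regular`, `-validation`) come from the libFM command line.

```python
def build_libfm_args(method, train_path, test_path, validation_path=None,
                     learn_rate=0.1, regularization=(0.0, 0.0, 0.0)):
    """Build a libFM argument list (hypothetical helper, not pywFM's code)."""
    method = method.lower()
    args = ["-method", method, "-train", train_path, "-test", test_path]

    if method in ("sgd", "sgda"):
        # Format as a float so 0.05 is not truncated to 0.
        args += ["-learn_rate", "%.5f" % learn_rate]

    if method in ("sgd", "als"):
        # Assumption for this sketch: pass -regular only to sgd and als.
        # It is skipped for sgda, which adapts regularization itself.
        args += ["-regular", "%.5f,%.5f,%.5f" % tuple(regularization)]

    if method == "sgda":
        if validation_path is None:
            raise ValueError("SGDA requires a validation set")
        args += ["-validation", validation_path]

    return args


sgd_args = build_libfm_args("sgd", "train.libfm", "test.libfm",
                            learn_rate=0.05, regularization=(0, 0, 0.01))
sgda_args = build_libfm_args("sgda", "train.libfm", "test.libfm",
                             validation_path="validation.libfm", learn_rate=0.05)
```

The resulting list can be prefixed with the path to the libFM binary and handed to `subprocess`; the file names here are placeholders.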

jfloff commented 7 years ago

Hi @dylanjf, I've updated the format of the regularization (and learning rate) arguments to %.5f, per your suggestion.

Could you confirm that the changes fix your problem? Thank you!