jfloff / pywFM

pywFM is a Python wrapper for Steffen Rendle's factorization machines library libFM
https://pypi.python.org/pypi/pywFM
MIT License

Cannot produce Test(ll) results locally #5

Closed: mpearmain closed this issue 8 years ago

mpearmain commented 8 years ago

Hi,

I've been testing the pywFM package, and my question is about how model.prediction relates to the information produced in the output.

My specific example: if I run libFM with a train and a test dataset, I can see in the output that Test(ll) drops to 0.515385. But if I take the predictions and score them against the test labels, I get a logloss of 8.134375875846 where I would expect 0.515385.

For clarity, please see the thread I started on Kaggle, which also lets you download the data and reproduce the error.

Full example code: https://www.kaggle.com/c/bnp-paribas-cardif-claims-management/forums/t/19319/help-with-libfm/110652#post110652

jfloff commented 8 years ago

Sorry for the delay; I've been unable to check GitHub these last couple of days.

Regarding model.prediction and the output: all the output you see from pywFM is the same output you would see from using libFM directly. Regarding the variables that pywFM outputs, here is a rundown:

  • predictions: taken from the -out file option of libFM. I do some processing just to convert this file into an array.
  • global_bias, weights, pairwise_interactions: these 3 are taken from the model file that libFM produces if you pass the -save_model flag (more info here https://github.com/srendle/libfm/commit/19db0d1e36490290dadb530a56a5ae314b68da5d). I do some processing here to split the 3 outputs (given by the same file) into 3 variables.
  • rlog: taken from the csv produced by libFM, and loaded as a pandas DataFrame.
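For instance, a rough sketch of where each of those ends up after a run; xtrain/ytrain/xtest/ytest here just stand in for whatever train/test split you are using, nothing is specific to your data:

    import pywFM

    fm = pywFM.FM(task='classification', num_iter=10)
    model = fm.run(x_train=xtrain, y_train=ytrain, x_test=xtest, y_test=ytest)

    model.predictions            # parsed from the -out file libFM writes (one value per test row)
    model.global_bias            # w0 from the saved model file
    model.weights                # unary weights w from the same model file
    model.pairwise_interactions  # pairwise interaction factors V from the same model file
    model.rlog                   # the rlog csv loaded as a pandas DataFrame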

Does this answer your question?

Have you tried running libFM (without the wrapper) to see if the results differ from the ones with pywFM? Which data did you use to get the 8.13 value?

Thank you for the kind words on the Kaggle thread.

mpearmain commented 8 years ago

Thanks for the reply

After writing, I did test using libFM on the CLI only and had the same problem.

Basically, -out isn't producing predictions that relate to the Test(ll) as I would expect.

I'm 100% sure this is a user error on my part with I/O usage, as I don't think libFM is broken :)

I'll continue to investigate. FYI, I also opened a thread on this in the libFM Google group.

jfloff commented 8 years ago

I saw that thread on the libFM user group, that's why I asked if you had compared with libFM alone (not with pywFM).

Which data are you using to produce the 8.13 value? Could you post the output from that run? You are saying that you used "test predictions against the test label". Shouldn't you be using train data against predictions?

Remember that each time you run libFM you are running a new model. There is a way to use the same model on a new prediction set, but I haven't done that. Is that what you are looking for?

mpearmain commented 8 years ago

Like a fool, I didn't set the seed in the script, so reproducing the results isn't easy (I need it in the train_test_split of the data). However, simply downloading the data and running the script will highlight the problem (even if the results are not identical).

In this case I am trying to produce the predictions that give rise to the Test(ll) while the model is training. In theory (unless I am very much mistaken), model.prediction is, as you've stated, the same as the -out flag from the CLI. Therefore, since we passed both train and test to libFM, the output predictions should be for the test set that was supplied.

So in theory, if I run a logloss on the predictions and the labels from the test set, I should get the same logloss value as produced in the printed output. This is the crux of the problem: I don't get anything like a close match. (Running libFM in standalone mode gives rise to the same problem, i.e. it's not actually a pywFM issue.)
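In other words, the check I expect to hold (ytest and model being the split and the pywFM run from my script; sketch only):

    from sklearn.metrics import log_loss

    # I expect this number to match the final Test(ll) that libFM prints
    print(log_loss(ytest, model.predictions))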

jfloff commented 8 years ago

But the Test(ll) values are specific to the train/test data you are working with at that moment. To my knowledge, logloss is just an error measure between the predictions and the real values. Even if you train a model down to a 0.5 logloss, scoring its predictions against labels with a different skew (say the train data you have is skewed towards false values while the labels you score against are skewed towards true values) can give you a much higher logloss value.
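To make that concrete with made-up numbers (nothing to do with your Kaggle data), the same predicted probabilities can score very differently against label sets with a different balance:

    from sklearn.metrics import log_loss

    # predictions leaning towards the positive class
    preds = [0.9, 0.8, 0.85, 0.7]

    # labels that mostly agree with the predictions -> small logloss
    print(log_loss([1, 1, 1, 0], preds))   # ~0.42

    # labels skewed the other way -> the same predictions look much worse
    print(log_loss([0, 0, 1, 0], preds))   # ~1.32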

Does this help at all?

mpearmain commented 8 years ago

You are correct about how logloss works; you've actually encapsulated the problem I'm trying to solve in your first sentence:

"the Test(ll) values are specific to the train/test data you are working with at that moment."

This is exactly what I want to reproduce: the final Test(ll) from the model that has been built and applied to the test set that was given. I.e. running the simple example yields:

    Iter= 0 Train=0.666074 Test=0.668503 Test(ll)=0.665911
    Iter= 1 Train=0.693502 Test=0.694918 Test(ll)=0.606683
    Iter= 2 Train=0.707983 Test=0.693256 Test(ll)=0.570645
    Iter= 3 Train=0.730493 Test=0.711274 Test(ll)=0.526459
    Iter= 4 Train=0.731135 Test=0.711274 Test(ll)=0.513271
    Iter= 5 Train=0.692782 Test=0.714686 Test(ll)=0.515833
    Iter= 6 Train=0.702832 Test=0.70419 Test(ll)=0.516339
    Iter= 7 Train=0.698818 Test=0.7097 Test(ll)=0.514831
    Iter= 8 Train=0.709859 Test=0.707076 Test(ll)=0.515032
    Iter= 9 Train=0.714223 Test=0.711624 Test(ll)=0.515385

Using model.predictions via pywFM, or -out via the CLI with libFM, should give me the predicted probabilities (between 0 and 1, which it does) that the libFM model just built produced for the test set provided.

It is the final step of scoring these probabilities against the labels (y_test) that doesn't return the same result (0.515385 in this case).

Does this make sense?

jfloff commented 8 years ago

But are we talking about the same data (train and test) giving different values? Did you change the train set or the test set?

mpearmain commented 8 years ago

I've added random_state to get reproducible results for the data (downloadable from Kaggle) so you can see the issue.

    import pandas as pd
    import pywFM  # Using the python wrapper https://github.com/jfloff/pywFM
    from sklearn.metrics import log_loss
    from sklearn.cross_validation import train_test_split

    random_seed = 1234

    print('Load data...')
    train = pd.read_csv("./input/train.csv")
    target = train['target'].values
    train = train.drop(['ID', 'target'], axis=1)
    test = pd.read_csv("./input/test.csv")
    id_test = test['ID'].values
    test = test.drop(['ID'], axis=1)

    print('Clearing...')
    for (train_name, train_series), (test_name, test_series) in zip(train.iteritems(), test.iteritems()):
        if train_series.dtype == 'O':
            # for objects: factorize
            train[train_name], tmp_indexer = pd.factorize(train[train_name])
            test[test_name] = tmp_indexer.get_indexer(test[test_name])
            # but now we have -1 values (NaN)
        else:
            # for int or float: fill NaN
            tmp_len = len(train[train_series.isnull()])
            if tmp_len > 0:
                # print "mean", train_series.mean()
                train.loc[train_series.isnull(), train_name] = -9999
                # and Test
            tmp_len = len(test[test_series.isnull()])
            if tmp_len > 0:
                test.loc[test_series.isnull(), test_name] = -9999

    xtrain, xtest, ytrain, ytest = train_test_split(train, target, train_size=0.9, random_state=1234)

    clf = pywFM.FM(task='classification',
                   num_iter=10,
                   init_stdev=0.1,
                   k2=5,
                   learning_method='mcmc',
                   verbose=False,
                   silent=False)

    model = clf.run(x_train=xtrain, y_train=ytrain, x_test=xtest, y_test=ytest)
    log_loss(ytest, model.predictions, eps=1e-15)

This should give the output:

    Loading train...
    has x = 0 has xt = 1 num_rows=102888 num_values=12582078 num_features=131 min_target=0 max_target=1
    Loading test...
    has x = 0 has xt = 1 num_rows=11433 num_values=1397626 num_features=131 min_target=0 max_target=1
    relations: 0
    Loading meta data... logging to /var/folders/44/q92fcr8n26gc377b_x_4g85m0000gp/T/tmp_jMdqT
    Iter= 0 Train=0.750719 Test=0.755795 Test(ll)=0.49171
    Iter= 1 Train=0.749942 Test=0.75597 Test(ll)=0.491415
    Iter= 2 Train=0.731115 Test=0.75492 Test(ll)=0.484581
    Iter= 3 Train=0.745646 Test=0.75597 Test(ll)=0.4842
    Iter= 4 Train=0.726314 Test=0.750634 Test(ll)=0.477684
    Iter= 5 Train=0.717489 Test=0.750809 Test(ll)=0.474122
    Iter= 6 Train=0.70846 Test=0.743112 Test(ll)=0.469061
    Iter= 7 Train=0.716293 Test=0.745561 Test(ll)=0.464739
    Iter= 8 Train=0.706069 Test=0.738476 Test(ll)=0.46401
    Iter= 9 Train=0.728258 Test=0.740488 Test(ll)=0.464135
    Writing FM model...

Out[59]: logloss = 7.3850759798108481

So in this reproducible example 0.46 != 7.385

mpearmain commented 8 years ago

I guess the key part here is why the line log_loss(ytest, model.predictions, eps=1e-15) doesn't equal the final Test(ll).

jfloff commented 8 years ago

I haven't used sklearn's log_loss, but from the documentation it appears that the predictions need to be in a specific format:

    y_pred : array-like of float, shape = (n_samples, n_classes)
        Predicted probabilities, as returned by a classifier’s predict_proba method.
    (...)
    >>> log_loss(["spam", "ham", "ham", "spam"], [[.1, .9], [.9, .1], [.8, .2], [.35, .65]])
    0.21616...
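For a binary problem you can build that (n_samples, n_classes) shape from a single probability column; a quick sketch with toy numbers mirroring the docs' spam/ham example:

    import numpy as np
    from sklearn.metrics import log_loss

    p = np.array([0.9, 0.1, 0.2, 0.65])   # P(class == 1) for each sample
    y = np.array([1, 0, 0, 1])

    # two columns per row: [P(class 0), P(class 1)]
    print(log_loss(y, np.column_stack([1 - p, p])))   # 0.21616...
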
mpearmain commented 8 years ago

Yes, that's correct.

If we pretend there is only one class we can get the same result, akin to the logloss I'm using in the example above:

    In [66]: log_loss(["spam", "ham", "ham", "spam"], [[.9], [.1], [.2], [.65]])
    Out[66]: 0.21616187468057912

jfloff commented 8 years ago

I guess that this also yields the same result: log_loss(["spam", "ham", "ham", "spam"], [.9, .1, .2, .65])?

I guess it's either how libFM is computing the logloss or how sklearn is. Have you tried computing the logloss manually (actually implementing the function yourself, or working it out on paper) to compare results? I.e., which of the two actually gives the correct result?

mpearmain commented 8 years ago

Yes, it's fairly trivial to implement:

    import scipy as sp

    # natural-log logloss, with predictions clipped away from 0 and 1
    def logloss(act, pred):
        epsilon = 1e-15
        pred = sp.maximum(epsilon, pred)
        pred = sp.minimum(1 - epsilon, pred)
        ll = sum(act * sp.log(pred) + sp.subtract(1, act) * sp.log(sp.subtract(1, pred)))
        return ll * -1.0 / len(act)

OK, let's close this issue, as it's clearly something with libFM that I'm not doing correctly to get the same result.

Thanks very much for your time looking at this.

jfloff commented 8 years ago

Feel free to reach out if you want someone to discuss that issue with :)

erlendd commented 8 years ago

I'm experiencing the same problem: the logloss reported by libFM is incorrect.

jfloff commented 8 years ago

But it's a libFM problem and not a pywFM one, correct? Are you getting the same output from libFM directly (without the Python wrapper)?

erlendd commented 8 years ago

Yes, I wrote about it on the libFM GitHub. They're using log10, not the natural log, to compute the loss.
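(For reference, natural-log and base-10 logloss differ only by a constant factor of ln(10) ≈ 2.3026; a quick check with the toy numbers from the spam/ham example above:)

    import math

    p = [0.9, 0.1, 0.2, 0.65]
    y = [1, 0, 0, 1]

    ll_nat = -sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
                  for yi, pi in zip(y, p)) / len(y)

    print(ll_nat)                 # ~0.21616, what sklearn's log_loss reports
    print(ll_nat / math.log(10))  # ~0.09388, the same loss measured in base 10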

jfloff commented 8 years ago

Ok great! Could you link that issue here for future reference?

erlendd commented 8 years ago

https://github.com/srendle/libfm/issues/21