Closed: breuderink closed this issue 8 years ago.
I was testing libFM, and one of my tests involved running libFM with the same train and test dataset. This seems to work, but the intermediate performance values are different for the train and the test set, while the data comes from the same file. I would expect the train and test performance to be exactly the same. Is this an indication of a bug? Or do I misunderstand what is being logged here?
libFM uses a time-dependent seed for the random initialization by default. The -seed parameter ("integer value, default=None") is defined here:
https://github.com/srendle/libfm/blob/master/src/libfm/libfm.cpp#L93
I think the results between runs should match if you set a seed.
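For example, running libFM -task c -train train.libfm -test train.libfm -seed 1 twice should then produce identical results across the two runs.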
Using the same seed indeed prevents differences between runs. But what I am trying to report here is that the per-iteration training set and test set 'performance' differs, although I supplied the same data for both sets. That is, in the snippet above, the train performance for iteration 99 is 0.52756, while the test performance on the same data is 0.530803. If I understand correctly, these numbers should be equal, since the input data is equal.
This is based on my assumption that both numbers are produced by computing some performance metric (like the fraction correctly classified) on the predictions of the model (with the parameters from that iteration), using either the training set or the test set as input. But that assumption might be wrong.
Can you check if this is also true with the option --method=ALS?
Yes. With libFM -task c -train train.libfm -test train.libfm -method als there is still a small difference between the train and test scores.
How small is the difference compared to the difference with MCMC? Is it plausible that it is just a small numerical error? Which error is correct (train or test)? You can use the last error and compare it against what you get when calculating the error yourself.
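For example, a manual check could look something like the sketch below. It assumes the predictions were saved with -out preds.txt (one predicted value per line; for -task c these are predicted probabilities) and that the first token on each line of train.libfm is the target label; the file names are just placeholders.

# Recompute the classification accuracy from the saved predictions.
with open('train.libfm') as f:
    labels = [int(line.split()[0]) for line in f if line.strip()]
with open('preds.txt') as f:
    preds = [float(line) for line in f if line.strip()]

correct = sum((p > 0.5) == (y == 1) for p, y in zip(preds, labels))
print('accuracy: %.4f' % (float(correct) / len(labels)))

The resulting number can then be compared against the last reported Train= and Test= values.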
I generated some artificial data with this Python script:
import random

with open('train.libfm', 'w') as f:
    for i in range(1000):
        # Write class.
        if i % 2 == 0:
            f.write('0')
        else:
            f.write('1')
        # Write 100 dense random features.
        for j in range(100):
            f.write(' %d:%f' % (j, random.normalvariate(0, 1)))
        f.write('\n')
It generates alternating target labels, with 100 dense random features. The output looks like this:
...
#Iter= 97 Train=0.925 Test=0.997 Test(ll)=0.0801822
#Iter= 98 Train=0.913 Test=0.997 Test(ll)=0.0798717
#Iter= 99 Train=0.919 Test=0.997 Test(ll)=0.079558
It seems that it is overfitting, because the features are not informative. The difference is now relatively big. I have saved the output with the --out flag, and the results reported for Test= correspond to the accuracy calculated manually. So that part seems right. What could have caused the Train= score to deviate so much?
I think that the test score is calculated here: https://github.com/srendle/libfm/blob/master/src/libfm/src/fm_learn_mcmc_simultaneous.h#L243, while the train score is mainly calculated here: https://github.com/srendle/libfm/blob/master/src/libfm/src/fm_learn_mcmc_simultaneous.h#L170-L172. The code paths indeed seem to be different. So what happens in the code path that computes the accuracy for the training set?
libFM uses a few tricks, like clipping predictions to the highest / lowest values. Maybe one of these tricks is only applied to the test predictions.
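For illustration only (this is not libFM's actual code), such clipping usually amounts to capping each prediction at the smallest and largest target values seen in the training data, along these lines:

# Illustrative sketch of prediction clipping, not libFM's implementation:
# cap each prediction at the smallest / largest target value seen in training.
def clip_prediction(p, min_target, max_target):
    return max(min_target, min(max_target, p))

print(clip_prediction(1.3, 0.0, 1.0))  # prints 1.0

If a step like this runs for the test predictions but not for the per-draw train predictions, the two reported scores could differ even on identical data.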
The printed train accuracy is calculated for one MCMC draw. The test accuracy is calculated over all draws (i.e., it is an average). I agree that this is misleading and that both measures should report either the average or one draw.
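To see why a single draw and an average over draws give different accuracies on the same data, here is a small self-contained simulation (not libFM code; the numbers are made up): each draw produces noisy per-example probabilities, and the accuracy of the averaged prediction differs from the accuracy of any single draw.

import random

random.seed(0)
labels = [i % 2 for i in range(1000)]
n_draws = 50

# Each simulated MCMC draw gives slightly different predicted probabilities:
# a weak signal towards the true label plus noise, clipped to [0, 1].
def one_draw():
    return [min(1.0, max(0.0, 0.5 + 0.05 * (2 * y - 1) + random.gauss(0, 0.3)))
            for y in labels]

draws = [one_draw() for _ in range(n_draws)]

def accuracy(preds):
    return sum((p > 0.5) == (y == 1) for p, y in zip(preds, labels)) / float(len(labels))

single = draws[-1]                                    # analogous to the printed Train= value (one draw)
averaged = [sum(ps) / n_draws for ps in zip(*draws)]  # analogous to Test= (predictions averaged over draws)

print('single draw: %.3f  averaged over draws: %.3f' % (accuracy(single), accuracy(averaged)))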
In general, I would recommend looking at the log file rather than at standard output. The log file is more verbose and reports all test values: one draw, all draws, and all but 5 draws. It contains the log-likelihood and accuracy for each of these measures.
Thanks for the elaboration. I'll take a look at the log file to see if I understand it.
Where can I download the train and test data? I can only find movie, rating, user, and tags data on MovieLens.