ibayer / fastFM

fastFM: A Library for Factorization Machines
http://ibayer.github.io/fastFM

fastFM has bad performance on classification with SGD #69

Closed yxzf closed 7 years ago

yxzf commented 8 years ago

I tested fastFM on some datasets and the performance is really bad compared with libFM, LR, and GBDT: precision = 0.76, recall = 0.55, while the other methods all give precision = 1, recall = 1. Am I using it wrong?


    # read_data and transform_label are helpers defined elsewhere
    from fastFM import sgd
    from sklearn.metrics import classification_report

    train_file = '../data/agaricus.txt.train'
    test_file = '../data/agaricus.txt.test'
    X_train, y_train, X_test, y_test = read_data(train_file, test_file)

    # fastFM expects class labels in {-1, +1}
    y_train = transform_label(y_train)
    y_test = transform_label(y_test)
    n_iter = 1000

    clf = sgd.FMClassification(n_iter=n_iter, init_stdev=0.1, rank=5,
                               l2_reg_w=0, l2_reg_V=0, l2_reg=None,
                               step_size=0.01)
    clf.fit(X_train, y_train)
    y_predict = clf.predict(X_test)
    print(classification_report(y_test, y_predict))
    print(clf.predict_proba(X_test))
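The `transform_label` helper isn't shown in the snippet. Since fastFM's classifiers expect labels encoded as {-1, +1}, a minimal sketch (assuming the input labels are 0/1, as in the agaricus files; the helper name and signature are hypothetical) might look like:

```python
def transform_label(y):
    # Hypothetical helper: map 0/1 labels to the {-1, +1}
    # encoding that fastFM's classifiers expect.
    return [1 if v > 0 else -1 for v in y]
```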
yxzf commented 8 years ago

The datasets are in : https://github.com/dmlc/xgboost/blob/master/demo/data/agaricus.txt.train https://github.com/dmlc/xgboost/blob/master/demo/data/agaricus.txt.test
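For anyone reproducing this: the agaricus files are in libsvm/svmlight format (a label followed by `index:value` pairs). A minimal, hypothetical parser for a single line could be:

```python
def parse_libsvm_line(line):
    # "<label> <idx>:<val> <idx>:<val> ..." -> (label, {idx: val})
    parts = line.strip().split()
    label = float(parts[0])
    feats = {}
    for tok in parts[1:]:
        idx, val = tok.split(":")
        feats[int(idx)] = float(val)
    return label, feats
```

In practice `sklearn.datasets.load_svmlight_file` does this and returns the scipy sparse matrix that fastFM expects as input.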

ibayer commented 8 years ago
  1. Why do you use SGD?
  2. If you compare against libFM, make sure you use the same hyperparameters. l2_reg_w=0 and l2_reg_V=0 are usually very bad choices.
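For context on these hyperparameters: `l2_reg_w` regularizes the linear weights w and `l2_reg_V` the factor matrix V (whose width is `rank`) in the second-order FM model. A pure-Python sketch of the FM prediction, using the O(k·n) reformulation of the pairwise term (illustrative only, not fastFM's actual implementation):

```python
def fm_predict(x, w0, w, V):
    # Second-order factorization machine:
    #   y(x) = w0 + sum_i w[i]*x[i]
    #        + sum_{i<j} <V[i], V[j]> * x[i]*x[j]
    # The pairwise term is computed per factor f as
    #   0.5 * ((sum_i V[i][f]*x[i])^2 - sum_i (V[i][f]*x[i])^2)
    # which costs O(k*n) instead of O(k*n^2).
    n, k = len(x), len(V[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += s * s - s_sq
    return linear + 0.5 * pairwise
```

With `l2_reg_w=0` and `l2_reg_V=0` nothing constrains w or V during training, which often hurts generalization; that is the point being made here.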
yxzf commented 8 years ago

@ibayer

  1. Just for testing. Do you mean SGD doesn't work in fastFM? Even SGD in libFM gives a decent result.
  2. The same as in libFM. And yet the gap between the fastFM and libFM SGD results is huge.
ibayer commented 8 years ago

1. Just for testing. Do you mean SGD doesn't work in fastFM?

No.

  2. The same as in libFM. And yet the gap between the fastFM and libFM SGD results is huge.

You need to use the same parameters (including step size) to compare implementations. libFM might have better default settings than fastFM in certain situations.