QMF - a matrix factorization library


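QMF is a fast and scalable C++ library for implicit-feedback matrix factorization models. The current implementation supports two main algorithms: weighted alternating least squares (WALS) [1] and Bayesian personalized ranking (BPR) [2], where BPR is trained with stochastic gradient descent that can be parallelized across threads in the Hogwild! style [3].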

For evaluation, QMF supports various ranking-based metrics that are computed per-user on test data, in addition to training or test objective values.
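To make "computed per-user" concrete, here is a minimal Python sketch of one common ranking metric, precision@k, averaged over users. It is only an illustration: the function names and the choice of metric are assumptions made for this example and are not taken from QMF's code.

```python
def precision_at_k(ranked_items, relevant_items, k):
    """Fraction of the top-k ranked items that are relevant for one user."""
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in relevant_items)
    return hits / k


def mean_over_users(rankings, relevant, k):
    """Average precision@k over users that have at least one test item."""
    scores = [
        precision_at_k(rankings[user], relevant[user], k)
        for user in rankings
        if relevant.get(user)
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical rankings (best item first) and held-out test items per user.
rankings = {"u1": ["a", "b", "c"], "u2": ["c", "a", "b"]}
relevant = {"u1": {"a", "c"}, "u2": {"b"}}
print(mean_over_users(rankings, relevant, k=2))
```

The metric is evaluated separately for each user and then averaged, so users with many test items do not dominate the result.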

For more information, see our blog post about QMF here: https://engineering.quora.com/Open-sourcing-QMF-for-matrix-factorization.

Building QMF

QMF requires gcc 5.0+, as it uses the C++14 standard, and CMake version 2.8+. It also depends on glog, gflags and lapack libraries.


To install the library dependencies on Debian or Ubuntu:

sudo apt-get install libgoogle-glog-dev libgflags-dev liblapack-dev

To build the binaries:

cmake .
make

To run tests:

make test

Output binaries will be under the bin/ folder.


Here's a basic example of usage:

# to train a WALS model
./wals \
    --train_dataset=<train_dataset> \
    --test_dataset=<test_dataset> \
    --user_factors=<user_factors_file> \
    --item_factors=<item_factors_file> \
    --regularization_lambda=0.05 \
    --confidence_weight=40 \
    --nepochs=10 \
    --nfactors=30

# to train a BPR model
./bpr \
    --train_dataset=<train_dataset> \
    --test_dataset=<test_dataset> \
    --user_factors=<user_factors_file> \
    --item_factors=<item_factors_file> \
    --nepochs=10 \
    --nfactors=30 \
    --num_hogwild_threads=4

The input dataset files should adhere to the following format:

<user_id1> <item_id1> <weight1>
<user_id2> <item_id2> <weight2>

with one interaction per line. The weight is always 1 for BPR, but can be any integer for WALS (r_ui in the paper [1]).
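As an illustration of this format, the following Python sketch writes and reads dataset text. The helper names `format_dataset` and `parse_dataset` are hypothetical and not part of QMF. Ids are kept as opaque strings because the format above does not state their type, and weights are parsed as integers.

```python
def format_dataset(interactions):
    """Render (user_id, item_id, weight) triples, one per line."""
    return "".join(
        f"{user_id} {item_id} {weight}\n"
        for user_id, item_id, weight in interactions
    )


def parse_dataset(text):
    """Parse whitespace-separated '<user_id> <item_id> <weight>' lines."""
    rows = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(
                f"line {line_number}: expected 3 fields, got {len(fields)}"
            )
        user_id, item_id, weight = fields
        rows.append((user_id, item_id, int(weight)))
    return rows


# Hypothetical sample data: user 1 interacted with item 10 once,
# user 2 interacted with item 11 three times.
text = format_dataset([(1, 10, 1), (2, 11, 3)])
rows = parse_dataset(text)
```

For a BPR dataset, a check such as `all(weight == 1 for _, _, weight in rows)` would confirm that every weight is 1.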

The output files will be in the following format:

<{user|item}_id> [<bias>] <factor_0> <factor_1> ... <factor_k-1>

where the bias term will only be present for BPR item factors when the --use_biases option is specified.
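The factor files can be read back with a few lines of Python. The sketch below is an assumption-laden illustration, not QMF code: `parse_factors` and `score` are hypothetical names, the caller must state whether a bias column is present (the file itself does not say), and the score uses the usual matrix factorization rule of a dot product plus an optional item bias.

```python
def parse_factors(text, has_bias=False):
    """Parse '<id> [<bias>] <factor_0> ... <factor_k-1>' lines.

    Returns (factors, biases); biases is empty unless has_bias is True.
    """
    factors = {}
    biases = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        entity_id = fields[0]
        values = [float(value) for value in fields[1:]]
        if has_bias:
            biases[entity_id] = values[0]
            values = values[1:]
        factors[entity_id] = values
    return factors, biases


def score(user_vector, item_vector, item_bias=0.0):
    """Dot product of the two factor vectors plus an optional item bias."""
    return sum(u * v for u, v in zip(user_vector, item_vector)) + item_bias


# Hypothetical file contents with k = 2 factors.
user_factors, _ = parse_factors("7 0.5 -1.0\n")
item_factors, item_biases = parse_factors("42 0.25 2.0 1.0\n", has_bias=True)
prediction = score(user_factors["7"], item_factors["42"], item_biases["42"])
```

Passing `has_bias=True` only makes sense for BPR item factors written with --use_biases; for every other file the first number after the id is already factor_0.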

In order to compute test ranking metrics (averaged per-user), you can add the following parameters to either binary:

In the case of BPR, a set of (user, positive item, negative item) triplets is sampled during initialization for both training and test sets (with a fixed seed, or as given by --eval_seed), and is used to compute an estimate of the loss after each epoch. This has no effect on training or on the computation of ranking metrics.
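The idea of a seeded triplet sample used for a loss estimate can be sketched in Python as follows. This is not QMF's sampling code: the function names, the uniform sampling scheme, and the `score` callback are assumptions made for illustration, and the loss shown is the unregularized BPR term -log(sigmoid(x_ui - x_uj)) from [2].

```python
import math
import random


def sample_triplets(positives, all_items, num_samples, seed=0):
    """Sample (user, positive item, negative item) triplets reproducibly."""
    rng = random.Random(seed)  # fixed seed gives the same triplets every time
    items = sorted(all_items)
    # Keep only users with at least one positive and one negative candidate.
    candidates = {}
    for user in sorted(positives):
        pos = sorted(positives[user])
        neg = [item for item in items if item not in positives[user]]
        if pos and neg:
            candidates[user] = (pos, neg)
    users = list(candidates)
    if not users:
        return []
    triplets = []
    for _ in range(num_samples):
        user = rng.choice(users)
        pos, neg = candidates[user]
        triplets.append((user, rng.choice(pos), rng.choice(neg)))
    return triplets


def estimate_bpr_loss(triplets, score):
    """Mean of -log(sigmoid(score(u, i) - score(u, j))) over the triplets."""
    if not triplets:
        raise ValueError("no triplets to evaluate")
    total = 0.0
    for user, pos_item, neg_item in triplets:
        margin = score(user, pos_item) - score(user, neg_item)
        # Numerically stable form of log(1 + exp(-margin)).
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(triplets)


# Hypothetical data: items each user interacted with, and the item catalog.
positives = {"u1": {"a"}, "u2": {"b", "c"}}
all_items = {"a", "b", "c", "d"}
triplets = sample_triplets(positives, all_items, num_samples=50, seed=7)
```

Because the triplets are drawn once with a fixed seed, loss estimates from different epochs are computed on the same sample and can be compared directly.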

Options for WALS:

Options for BPR:

For more details on the command-line options, see the definitions in wals.cpp and bpr.cpp.


This library was built at Quora by Denis Yarats and Alberto Bietti.


QMF is released under the Apache License 2.0.


[1] Hu, Koren and Volinsky. Collaborative Filtering for Implicit Feedback Datasets. In ICDM 2008.

[2] Rendle, Freudenthaler, Gantner and Schmidt-Thieme. BPR: Bayesian Personalized Ranking from Implicit Feedback. In UAI 2009.

[3] Niu, Recht, Ré and Wright. Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent. In NIPS 2011.