LibRecommender

Overview

LibRecommender is an easy-to-use recommender system focused on the end-to-end recommendation process. It contains a training module (libreco) and a serving module (libserving) that let users quickly train and deploy different kinds of recommendation models.

The main features are:

Implements a number of popular recommendation algorithms such as FM, DIN and LightGCN. See full algorithm list.
A hybrid recommender system, which allows users to use either collaborative-filtering or content-based features. New features can be added on the fly.
Low memory usage: categorical and multi-value categorical features are automatically converted to a sparse representation.
Supports training on both explicit and implicit datasets, as well as negative sampling on implicit data.
Provides an end-to-end workflow, i.e. data handling / preprocessing -> model training -> evaluation -> save/load -> serving.
Supports cold-start prediction and recommendation.
Supports dynamic features and sequence recommendation.
Provides a unified and friendly API for all algorithms.
Makes it easy to retrain a model with new users/items from new data.
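
As a rough illustration of the sparse representation mentioned in the feature list, the sketch below stores categorical values as integer indices rather than one-hot vectors. It is a stand-alone example and not LibRecommender's actual implementation; the helpers `build_vocab` and `encode_multi_value`, the padding index 0, and the genre data are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Sequence

PAD_INDEX = 0  # assumed padding index for multi-value fields (illustrative only)


def build_vocab(values: Iterable[str]) -> Dict[str, int]:
    """Assign each distinct category an integer index, starting at 1."""
    vocab: Dict[str, int] = {}
    for value in values:
        if value not in vocab:
            vocab[value] = len(vocab) + 1
    return vocab


def encode_multi_value(
    rows: Sequence[Sequence[str]], vocab: Dict[str, int], width: int
) -> List[List[int]]:
    """Encode rows of categories as fixed-width index lists, padded with PAD_INDEX."""
    encoded = []
    for row in rows:
        indices = [vocab[v] for v in row[:width]]
        indices += [PAD_INDEX] * (width - len(indices))
        encoded.append(indices)
    return encoded


# Hypothetical genre lists for three movies (a multi-value categorical feature).
genres = [["Comedy", "Drama"], ["Drama"], ["Action", "Comedy", "Thriller"]]
vocab = build_vocab(g for row in genres for g in row)
encoded = encode_multi_value(genres, vocab, width=3)
print(vocab)
print(encoded)
```

Each sample then needs only a handful of integers per feature, regardless of how many distinct categories exist.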

Usage

Pure collaborative-filtering example:

import numpy as np
import pandas as pd
from libreco.data import random_split, DatasetPure
from libreco.algorithms import LightGCN  # pure data, algorithm LightGCN
from libreco.evaluation import evaluate

# a multi-character separator requires pandas' Python parsing engine
data = pd.read_csv("examples/sample_data/sample_movielens_rating.dat", sep="::",
                   names=["user", "item", "label", "time"], engine="python")

# split whole data into three folds for training, evaluating and testing
train_data, eval_data, test_data = random_split(data, multi_ratios=[0.8, 0.1, 0.1])

train_data, data_info = DatasetPure.build_trainset(train_data)
eval_data = DatasetPure.build_evalset(eval_data)
test_data = DatasetPure.build_testset(test_data)
print(data_info)  # n_users: 5894, n_items: 3253, data sparsity: 0.4172 %

lightgcn = LightGCN(
    task="ranking",
    data_info=data_info,
    loss_type="bpr",
    embed_size=16,
    n_epochs=3,
    lr=1e-3,
    batch_size=2048,
    num_neg=1,
    device="cuda",
)
# monitor metrics on eval data during training
lightgcn.fit(
    train_data,
    neg_sampling=True,
    verbose=2,
    eval_data=eval_data,
    metrics=["loss", "roc_auc", "precision", "recall", "ndcg"],
)

# do final evaluation on test data
evaluate(
    model=lightgcn,
    data=test_data,
    neg_sampling=True,
    metrics=["loss", "roc_auc", "precision", "recall", "ndcg"],
)

# predict preference of user 2211 to item 110
lightgcn.predict(user=2211, item=110)
# recommend 7 items for user 2211
lightgcn.recommend_user(user=2211, n_rec=7)

# cold-start prediction
lightgcn.predict(user="ccc", item="not item", cold_start="average")
# cold-start recommendation
lightgcn.recommend_user(user="are we good?", n_rec=7, cold_start="popular")
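
The calls above pass `neg_sampling=True` because, for ranking on implicit data, only positive interactions are observed and negatives have to be sampled. The sketch below shows one simple way such sampling can work: for every observed (user, item) pair, draw `num_neg` items the user has not interacted with and label them 0. This is a stand-alone illustration under that assumption, not the library's own sampler; `sample_negatives` and the toy data are hypothetical.

```python
import random
from collections import defaultdict
from typing import List, Tuple


def sample_negatives(
    interactions: List[Tuple[int, int]],
    n_items: int,
    num_neg: int = 1,
    seed: int = 42,
) -> List[Tuple[int, int, int]]:
    """Return (user, item, label) triples: each positive plus num_neg random negatives."""
    rng = random.Random(seed)
    consumed = defaultdict(set)
    for user, item in interactions:
        consumed[user].add(item)

    samples = []
    for user, item in interactions:
        samples.append((user, item, 1))
        if len(consumed[user]) >= n_items:
            continue  # the user interacted with every item; nothing to sample
        for _ in range(num_neg):
            neg = rng.randrange(n_items)
            while neg in consumed[user]:  # reject items the user already consumed
                neg = rng.randrange(n_items)
            samples.append((user, neg, 0))
    return samples


# Hypothetical positive interactions over a catalogue of 5 items.
positives = [(0, 1), (0, 3), (1, 2)]
samples = sample_negatives(positives, n_items=5, num_neg=1)
print(samples)
```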

Example with features:

import numpy as np
import pandas as pd
from libreco.data import split_by_ratio_chrono, DatasetFeat
from libreco.algorithms import YouTubeRanking  # feat data, algorithm YouTubeRanking

data = pd.read_csv("examples/sample_data/sample_movielens_merged.csv", sep=",", header=0)
# split into train and test data based on time
train_data, test_data = split_by_ratio_chrono(data, test_size=0.2)

# specify complete columns information
sparse_col = ["sex", "occupation", "genre1", "genre2", "genre3"]
dense_col = ["age"]
user_col = ["sex", "age", "occupation"]
item_col = ["genre1", "genre2", "genre3"]

train_data, data_info = DatasetFeat.build_trainset(
    train_data, user_col, item_col, sparse_col, dense_col
)
test_data = DatasetFeat.build_testset(test_data)
print(data_info)  # n_users: 5962, n_items: 3226, data sparsity: 0.4185 %

ytb_ranking = YouTubeRanking(
    task="ranking",
    data_info=data_info,
    embed_size=16,
    n_epochs=3,
    lr=1e-4,
    batch_size=512,
    use_bn=True,
    hidden_units=(128, 64, 32),
)
ytb_ranking.fit(
    train_data,
    neg_sampling=True,
    verbose=2,
    shuffle=True,
    eval_data=test_data,
    metrics=["loss", "roc_auc", "precision", "recall", "map", "ndcg"],
)

# predict preference of user 2211 to item 110
ytb_ranking.predict(user=2211, item=110)
# recommend 7 items for user 2211
ytb_ranking.recommend_user(user=2211, n_rec=7)

# cold-start prediction
ytb_ranking.predict(user="ccc", item="not item", cold_start="average")
# cold-start recommendation
ytb_ranking.recommend_user(user="are we good?", n_rec=7, cold_start="popular")
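
The `cold_start` argument controls what happens for users or items that were not seen during training. As a rough mental model only (the library's internals may differ), a "popular" strategy can be pictured as falling back to the most frequently interacted items when the user is unknown. The function `recommend_with_fallback` and its inputs are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple


def recommend_with_fallback(
    user: Hashable,
    personalized: Dict[Hashable, List[int]],
    interactions: List[Tuple[Hashable, int]],
    n_rec: int,
) -> List[int]:
    """Return personalized items for known users, else the most popular items."""
    if user in personalized:
        return personalized[user][:n_rec]
    popularity = Counter(item for _, item in interactions)
    return [item for item, _ in popularity.most_common(n_rec)]


# Hypothetical training interactions and precomputed per-user recommendations.
interactions = [(1, 10), (2, 10), (3, 10), (1, 20), (2, 20), (3, 30)]
personalized = {1: [30, 40], 2: [30, 50]}

known = recommend_with_fallback(1, personalized, interactions, n_rec=2)
unknown = recommend_with_fallback("are we good?", personalized, interactions, n_rec=2)
print(known, unknown)
```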

Data Format

The data format is plain: each line represents one sample. One important point is that the model assumes the user, item, and label columns are at index 0, 1, and 2, respectively. You may need to change the column order if that's not the case. Take the movielens-1m dataset, for example:

1::1193::5::978300760
1::661::3::978302109
1::914::3::978301968
1::3408::4::978300275

In addition, if you want to use other meta features (e.g., age, sex, category), you need to tell the model which columns are [sparse_col, dense_col, user_col, item_col], which means all features must be in the same table. See the YouTubeRanking example above.

Also note that your data should not contain missing values.
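
The column-order and missing-value requirements can be checked before a dataset is built. The sketch below uses only the standard library to move the user, item, and label columns to the front and to reject rows with empty fields. `reorder_columns` is a hypothetical helper, not part of libreco, and the input is made-up data with its columns in a different order.

```python
import csv
import io
from typing import List

REQUIRED = ["user", "item", "label"]  # must end up at column index 0, 1, 2


def reorder_columns(csv_text: str) -> List[List[str]]:
    """Return header + rows with user, item, label moved to the first three columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    missing_cols = [c for c in REQUIRED if c not in fieldnames]
    if missing_cols:
        raise ValueError(f"missing required columns: {missing_cols}")
    ordered = REQUIRED + [c for c in fieldnames if c not in REQUIRED]
    rows = [ordered]
    for record in reader:
        row = [record[c] for c in ordered]
        # DictReader fills short rows with None; empty strings are missing values too
        if any(value is None or value == "" for value in row):
            raise ValueError(f"missing value in row: {record}")
        rows.append(row)
    return rows


# Hypothetical input whose columns are not in the expected order.
raw = "time,label,user,item\n978300760,5,1,1193\n978302109,3,1,661\n"
table = reorder_columns(raw)
print(table)
```

With pandas, the same reordering is a plain column selection such as `data[["user", "item", "label", "time"]]`.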

Documentation

The tutorials and API documentation are hosted on librecommender.readthedocs.io.

The example scripts are under the examples/ folder.

Installation & Dependencies

From PyPI:

$ pip install -U LibRecommender

Build from source:

$ git clone https://github.com/massquantity/LibRecommender.git
$ cd LibRecommender
$ pip install .

Basic Dependencies for `libreco`:

Python >= 3.6
TensorFlow >= 1.15, < 2.16
PyTorch >= 1.10
Numpy >= 1.19.5
Pandas >= 1.0.0
Scipy >= 1.2.1, < 1.13.0
scikit-learn >= 0.20.0
gensim >= 4.0.0
tqdm
nmslib (optional, used in approximate similarity searching. See Embedding)
DGL (optional, used in GraphSage and PinSage. See Implementation Details)
Cython >= 0.29.0, < 3 (optional, for building from source)

If you are using Python 3.6, you also need to install dataclasses, which was first introduced in Python 3.7.

LibRecommender has been tested under TensorFlow 1.15, 2.6, 2.10 and 2.12. If you encounter any problems while running it, feel free to open an issue.

TensorFlow 2.16 switched to Keras 3.0, so tf1 syntax is no longer supported. The supported TensorFlow versions are therefore 1.15 through 2.15.
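
The supported range can also be checked programmatically before training. This is a small stand-alone sketch: `parse_major_minor` and `is_supported_tf` are hypothetical helpers, and the version strings are examples only.

```python
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.12.0'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_supported_tf(version: str) -> bool:
    """True if the version lies in the supported range 1.15 <= v < 2.16."""
    return (1, 15) <= parse_major_minor(version) < (2, 16)


for v in ["1.15.5", "2.12.0", "2.15.1", "2.16.1"]:
    print(v, is_supported_tf(v))
```

In practice the string would come from `tensorflow.__version__`.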

Known issues:

Sometimes one may encounter errors like `ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject`. In this case, try upgrading numpy; version 1.22.0 or higher is probably a safe option.
When saving a TensorFlow model for serving, you might encounter the error message `Fatal Python error: Segmentation fault (core dumped)`. This issue is most likely related to the protobuf library, so you should use the version officially recommended for your local TensorFlow version. In general, it's advisable to use protobuf < 4.24.0.

The table below shows some compatible version combinations:

Python	Numpy	TensorFlow	OS
3.6	1.19.5	1.15, 2.5	linux, windows, macos
3.7	1.20.3, 1.21.6	1.15, 2.6, 2.10	linux, windows, macos
3.8	1.22.4, 1.23.4	2.6, 2.10, 2.12	linux, windows, macos
3.9	1.22.4, 1.23.4	2.6, 2.10, 2.12	linux, windows, macos
3.10	1.22.4, 1.23.4, 1.24.2	2.10, 2.12	linux, windows, macos
3.11	1.23.4, 1.24.2	2.12	linux, windows, macos

Optional Dependencies for `libserving`:

Python >= 3.7
sanic >= 22.3
requests
aiohttp
pydantic
ujson
redis
redis-py >= 4.2.0
faiss >= 1.5.2
TensorFlow Serving == 2.8.2

Docker

One can also use the library in a Docker container without installing dependencies; see Docker.

References

Algorithm	Category¹	Backend	Sequence²	Graph³	Embedding⁴	Paper
userCF / itemCF	pure	Cython, Rust				Item-Based Collaborative Filtering
SVD	pure	TensorFlow1			:heavy_check_mark:	Matrix Factorization Techniques
SVD++	pure	TensorFlow1			:heavy_check_mark:	Factorization Meets the Neighborhood
ALS	pure	Cython			:heavy_check_mark:	1. Matrix Completion via Alternating Least Square(ALS) 2. Collaborative Filtering for Implicit Feedback Datasets 3. Conjugate Gradient for Implicit Feedback
NCF	pure	TensorFlow1				Neural Collaborative Filtering
BPR	pure	Cython, TensorFlow1			:heavy_check_mark:	Bayesian Personalized Ranking
Wide & Deep	feat	TensorFlow1				Wide & Deep Learning for Recommender Systems
FM	feat	TensorFlow1				Factorization Machines
DeepFM	feat	TensorFlow1				DeepFM
YouTubeRetrieval	feat	TensorFlow1	:heavy_check_mark:		:heavy_check_mark:	Deep Neural Networks for YouTube Recommendations
YouTubeRanking	feat	TensorFlow1	:heavy_check_mark:			Deep Neural Networks for YouTube Recommendations
AutoInt	feat	TensorFlow1				AutoInt
DIN	feat	TensorFlow1	:heavy_check_mark:			Deep Interest Network
Item2Vec	pure	/	:heavy_check_mark:		:heavy_check_mark:	Item2Vec
RNN4Rec / GRU4Rec	pure	TensorFlow1	:heavy_check_mark:		:heavy_check_mark:	Session-based Recommendations with Recurrent Neural Networks
Caser	pure	TensorFlow1	:heavy_check_mark:		:heavy_check_mark:	Personalized Top-N Sequential Recommendation via Convolutional
WaveNet	pure	TensorFlow1	:heavy_check_mark:		:heavy_check_mark:	WaveNet: A Generative Model for Raw Audio
DeepWalk	pure	/		:heavy_check_mark:	:heavy_check_mark:	DeepWalk
NGCF	pure	PyTorch		:heavy_check_mark:	:heavy_check_mark:	Neural Graph Collaborative Filtering
LightGCN	pure	PyTorch		:heavy_check_mark:	:heavy_check_mark:	LightGCN
GraphSage	feat	DGL, PyTorch		:heavy_check_mark:	:heavy_check_mark:	Inductive Representation Learning on Large Graphs
PinSage	feat	DGL, PyTorch		:heavy_check_mark:	:heavy_check_mark:	Graph Convolutional Neural Networks for Web-Scale
TwoTower	feat	TensorFlow1			:heavy_check_mark:	1. Sampling-Bias-Corrected Neural Modeling for Large Corpus Item 2. Self-supervised Learning for Large-scale Item
Transformer	feat	TensorFlow1	:heavy_check_mark:			1. BST 2. Transformers4Rec 3. RMSNorm
SIM	feat	TensorFlow1	:heavy_check_mark:			SIM
Swing	pure	Rust				Swing

¹ Category: pure means collaborative-filtering algorithms which only use behavior data, feat means other side-features can be included.

² Sequence: Algorithms that leverage user behavior sequence.

³ Graph: Algorithms that leverage graph information, including Graph Embedding (GE) and Graph Neural Network (GNN).

⁴ Embedding: Algorithms that can generate final user and item embeddings.
