LibRecommender is an easy-to-use recommender system focused on the end-to-end recommendation process. It contains a training module (libreco) and a serving module (libserving) that let users quickly train and deploy different kinds of recommendation models.
The main features are:
import numpy as np
import pandas as pd
from libreco.data import random_split, DatasetPure
from libreco.algorithms import LightGCN # pure data, algorithm LightGCN
from libreco.evaluation import evaluate
data = pd.read_csv("examples/sample_data/sample_movielens_rating.dat", sep="::",
                   names=["user", "item", "label", "time"])
# split whole data into three folds for training, evaluating and testing
train_data, eval_data, test_data = random_split(data, multi_ratios=[0.8, 0.1, 0.1])
train_data, data_info = DatasetPure.build_trainset(train_data)
eval_data = DatasetPure.build_evalset(eval_data)
test_data = DatasetPure.build_testset(test_data)
print(data_info) # n_users: 5894, n_items: 3253, data sparsity: 0.4172 %
lightgcn = LightGCN(
    task="ranking",
    data_info=data_info,
    loss_type="bpr",
    embed_size=16,
    n_epochs=3,
    lr=1e-3,
    batch_size=2048,
    num_neg=1,
    device="cuda",
)
# monitor metrics on eval data during training
lightgcn.fit(
    train_data,
    neg_sampling=True,
    verbose=2,
    eval_data=eval_data,
    metrics=["loss", "roc_auc", "precision", "recall", "ndcg"],
)
# do final evaluation on test data
evaluate(
    model=lightgcn,
    data=test_data,
    neg_sampling=True,
    metrics=["loss", "roc_auc", "precision", "recall", "ndcg"],
)
# predict preference of user 2211 to item 110
lightgcn.predict(user=2211, item=110)
# recommend 7 items for user 2211
lightgcn.recommend_user(user=2211, n_rec=7)
# cold-start prediction
lightgcn.predict(user="ccc", item="not item", cold_start="average")
# cold-start recommendation
lightgcn.recommend_user(user="are we good?", n_rec=7, cold_start="popular")
import numpy as np
import pandas as pd
from libreco.data import split_by_ratio_chrono, DatasetFeat
from libreco.algorithms import YouTubeRanking # feat data, algorithm YouTubeRanking
data = pd.read_csv("examples/sample_data/sample_movielens_merged.csv", sep=",", header=0)
# split into train and test data based on time
train_data, test_data = split_by_ratio_chrono(data, test_size=0.2)
# specify complete columns information
sparse_col = ["sex", "occupation", "genre1", "genre2", "genre3"]
dense_col = ["age"]
user_col = ["sex", "age", "occupation"]
item_col = ["genre1", "genre2", "genre3"]
train_data, data_info = DatasetFeat.build_trainset(
    train_data, user_col, item_col, sparse_col, dense_col
)
test_data = DatasetFeat.build_testset(test_data)
print(data_info) # n_users: 5962, n_items: 3226, data sparsity: 0.4185 %
ytb_ranking = YouTubeRanking(
    task="ranking",
    data_info=data_info,
    embed_size=16,
    n_epochs=3,
    lr=1e-4,
    batch_size=512,
    use_bn=True,
    hidden_units=(128, 64, 32),
)
ytb_ranking.fit(
    train_data,
    neg_sampling=True,
    verbose=2,
    shuffle=True,
    eval_data=test_data,
    metrics=["loss", "roc_auc", "precision", "recall", "map", "ndcg"],
)
# predict preference of user 2211 to item 110
ytb_ranking.predict(user=2211, item=110)
# recommend 7 items for user 2211
ytb_ranking.recommend_user(user=2211, n_rec=7)
# cold-start prediction
ytb_ranking.predict(user="ccc", item="not item", cold_start="average")
# cold-start recommendation
ytb_ranking.recommend_user(user="are we good?", n_rec=7, cold_start="popular")
The data format is plain: each line represents one sample. One important point is that the model assumes the user, item, and label columns are at index 0, 1, and 2, respectively. Change the column order if that is not the case. Take, for example, the movielens-1m dataset:
1::1193::5::978300760
1::661::3::978302109
1::914::3::978301968
1::3408::4::978300275
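If a table does not already follow the column-order assumption above, the columns can be rearranged before building the dataset. The following is a minimal sketch using plain pandas; the helper name `reorder_columns` and the in-memory sample table are hypothetical and are not part of LibRecommender.

```python
import pandas as pd

# Hypothetical raw table whose columns are NOT in the expected order.
raw = pd.DataFrame(
    {
        "time": [978300760, 978302109, 978301968],
        "label": [5, 3, 3],
        "item": [1193, 661, 914],
        "user": [1, 1, 1],
    }
)


def reorder_columns(df, user="user", item="item", label="label"):
    """Return a new frame with user, item, label first and all other columns after."""
    leading = [user, item, label]
    rest = [col for col in df.columns if col not in leading]
    return df[leading + rest]


ordered = reorder_columns(raw)
print(list(ordered.columns))  # ['user', 'item', 'label', 'time']
```

Selecting with a list of column names returns a new DataFrame, so the original table is left untouched.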
Besides, if you want to use other meta features (e.g., age, sex, category), you need to tell the model which columns are sparse_col, dense_col, user_col, and item_col, which means all features must be in the same table. See the YouTubeRanking example above.
Also note that your data should not contain missing values.
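Before building a feature dataset, it may help to sanity-check the table against the requirements above. The sketch below is plain pandas, not a LibRecommender API: the function `check_feature_table` is hypothetical, and it assumes that the sparse/dense lists and the user/item lists should describe the same set of columns, as they do in the YouTubeRanking example.

```python
import pandas as pd


def check_feature_table(df, user_col, item_col, sparse_col, dense_col):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    by_type = set(sparse_col) | set(dense_col)  # sparse/dense view of the features
    by_owner = set(user_col) | set(item_col)    # user/item view of the features

    # All features must be in the same table.
    absent = sorted((by_type | by_owner) - set(df.columns))
    if absent:
        problems.append(f"columns missing from table: {absent}")

    # Assumption: both views should cover exactly the same columns.
    mismatch = sorted(by_type ^ by_owner)
    if mismatch:
        problems.append(f"columns not described in both views: {mismatch}")

    # The data should not contain missing values.
    with_nan = [col for col in df.columns if df[col].isna().any()]
    if with_nan:
        problems.append(f"columns with missing values: {with_nan}")
    return problems


# Hypothetical toy table with one missing age value.
table = pd.DataFrame(
    {
        "user": [1, 2],
        "item": [10, 20],
        "label": [5, 3],
        "sex": ["F", "M"],
        "age": [25.0, None],
        "genre1": ["Drama", "Comedy"],
    }
)
problems = check_feature_table(
    table,
    user_col=["sex", "age"],
    item_col=["genre1"],
    sparse_col=["sex", "genre1"],
    dense_col=["age"],
)
print(problems)
```

How to handle the reported rows (dropping them with `dropna()` or filling them with a default) depends on the dataset; the check only reports what it finds.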
The tutorials and API documentation are hosted on librecommender.readthedocs.io.
The example scripts are under the examples/ folder.
From PyPI:
$ pip install -U LibRecommender
Build from source:
$ git clone https://github.com/massquantity/LibRecommender.git
$ cd LibRecommender
$ pip install .
libreco:
If you are using Python 3.6, you also need to install the dataclasses backport, since the module was only added to the standard library in Python 3.7.
LibRecommender has been tested under TensorFlow 1.15, 2.6, 2.10 and 2.12. If you encounter any problems while running it, feel free to open an issue.
TensorFlow 2.16 switched to Keras 3.0, so TF1 syntax is no longer supported. The supported versions are currently 1.15 to 2.15.
Known issues:
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject. In this case, try upgrading numpy; version 1.22.0 or higher is probably a safe option.
Fatal Python error: Segmentation fault (core dumped). This issue is most likely related to the protobuf library, so you should follow the officially recommended version for your local TensorFlow version. In general, it's advisable to use protobuf < 4.24.0.

The table below shows some compatible version combinations:
Python | Numpy | TensorFlow | OS |
---|---|---|---|
3.6 | 1.19.5 | 1.15, 2.5 | linux, windows, macos |
3.7 | 1.20.3, 1.21.6 | 1.15, 2.6, 2.10 | linux, windows, macos |
3.8 | 1.22.4, 1.23.4 | 2.6, 2.10, 2.12 | linux, windows, macos |
3.9 | 1.22.4, 1.23.4 | 2.6, 2.10, 2.12 | linux, windows, macos |
3.10 | 1.22.4, 1.23.4, 1.24.2 | 2.10, 2.12 | linux, windows, macos |
3.11 | 1.23.4, 1.24.2 | 2.12 | linux, windows, macos |
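The version constraints mentioned in this section (TensorFlow 1.15 to 2.15, numpy 1.22.0 or higher as a probably safe option, protobuf < 4.24.0) can be checked with a small helper. This is only an illustrative sketch: `parse_version` and `check_versions` are hypothetical names, the version strings are passed in by hand, and the numpy check is advisory, since the table also lists older numpy versions as compatible with older Python versions.

```python
def parse_version(text):
    """Turn '2.12.0' into (2, 12, 0); non-numeric suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_versions(tensorflow, numpy, protobuf):
    """Return a list of warnings for versions outside the ranges stated in this section."""
    warnings = []
    # Only major.minor matters for these checks.
    if not (1, 15) <= parse_version(tensorflow)[:2] <= (2, 15):
        warnings.append(f"TensorFlow {tensorflow} is outside the supported 1.15 - 2.15 range")
    if parse_version(numpy)[:2] < (1, 22):
        warnings.append(f"numpy {numpy} is older than 1.22.0 (advisory)")
    if parse_version(protobuf)[:2] >= (4, 24):
        warnings.append(f"protobuf {protobuf} is not below 4.24.0")
    return warnings


print(check_versions("2.12.0", "1.23.4", "3.20.3"))  # []
for message in check_versions("2.16.1", "1.19.5", "4.25.0"):
    print(message)
```

On Python 3.8 or later, the installed versions could be read with `importlib.metadata.version("tensorflow")` and passed to the helper instead of hard-coded strings.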
libserving:

One can also use the library in a Docker container without installing dependencies; see Docker.
Algorithm | Category [1] | Backend | Sequence [2] | Graph [3] | Embedding [4] | Paper |
---|---|---|---|---|---|---|
userCF / itemCF | pure | Cython, Rust | | | | Item-Based Collaborative Filtering |
SVD | pure | TensorFlow1 | | | :heavy_check_mark: | Matrix Factorization Techniques |
SVD++ | pure | TensorFlow1 | | | :heavy_check_mark: | Factorization Meets the Neighborhood |
ALS | pure | Cython | | | :heavy_check_mark: | 1. Matrix Completion via Alternating Least Square(ALS) 2. Collaborative Filtering for Implicit Feedback Datasets 3. Conjugate Gradient for Implicit Feedback |
NCF | pure | TensorFlow1 | | | | Neural Collaborative Filtering |
BPR | pure | Cython, TensorFlow1 | | | :heavy_check_mark: | Bayesian Personalized Ranking |
Wide & Deep | feat | TensorFlow1 | | | | Wide & Deep Learning for Recommender Systems |
FM | feat | TensorFlow1 | | | | Factorization Machines |
DeepFM | feat | TensorFlow1 | | | | DeepFM |
YouTubeRetrieval | feat | TensorFlow1 | :heavy_check_mark: | | :heavy_check_mark: | Deep Neural Networks for YouTube Recommendations |
YouTubeRanking | feat | TensorFlow1 | :heavy_check_mark: | | | Deep Neural Networks for YouTube Recommendations |
AutoInt | feat | TensorFlow1 | | | | AutoInt |
DIN | feat | TensorFlow1 | :heavy_check_mark: | | | Deep Interest Network |
Item2Vec | pure | / | :heavy_check_mark: | | :heavy_check_mark: | Item2Vec |
RNN4Rec / GRU4Rec | pure | TensorFlow1 | :heavy_check_mark: | | :heavy_check_mark: | Session-based Recommendations with Recurrent Neural Networks |
Caser | pure | TensorFlow1 | :heavy_check_mark: | | :heavy_check_mark: | Personalized Top-N Sequential Recommendation via Convolutional |
WaveNet | pure | TensorFlow1 | :heavy_check_mark: | | :heavy_check_mark: | WaveNet: A Generative Model for Raw Audio |
DeepWalk | pure | / | | :heavy_check_mark: | :heavy_check_mark: | DeepWalk |
NGCF | pure | PyTorch | | :heavy_check_mark: | :heavy_check_mark: | Neural Graph Collaborative Filtering |
LightGCN | pure | PyTorch | | :heavy_check_mark: | :heavy_check_mark: | LightGCN |
GraphSage | feat | DGL, PyTorch | | :heavy_check_mark: | :heavy_check_mark: | Inductive Representation Learning on Large Graphs |
PinSage | feat | DGL, PyTorch | | :heavy_check_mark: | :heavy_check_mark: | Graph Convolutional Neural Networks for Web-Scale |
TwoTower | feat | TensorFlow1 | | | :heavy_check_mark: | 1. Sampling-Bias-Corrected Neural Modeling for Large Corpus Item 2. Self-supervised Learning for Large-scale Item |
Transformer | feat | TensorFlow1 | :heavy_check_mark: | | | 1. BST 2. Transformers4Rec 3. RMSNorm |
SIM | feat | TensorFlow1 | :heavy_check_mark: | | | SIM |
Swing | pure | Rust | | | | Swing |
[1] Category: pure means collaborative-filtering algorithms that use only behavior data; feat means side features can also be included.
[2] Sequence: Algorithms that leverage user behavior sequence.
[3] Graph: Algorithms that leverage graph information, including Graph Embedding (GE) and Graph Neural Network (GNN).
[4] Embedding: Algorithms that can generate final user and item embeddings.