Are you building a recommender system for your website? The k-nearest neighbors algorithm is a good choice if you are looking for a simple, fast, and explainable solution. Weighted session-based k-NN recommendations are close to the state of the art, and you don't need to tune multiple hyperparameters or build complex deep learning models to achieve good results.
API Documentation is available here: WSKNN Docs
You provide two input structures as training data:
sessions : dict
sessions = {
session id: (
[sequence of items with user interaction],
[timestamp of user interaction per item],
[(optional) sequence of event names],
[(optional) sequence of weights]
)
}
items : dict
items = {
item id: (
[sequence of sessions with an item],
[the first timestamp of each session with an item]
)
}
Then you ask the model to recommend products based on a user session:
user session:
{session id:
[[sequence of items], [sequence of timestamps], [optional event names], [optional weights]]
}
The package is lightweight. Version 0.1.x depends only on `numpy` and `pyyaml`; later versions add a few more packages (see the requirements table below).
Moreover, the package can be handed to non-programmers, who can use `settings.yaml` to control the model's behavior.
The model was created along with multiple other approaches: based on RNN (GRU/LSTM), matrix factorization, and others. Its performance was always very close to the level of fine-tuned neural networks, but it was much easier and faster to train.
The example below is available in the `demo-notebooks/demo-readme.ipynb` notebook.
import numpy as np
from wsknn import fit
from wsknn.utils import load_gzipped_pickle
# Load data
ITEMS = 'demo-data/recsys-2015/parsed_items.pkl.gz'
SESSIONS = 'demo-data/recsys-2015/parsed_sessions.pkl.gz'
items = load_gzipped_pickle(ITEMS)
sessions = load_gzipped_pickle(SESSIONS)
imap = items['map']
smap = sessions['map']
# Train model
trained_model = fit(smap,
                    imap,
                    number_of_recommendations=5,
                    weighting_func='log',
                    return_events_from_session=False)
# Get sample session
test_session_key = np.random.choice(list(smap.keys()))
test_session = smap[test_session_key]
print(test_session)  # [products], [timestamps]
# Output: [[214850771, 214677615, 214651777], [1407592501.048, 1407592529.941, 1407592552.98]]
recommendations = trained_model.recommend(test_session)
for rec in recommendations:
    print('Item:', rec[0], '| weight:', rec[1])
Output recommendations
Item: 214676306 | weight: 1.8718411072574241
Item: 214850758 | weight: 1.2478940715049494
Item: 214561775 | weight: 1.2478940715049494
Item: 214821020 | weight: 1.2478940715049494
Item: 214848322 | weight: 1.2478940715049494
Version 1.x of the package can be installed with `pip`:
pip install wsknn
It works with Python 3.8 or newer.
Package Version | Python versions | Requirements |
---|---|---|
0.1.x | 3.6+ | numpy, pyyaml |
1.1.x | 3.8+ | numpy, more_itertools, pyyaml |
1.2.x | 3.8+ | numpy, more_itertools, pandas, pyyaml, tqdm |
We welcome all submissions, issues, feature requests, and bug reports! To learn how to contribute to the package, please see the CONTRIBUTION.md file.
Moliński, S., (2023). WSKNN - Weighted Session-based K-NN recommender system. Journal of Open Source Software, 8(90), 5639, https://doi.org/10.21105/joss.05639
The article compares the performance of multiple session-based recommender systems.
POIR.01.01.01-00-0632/18
As a rule of thumb, assume that you need about twice as much available memory as your model's memory size.
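To apply this rule before training, you can roughly estimate the footprint of the input structures. The sketch below is not part of the package: `deep_size` is a hypothetical helper, and it assumes that the model's memory size is dominated by the `sessions` and `items` dictionaries, which may not hold for every configuration.

```python
import sys


def deep_size(obj, seen=None):
    """Approximate the total size in bytes of nested built-in containers."""
    seen = set() if seen is None else seen
    if id(obj) in seen:  # count every shared object only once
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_size(k, seen) + deep_size(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_size(x, seen) for x in obj)
    return size


# Toy inputs in the same shape as the training structures
sessions = {"s1": (["a", "b"], [10.0, 12.0])}
items = {"a": (["s1"], [10.0]), "b": (["s1"], [10.0])}

# Assumption: the two maps dominate the model's memory size
model_bytes = deep_size((sessions, items))
required_bytes = 2 * model_bytes  # the ~2x rule of thumb
print(f"Estimated model size: {model_bytes} B, suggested free memory: {required_bytes} B")
```

The numbers are only an approximation: `sys.getsizeof` reports the size of built-in objects and varies between Python versions and platforms.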
All performance characteristics were derived in this notebook, and you can use it for your own performance tests.