nokaut / wsknn

Session-weighted recommendation system in Python
BSD 3-Clause "New" or "Revised" License

WSKNN: k-NN recommender for session-based data


Weighted session-based k-NN - Intro

Are you building a recommender system for your website? The k-nearest neighbors algorithm is a good choice if you are looking for a simple, fast, and explainable solution. Weighted session-based k-NN recommendations are close to the state of the art, and you don't need to tune many hyperparameters or build complex deep learning models to achieve a good result.

Documentation

API Documentation is available here: WSKNN Docs

How does it work?

You provide two input structures as training data:

sessions : dict
               sessions = {
                   session id: (
                       [sequence of items with user interaction],
                       [timestamp of user interaction per item],
                       [(optional) sequence of event names],
                       [(optional) sequence of weights]
                   )
               }

items : dict
        items = {
            item id: (
                [sequence of sessions with an item],
                [the first timestamp of each session with an item]
            )
        }
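
As a minimal sketch, both structures can be derived from a flat event log. The `events` list and the `build_structures` helper below are hypothetical and not part of wsknn; they only illustrate the layout described above, without the optional event names and weights. The sketch assumes that "the first timestamp of each session" means the timestamp of the session's earliest event.

```python
def build_structures(events):
    """Build `sessions` and `items` dicts from (session_id, item_id, timestamp) tuples."""
    sessions = {}
    # Sort by timestamp so each session's sequences are in chronological order
    for session_id, item_id, ts in sorted(events, key=lambda e: e[2]):
        seq, times = sessions.setdefault(session_id, ([], []))
        seq.append(item_id)
        times.append(ts)

    items = {}
    for session_id, (seq, times) in sessions.items():
        first_ts = times[0]  # assumption: earliest event of the session
        for item_id in dict.fromkeys(seq):  # unique items, order preserved
            item_sessions, item_times = items.setdefault(item_id, ([], []))
            item_sessions.append(session_id)
            item_times.append(first_ts)
    return sessions, items


# Hypothetical event log: (session id, item id, timestamp)
events = [
    ("s1", "a", 10.0),
    ("s1", "b", 12.0),
    ("s2", "b", 20.0),
    ("s2", "c", 25.0),
]
sessions, items = build_structures(events)
print(sessions["s1"])
print(items["b"])
```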

Then you ask the model to recommend products based on a user session:

user session: 
    {session id:
        [[sequence of items], [sequence of timestamps], [optional event names], [optional weights]]
    }
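
To make the idea concrete, here is a toy sketch of session-based k-NN scoring: sessions that share items with the user session are ranked by overlap, and unseen items from the k nearest sessions are scored by the summed similarity of those neighbors. It is an illustration under simplifying assumptions (Jaccard similarity, no time, event, or custom weighting), not wsknn's implementation; `toy_recommend` and the sample data are hypothetical.

```python
def toy_recommend(user_items, sessions, items, k=2, n=3):
    """Rank unseen items by the summed similarity of the k nearest sessions."""
    user_set = set(user_items)

    # Candidate neighbors: every session that shares at least one item
    candidates = set()
    for item in user_set:
        candidates.update(items.get(item, ([], []))[0])

    # Jaccard similarity between the user session and each candidate
    scored = []
    for sid in candidates:
        other = set(sessions[sid][0])
        sim = len(user_set & other) / len(user_set | other)
        scored.append((sim, sid))
    neighbors = sorted(scored, key=lambda p: (-p[0], str(p[1])))[:k]

    # Each neighbor votes for its items that the user has not seen yet
    votes = {}
    for sim, sid in neighbors:
        for item in set(sessions[sid][0]) - user_set:
            votes[item] = votes.get(item, 0.0) + sim
    ranked = sorted(votes.items(), key=lambda p: (-p[1], str(p[0])))
    return ranked[:n]


# Hypothetical training data in the layout described above
sessions = {
    "s1": (["a", "b", "c"], [1, 2, 3]),
    "s2": (["a", "d"], [4, 5]),
    "s3": (["e", "f"], [6, 7]),
}
items = {
    "a": (["s1", "s2"], [1, 4]),
    "b": (["s1"], [1]),
    "c": (["s1"], [1]),
    "d": (["s2"], [4]),
    "e": (["s3"], [6]),
    "f": (["s3"], [6]),
}
recommendations = toy_recommend(["a", "b"], sessions, items)
for item, weight in recommendations:
    print("Item:", item, "| weight:", round(weight, 3))
```

Session `s3` shares no items with the user session, so it never becomes a neighbor and its items are not recommended.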

The package is lightweight and has only a few dependencies (see Requirements).

Moreover, the package can be handed to non-programmers, who can control the model's behavior through settings.yaml.

Why should we use WSKNN?

The model was developed alongside several other approaches, including RNN-based models (GRU/LSTM) and matrix factorization. Its performance was consistently close to that of fine-tuned neural networks, while it was much easier and faster to train.

What are the limitations of WSKNN?

Example

The example below is available in the demo-notebooks/demo-readme.ipynb notebook.

import numpy as np
from wsknn import fit
from wsknn.utils import load_gzipped_pickle

# Load data
ITEMS = 'demo-data/recsys-2015/parsed_items.pkl.gz'
SESSIONS = 'demo-data/recsys-2015/parsed_sessions.pkl.gz'

items = load_gzipped_pickle(ITEMS)
sessions = load_gzipped_pickle(SESSIONS)
imap = items['map']
smap = sessions['map']

# Train model
trained_model = fit(smap,
                    imap,
                    number_of_recommendations=5,
                    weighting_func='log',
                    return_events_from_session=False)

# Get sample session
test_session_key = np.random.choice(list(smap.keys()))
test_session = smap[test_session_key]
print(test_session)  # [products], [timestamps]
# [[214850771, 214677615, 214651777], [1407592501.048, 1407592529.941, 1407592552.98]]

recommendations = trained_model.recommend(test_session)
for rec in recommendations:
    print('Item:', rec[0], '| weight:', rec[1])

Output recommendations

Item: 214676306 | weight: 1.8718411072574241
Item: 214850758 | weight: 1.2478940715049494
Item: 214561775 | weight: 1.2478940715049494
Item: 214821020 | weight: 1.2478940715049494
Item: 214848322 | weight: 1.2478940715049494

Setup

Version 1.x of the package can be installed with pip:

pip install wsknn

It works with Python versions greater than or equal to 3.8.

Requirements

Package version | Python versions | Requirements
0.1.x | 3.6+ | numpy, pyyaml
1.1.x | 3.8+ | numpy, more_itertools, pyyaml
1.2.x | 3.8+ | numpy, more_itertools, pandas, pyyaml, tqdm

Contribution

We welcome all submissions, issues, feature requests, and bug reports! To learn how to contribute to the package, please see the CONTRIBUTION.md file.

Developers

Citation

Moliński, S., (2023). WSKNN - Weighted Session-based K-NN recommender system. Journal of Open Source Software, 8(90), 5639, https://doi.org/10.21105/joss.05639

Bibliography

Data used in a demo example

Comparison between DL and WSKNN

SKNN performance

The article compares the performance of multiple session-based recommender systems.

Funding


Computational Performance

As a rule of thumb, you should have about 2 times more memory available than your model's memory size.
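
A rough way to apply this rule is to measure the size of the model and double it. The sketch below uses the pickled size as a proxy for the model's memory size, which is an assumption: the in-memory footprint of Python objects can differ from their serialized size. `required_memory_bytes` is a hypothetical helper, not part of wsknn.

```python
import pickle


def required_memory_bytes(model, factor=2.0):
    """Return (pickled model size, memory to reserve) in bytes."""
    model_size = len(pickle.dumps(model, protocol=pickle.HIGHEST_PROTOCOL))
    return model_size, int(model_size * factor)


# Any picklable object works here; a tiny sessions dict stands in for a model
toy_model = {"s1": (["a", "b"], [10.0, 12.0])}
size, needed = required_memory_bytes(toy_model)
print(f"model ~{size} B, reserve ~{needed} B")
```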

All performance characteristics were derived in this notebook, and you can use it for your own performance tests.

Training time in relation to session length vs number of items


Total response time for 1000 requests in relation to session length vs number of items


Model size in relation to session length vs number of items


Relation between training time and increasing number of items


Relation between response time and increasing number of items (for 1000 requests)


Relation between training time and increasing number of sessions


Relation between response time and increasing number of sessions (for 1000 requests)
