nokaut / wsknn

Session-weighted recommendation system in Python
BSD 3-Clause "New" or "Revised" License

General question on input data format #36

Closed: inpefess closed 1 year ago

inpefess commented 1 year ago

The fit method expects two maps (sessions2items and items2sessions), but we can always construct the second argument from the first, can't we? What is the purpose of this data duplication? Why can't the derivation be part of the fit method (and, as a consequence, why does one have to store both datasets)? It might be worth mentioning in the JOSS paper (the draft currently mentions only the first argument).

SimonMolinsky commented 1 year ago

You are correct that the item-sessions map can be derived from the session-items map. Both maps are passed in because the data preprocessing step builds them together in a single function, which avoids repeating computations.
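
As a minimal sketch of that derivation, assuming a simplified layout in which each session ID maps to a plain list of item IDs (the library's actual structures may carry more fields, such as timestamps), the inversion could look like the following. The name invert_sessions_map is hypothetical and not part of wsknn.

```python
from collections import defaultdict


def invert_sessions_map(sessions2items):
    """Build an item -> sessions map from a session -> items map.

    Hypothetical helper: assumes each value is a plain list of item IDs.
    """
    items2sessions = defaultdict(list)
    for session_id, items in sessions2items.items():
        # dict.fromkeys drops repeated items within one session, keeping order
        for item in dict.fromkeys(items):
            items2sessions[item].append(session_id)
    return dict(items2sessions)


# Toy data for illustration only
sessions2items = {
    "s1": ["a", "b"],
    "s2": ["b", "c", "b"],
}
items2sessions = invert_sessions_map(sessions2items)
print(items2sessions)
# {'a': ['s1'], 'b': ['s1', 's2'], 'c': ['s2']}
```

The inversion is a single pass over all session events, so computing it inside fit() when the second map is missing costs little compared with storing both maps.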

In most cases it is enough for the user to provide only the session-items map, so I can make the item-sessions map an optional parameter. I will add this functionality to the fit() method.

SimonMolinsky commented 1 year ago

OK, it's done. Now the user can store only the session-items mapping (or raw session events).