An example of loading tabular data and a bug in the `parse_files` function

nokaut / wsknn

Session-weighted recommendation system in Python

BSD 3-Clause "New" or "Revised" License


An example of loading tabular data and a bug in the `parse_files` function #37

Closed inpefess closed 1 year ago

inpefess commented 1 year ago

Several popular papers on sequential recommenders (e.g. BERT4Rec and SASRec) rely on tabular data (MovieLens, Amazon, Steam). I tried to run WSKNN on MovieLens 25M and could not apply the `parse_files` function. The `allowed_actions` argument is `None` by default, so I didn't pass it, but the function body then assumes it is not `None` and fails. It would be reasonable to treat every action appearing in the `action_key` field as allowed by default and to handle an omitted `allowed_actions` dictionary gracefully. And, of course, a usage example with a popular open dataset instead of the package-specific one would make the package even more user-friendly.
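To make the requested behaviour concrete, here is a minimal sketch of the "`None` means everything is allowed" pattern. It is not the actual `parse_files` code: the helpers `action_weight` and `filter_events`, the event layout, and the idea that the dictionary maps action names to numeric weights are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def action_weight(action: str,
                  allowed_actions: Optional[Dict[str, float]] = None) -> Optional[float]:
    """Return the weight of an action, or None if the action should be skipped.

    Hypothetical helper: assumes allowed_actions maps action name -> weight.
    """
    if allowed_actions is None:
        # No filter was passed: accept every action with a neutral weight.
        return 1.0
    # A filter was passed: unknown actions yield None and get dropped.
    return allowed_actions.get(action)


def filter_events(events: List[dict],
                  action_key: str = "action",
                  allowed_actions: Optional[Dict[str, float]] = None) -> List[dict]:
    """Keep only allowed events and attach their weight (hypothetical helper)."""
    kept = []
    for event in events:
        weight = action_weight(event[action_key], allowed_actions)
        if weight is not None:
            kept.append({**event, "weight": weight})
    return kept


events = [{"action": "view"}, {"action": "purchase"}]
all_events = filter_events(events)  # allowed_actions omitted
purchases = filter_events(events, allowed_actions={"purchase": 2.0})
```

With the argument omitted, both events are kept; with an explicit dictionary, only the listed actions survive.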

SimonMolinsky commented 1 year ago

Thanks, this bug seems to be critical. I will also run some tests on the tabular datasets you mentioned, and when everything works fine, I'll close this issue.

SimonMolinsky commented 1 year ago

[x] flat file read,
[x] `allowed_actions` is `None` error,
[x] MovieLens 100k tutorial,
[x] MovieLens 25M tutorial

SimonMolinsky commented 1 year ago

@inpefess It took me longer than I expected because preprocessing the MovieLens 25M dataset with the core features was disastrously slow. I've written a parser based on the pandas package, and it processes the data in a reasonable time. I've created two additional tutorials based on the MovieLens datasets, and I've removed the error. Now I'm wrapping things up. I will publish a new release with all those features. I must change the descriptions slightly, so I will do that along with the paper and contributor's guide corrections. I'll let you know in the JOSS review thread when everything is ready.
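For readers wondering what a pandas-based approach to tabular data can look like, the following is a rough sketch and not the parser shipped in the release. It assumes the MovieLens `ratings.csv` column names (`userId`, `movieId`, `rating`, `timestamp`), treats each user's full history as one session, and returns a plain dictionary whose layout is invented for this example; `build_sessions` is a hypothetical name, and the tiny inline CSV stands in for the real file.

```python
import io

import pandas as pd

# Tiny stand-in for MovieLens ratings.csv (same column names, made-up rows).
RATINGS_CSV = """userId,movieId,rating,timestamp
1,10,4.0,300
1,20,3.5,100
2,10,5.0,200
1,30,2.0,200
"""


def build_sessions(ratings: pd.DataFrame) -> dict:
    """Group rows into per-user sequences ordered by time (hypothetical helper).

    Returns {user_id: [[item ids...], [timestamps...]]}.
    """
    # Sort once, then aggregate per user instead of looping over rows.
    ordered = ratings.sort_values(["userId", "timestamp"])
    grouped = ordered.groupby("userId")
    items = grouped["movieId"].agg(list)
    times = grouped["timestamp"].agg(list)
    return {user: [items[user], times[user]] for user in items.index}


ratings = pd.read_csv(io.StringIO(RATINGS_CSV))
sessions = build_sessions(ratings)
```

Sorting and grouping the whole table at once avoids a Python-level loop over individual rows, which is typically where row-by-row parsing of a file this size spends its time.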

inpefess commented 1 year ago

@SimonMolinsky that's amazing! Sorry for not making it clear that the JOSS review did not require the 25M dataset, only something tabular. 25M is not yet 'big data', but it is certainly not a toy. I'm happy to hear you managed to scale. It will be a great plus for the project. Well done!