Closed inpefess closed 1 year ago
Thanks, this bug seems to be critical. I will also run some tests on the tabular datasets you mentioned, and when everything works fine, I'll close this issue.
@inpefess It took me longer than I expected because preprocessing the MovieLens 25M dataset with the core features was disastrously slow. I've written a parser based on the pandas package, and it processes the data in a reasonable time. I've created two additional tutorials based on the MovieLens datasets, and I've fixed the error. Now I'm wrapping things up. I will publish a new release with all those features. I must change the descriptions slightly, so I will do that along with the paper and contributor's guide corrections. I'll let you know in the JOSS review thread when everything's ready.
@SimonMolinsky that's amazing! Sorry for not making it clear that the JOSS review did not require the 25M dataset specifically, only something tabular. 25M is not yet 'big data', but it is certainly not a toy. I'm happy to hear you managed to scale. It will be a great plus for the project. Well done!
Several popular papers on sequential recommenders (e.g. BERT4Rec and SASRec) rely on tabular data (MovieLens, Amazon, Steam). I tried to run WSKNN on MovieLens 25M and failed to apply the `parse_files` function. The argument `allowed_actions` is `None` by default, so I didn't pass it, but the function then presumes it is not `None` and fails. It wouldn't be unreasonable to assume that all the actions appearing in the `action_key` field are allowed by default, and to handle an omitted `allowed_actions` dictionary gracefully. And, of course, a usage example with a popular open dataset instead of the package-specific one might make it even more user-friendly.
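
To illustrate the suggested default, here is a minimal sketch of one way to treat a missing `allowed_actions` as "allow everything". It is not WSKNN's actual code: the helper names, the event tuple layout, and the idea that the dictionary maps each action to a numeric weight are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical event layout: (session_id, item_id, timestamp, action).
Event = Tuple[str, str, int, str]


def resolve_allowed_actions(
    events: Iterable[Event],
    allowed_actions: Optional[Dict[str, float]] = None,
    default_weight: float = 1.0,
) -> Dict[str, float]:
    """Return an action -> weight mapping.

    If the caller passed a dictionary, use a copy of it. If they passed
    None, allow every action observed in the data with the same weight.
    """
    if allowed_actions is not None:
        return dict(allowed_actions)
    return {action: default_weight for _, _, _, action in events}


def filter_events(
    events: Iterable[Event],
    allowed_actions: Optional[Dict[str, float]] = None,
) -> List[Tuple[str, str, int, float]]:
    """Keep only allowed events and replace the action with its weight."""
    events = list(events)  # materialize so the data can be scanned twice
    weights = resolve_allowed_actions(events, allowed_actions)
    return [
        (session, item, ts, weights[action])
        for session, item, ts, action in events
        if action in weights
    ]


# Made-up sample data, only to show the two call styles.
sample = [("u1", "m1", 1, "view"), ("u1", "m2", 2, "purchase")]
everything = filter_events(sample)                       # nothing passed
purchases = filter_events(sample, {"purchase": 2.0})     # explicit filter
```

With this approach the `None` default keeps its meaning of "no filter", and passing a dictionary still restricts and weights the actions explicitly.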