nokaut / wsknn

Session-weighted recommendation system in Python
BSD 3-Clause "New" or "Revised" License

Suggestions to improve the JOSS paper #32

Closed by inpefess 1 year ago

inpefess commented 1 year ago

Based on https://github.com/openjournals/joss-reviews/issues/5639#issuecomment-1628621998

  1. Probably, WSKNN should be in all capitals everywhere in the text
  2. Please explicitly introduce the abbreviation K-NN before using it. Also, choose one style (k-NN or K-NN) and stick to it.
  3. Please expand the WSKNN abbreviation in the summary.
  4. "The main difference between wsknn and the V-SKNN model from the presented repository is that the latter is a ready-to-use package." --- probably "former", not "latter". Also, why the different naming? Is it because W stands for "weighted": V-SKNN doesn't work with session event weights, but WSKNN does?
  5. "The package is related to past research projects within a company" --- what company? If it's the same you indicate in your affiliation information for this paper, maybe it's worth mentioning. If not, please add the industry to which the company belonged.
  6. "In the closest future package will be enhanced with tensorflow version of the algorithm" --- please add a canonical citation for TensorFlow (https://www.tensorflow.org/about/bib). Please also clarify whether this work in progress is a personal effort of the package author or whether there is a community around it (e.g. an internal project in a private company).
  7. I suggest reworking the sections (without adding or removing content, just moving paragraphs around). For example, the "Statement of need" section currently talks mostly about the input data format ("Input data format" would be a great additional section). Conversely, "Related work" explains how WSKNN might be useful and why existing solutions are not enough (compare: "A Statement of need section that clearly illustrates the research purpose of the software and places it in the context of related work" from https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain).
  8. "algorithm itself is tuned and enhanced Vector Multiplication Session-Based kNN" --- what are the enhancements contributed by WSKNN?
  9. "The package has built-in evaluation metrics" --- the metrics you mention (MRR, precision/recall) are pretty standard. Why not reuse any existing implementation?
  10. "The other example of a stand-alone repository is ... with Keras implementation of Gru4Rec session-based recommender" --- how exactly is this software different from WSKNN, or similar to it? What do you mean by "stand-alone" here?
  11. "The module takes into account the fact that different event actions have their specific weights" --- does the package user have to set these weights, or does WSKNN learn them somehow?
  12. "additional parameters:" seems to be wrongly indented.
  13. "it allows testing different scenarios in parallel" --- how exactly? What could prevent a user from testing several models in parallel? Please elaborate.
  14. "The users may control" --- are all the listed parameters arguments of the predict() method? If so, please state this explicitly.
  15. "fit() to build a memory representation of a model" --- do you mean a similarity matrix? KNN is a well-known method outside the recommender systems world, but "memory representation of a model" sounds arcane.
  16. "As a memory-based model" --- probably "As a memory-based method" (in contrast to model-based methods)
  17. "the next n-products" --- probably "the next n products"
  18. "The wsknn package is a lightweight tool for modeling user-item interactions and making recommendations from sequential dataset" --- please cite popular sequential datasets
  19. Please cite any review of existing sequential recommenders in the Summary
  20. "The package’s algorithm can work in a cold-start scenario" --- please elaborate (maybe cite a paper on the cold-start problem for sequential recommenders), and explain what you mean by "cold-start scenario"
  21. "k-NN based approach may be placed in the bigger pipeline" --- please cite a case study or an industrial-track paper discussing such combined architectures
  22. "Especially powerful is using weights..." --- please cite a paper containing experimental evidence for that claim
  23. "it depends only on the numpy, more_itertools and pyyaml" --- please explain how you are using pyyaml and add a canonical numpy citation
  24. "Items and Sessions that are storing item-sessions and session-items mappings may be updated sequentially" --- Items and Sessions are not mentioned in the API reference in the documentation. How exactly may they be updated sequentially, and what does that mean?
  25. "But this is related to the every other model, except simple Markov Model" --- please cite any such model (maybe from sklearn)
  26. "there’s additional overhead related to the preparation of the input... That’s why wsknn has a built in preprocessing module" --- do you mean that the preprocessing module helps to reduce the mentioned overhead?
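
As a rough illustration of the "item-sessions and session-items mappings" quoted in points 15 and 24: a minimal sketch, assuming both mappings are plain Python dictionaries that can be extended one event at a time. The names `add_events`, `session_items`, and `item_sessions` are hypothetical and are not taken from the WSKNN API.

```python
from collections import defaultdict


def add_events(session_items, item_sessions, events):
    """Extend both mappings in place with (session_id, item_id) events.

    Hypothetical helper for illustration only; not part of WSKNN.
    """
    for session_id, item_id in events:
        # session -> ordered list of items seen in that session
        session_items[session_id].append(item_id)
        # item -> set of sessions in which the item appeared
        item_sessions[item_id].add(session_id)


session_items = defaultdict(list)
item_sessions = defaultdict(set)

# Initial batch of events.
add_events(session_items, item_sessions, [("s1", "a"), ("s1", "b"), ("s2", "b")])

# A later batch updates the same mappings without rebuilding them.
add_events(session_items, item_sessions, [("s3", "a")])
```

In this sketch, "updated sequentially" would mean that new events are appended to the existing dictionaries rather than rebuilding them from scratch. Whether the package works this way is exactly what point 24 asks.
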
SimonMolinsky commented 1 year ago

Hi @inpefess

I've included all your suggestions. Some phrases have been removed from the paper. The updated paper is in the JOSS thread, but I will answer all your points here:

  1. Done.
  2. Done.
  3. Done.
  4. Done. I've expanded this section to emphasize additional weighting methods the WSKNN package provides.
  5. Done.
  6. Done & Done.
  7. It is probably okay now. You had a point here: I had mixed up the sections. I created a new section, Data Formats, where I moved the information about the input data format and the data structures parsed by the model.
  8. Done, see 4.
  9. The scoring system is tied to the data structure (a dictionary containing lists of varying length). Using metrics from an external package could require additional tweaking, and since the implementation is simple, I decided to keep it within the package.
  10. I've removed "stand-alone" and described it as Python scripts that are not a package. I included it to show a different recommendation system that may be used for session-based data.
  11. I've updated the paper in a few places, making it explicit that the user must provide those weights.
  12. Done.
  13. Done. I have changed it to: "It is worth noticing that **the recommendation strategy may be altered after fitting a model**; it allows testing different weighting scenarios in parallel without additional models training."
  14. Done. I've expanded the section and mentioned the YAML file with the model parameters, which may be used as documentation or as a settings file.
  15. Done. I've clearly described it as a dictionary.
  16. Done.
  17. Instead of n-something, I've written: ... the sequence of products the user may be interested in.
  18. Done, but in the Statement of need: RecSys 2015.
  19. Done - @Latifi2020SessionawareRA
  20. Removed. This statement is not valid.
  21. Removed. It was a part of internal work in the company, not published anywhere.
  22. Removed. The claim was based on internal experiments and is not supported by rigorous ones (we need to find out how it behaves on datasets other than the internal ones).
  23. Done and Done. Additionally, I've added citations to every dependency.
  24. Included in the documentation and described in the paper.
  25. We implemented it ourselves, so I've removed this part.
  26. I've expanded the description of this module and its general role in the package.
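
To illustrate point 9: a minimal sketch of mean reciprocal rank (MRR) computed directly on a dictionary of varying-length recommendation lists. The function name and the input layout are assumptions made for illustration, not the package's actual implementation.

```python
def mean_reciprocal_rank(recommendations, relevant):
    """Compute MRR over sessions.

    recommendations: dict mapping session id -> ranked list of item ids
                     (lists may have different lengths).
    relevant:        dict mapping session id -> set of items counted as hits.
    Hypothetical layout, assumed for this sketch.
    """
    if not recommendations:
        return 0.0
    total = 0.0
    for session_id, ranked_items in recommendations.items():
        targets = relevant.get(session_id, set())
        for rank, item in enumerate(ranked_items, start=1):
            if item in targets:
                # Only the first hit contributes to the reciprocal rank.
                total += 1.0 / rank
                break
    return total / len(recommendations)


recs = {"s1": ["a", "b", "c"], "s2": ["d"], "s3": ["x", "y"]}
hits = {"s1": {"b"}, "s2": {"d"}, "s3": {"z"}}
mrr = mean_reciprocal_rank(recs, hits)  # (1/2 + 1/1 + 0) / 3
```

A few lines like these work on the dictionary as-is, whereas an external metrics library would typically expect fixed-shape arrays.
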

Thanks!

inpefess commented 1 year ago

Well done!