practical-recommender-systems / moviegeek

A Django website used in the book Practical Recommender Systems to illustrate how recommender algorithms can be implemented.
MIT License
903 stars, 362 forks

fix similar users by using a string type for user_id #18

Closed walterbm closed 5 years ago

walterbm commented 5 years ago

While working through the user-similarities section of chapter 7, I had a lot of trouble getting similarity data to show up in the analytics view.

From what I can tell, the underlying issue was a type mismatch when calculating user similarities. Both the Jaccard and Pearson similarity methods first check whether the base user and the target user are in the dataset:

https://github.com/practical-recommender-systems/moviegeek/blob/ef5172476a357230cbd626f0f9df4a495d1c702c/recommender/views.py#L91-L92
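To make the effect of that guard concrete, here is a minimal sketch. It assumes, hypothetically, that the dataset maps each user id to a dict of `{movie_id: rating}` and that Jaccard similarity is taken over the sets of rated movies. The `jaccard` function and this data layout are illustrations only, not the repository's actual code.

```python
from typing import Dict, Hashable

# Hypothetical layout: user id -> {movie id -> rating}
Dataset = Dict[Hashable, Dict[str, float]]


def jaccard(dataset: Dataset, base_user: Hashable, target_user: Hashable) -> float:
    """Jaccard similarity between the sets of movies two users rated.

    Mirrors the membership guard described above: if either user id is
    not a key in the dataset, the similarity is reported as 0.
    """
    if base_user not in dataset or target_user not in dataset:
        return 0.0
    base_items = set(dataset[base_user])
    target_items = set(dataset[target_user])
    union = base_items | target_items
    if not union:
        return 0.0
    return len(base_items & target_items) / len(union)


ratings = {
    "1": {"m1": 5.0, "m2": 3.0},
    "2": {"m2": 4.0, "m3": 2.0},
}
print(jaccard(ratings, "1", "2"))  # string ids match the keys: 1 shared of 3 total
print(jaccard(ratings, 1, 2))      # int ids never match string keys: 0.0
```

With integer ids the guard short-circuits, so the function silently reports "no similarity" instead of raising an error, which is why the problem only showed up as missing data.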

But this check always returned False for me. I think this is because the dataset is a dictionary populated from the Rating model, and in the Rating model's schema user_id is a char field:

https://github.com/practical-recommender-systems/moviegeek/blob/ef5172476a357230cbd626f0f9df4a495d1c702c/analytics/models.py#L4-L5

So when the user dataset is built, the dictionary keys are strings, not integers, and the membership check in every similarity function fails whenever integer user ids are passed in.
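The key-type behavior can be sketched without Django. This example assumes, hypothetically, that Rating rows look like `(user_id, movie_id, rating)` tuples with `user_id` as a `str` (as a char field would store it). The `build_dataset` helper is a hypothetical stand-in for however the view groups ratings by user.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical stand-in for Rating rows: (user_id, movie_id, rating).
# user_id is a str because the model stores it in a char field.
Row = Tuple[str, str, float]


def build_dataset(rows: Iterable[Row]) -> Dict[str, Dict[str, float]]:
    """Group ratings by user; keys keep the type user_id had (str here)."""
    dataset: Dict[str, Dict[str, float]] = defaultdict(dict)
    for user_id, movie_id, rating in rows:
        dataset[user_id][movie_id] = rating
    return dict(dataset)


rows = [("222", "m1", 4.0), ("222", "m2", 3.5), ("400", "m1", 2.0)]
users = build_dataset(rows)

print(222 in users)       # False: int 222 and str "222" are different keys
print("222" in users)     # True
print(str(222) in users)  # True: converting the id to str restores the match
```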

I checked with Python 3.7 to replicate the issue:

Python 3.7.0
>>> users = {"222": "test"}
>>> 222 in users
False
>>> "222" in users
True

This PR removes the int() casts around the user ids; with that change, the similarities seem to work.
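The before-and-after effect of dropping the cast can be sketched with a Pearson helper that uses the same membership guard. Everything here is hypothetical: the `pearson` function, its choice to center ratings on the means of co-rated movies only, the sample ratings, and the assumption that ids arrive as strings (for example, parsed from a request).

```python
import math
from typing import Dict, Hashable

# Hypothetical layout: user id -> {movie id -> rating}
Dataset = Dict[Hashable, Dict[str, float]]


def pearson(dataset: Dataset, base_user: Hashable, target_user: Hashable) -> float:
    """Pearson correlation over the movies both users rated (illustrative)."""
    # Same guard as in the similarity methods: unknown ids give no similarity.
    if base_user not in dataset or target_user not in dataset:
        return 0.0
    common = sorted(set(dataset[base_user]) & set(dataset[target_user]))
    if len(common) < 2:
        return 0.0
    xs = [dataset[base_user][m] for m in common]
    ys = [dataset[target_user][m] for m in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mean_x) ** 2 for x in xs)
                    * sum((y - mean_y) ** 2 for y in ys))
    return num / den if den else 0.0


ratings = {
    "1": {"m1": 5.0, "m2": 3.0, "m3": 4.0},
    "2": {"m1": 4.0, "m2": 2.0, "m3": 5.0},
}

raw_base, raw_target = "1", "2"  # ids as strings, e.g. from a request

# Before the fix: the int() cast means the string-keyed lookup never matches.
print(pearson(ratings, int(raw_base), int(raw_target)))  # 0.0
# After the fix: the string ids are passed through unchanged.
print(pearson(ratings, raw_base, raw_target))
```

If callers could ever pass integer ids, normalizing with `str(user_id)` at the boundary would be an alternative to removing the casts, since it keeps the lookup type consistent with the char-field keys.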

Please let me know if this makes sense or if it seems totally off-base. I'm not sure whether the unexpected behavior could be caused by my version of Python or my dependencies.

kimfalk commented 5 years ago

Hi,

Thank you for the elaborate description. I tested it and I agree with you, those casts should not be there.

walterbm commented 5 years ago

Glad I could help. Really enjoying the book, thank you!