While working through the user similarities section of chapter 7, I had a lot of trouble getting similarity data to show up in the analytics view.
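From what I can tell, the underlying issue is a type mismatch when calculating user similarities. Both the `jaccard` and `pearson` similarity methods first check whether the base user and target user are in the dataset: https://github.com/practical-recommender-systems/moviegeek/blob/ef5172476a357230cbd626f0f9df4a495d1c702c/recommender/views.py#L91-L92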
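But this check was always returning `False` for me. I think this is because the dataset is a dictionary populated from the `Rating` model, and in that model's schema `user_id` is a char field: https://github.com/practical-recommender-systems/moviegeek/blob/ef5172476a357230cbd626f0f9df4a495d1c702c/analytics/models.py#L4-L5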
So when building the user dataset, the dictionary keys are strings, not integers, and the membership check in all the similarity functions fails whenever integer user ids are passed in.
I checked with Python 3.7 to replicate the issue:
Python 3.7.0
>>> users = {"222": "test"}
>>> 222 in users
False
>>> "222" in users
True
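To show how the string keys could end up in the dataset in the first place, here is a minimal sketch. The rows, the field names other than `user_id`, and `build_user_dataset` are all hypothetical stand-ins, not the project's actual code; the only thing taken from the repo is that `user_id` comes back as a string.

```python
from collections import defaultdict

# Hypothetical rows standing in for Rating records. user_id is a string,
# as it would be when the model field is a char field.
rating_rows = [
    {"user_id": "222", "movie_id": "m1", "rating": 4.0},
    {"user_id": "222", "movie_id": "m2", "rating": 3.0},
    {"user_id": "333", "movie_id": "m1", "rating": 5.0},
]


def build_user_dataset(rows):
    """Group ratings by user id: {user_id: {movie_id: rating}} (hypothetical helper)."""
    users = defaultdict(dict)
    for row in rows:
        # The key keeps whatever type the row supplies, here str.
        users[row["user_id"]][row["movie_id"]] = row["rating"]
    return dict(users)


users = build_user_dataset(rating_rows)

print(sorted(users))         # ['222', '333']
print(int("222") in users)   # False: an int never matches a str key
print("222" in users)        # True
```

Since `222` and `"222"` are different keys as far as a dict is concerned, any lookup done after an `int()` cast misses every entry.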
This PR just removes the `int()` type-casting around the user ids; with that change the similarities seem to work.
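For illustration, here is a rough sketch of the effect of dropping the cast. This `jaccard` is my own simplified version written for this example, not a copy of the function in `recommender/views.py`, and the sample data is made up.

```python
def jaccard(users, this_user, that_user):
    """Jaccard similarity over the sets of movies each user rated (simplified sketch)."""
    # Same kind of membership guard as described above.
    if this_user not in users or that_user not in users:
        return 0.0
    a = set(users[this_user])
    b = set(users[that_user])
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Made-up dataset with string keys, as produced from a char user_id field.
users = {"222": {"m1": 4.0, "m2": 3.0}, "333": {"m1": 5.0}}
user_id, other_id = "222", "333"

# With the cast, the guard fails and the similarity silently becomes 0.
with_cast = jaccard(users, int(user_id), int(other_id))

# Without the cast, the ids match the dictionary keys.
without_cast = jaccard(users, user_id, other_id)

print(with_cast)      # 0.0
print(without_cast)   # 0.5
```

The failure is silent because the guard returns 0 instead of raising, which matches what I saw: no errors, just no similarity data.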
Please let me know if this makes sense or if it seems totally off-base. I'm not sure if the unexpected behavior could be a result of my version of Python/dependencies.