Fixes an issue that was reported several times but not fully resolved in issues #5 and #6 (and mentioned in issue #9 as a remaining bug).
The faulty logic mentioned in issue #5 was a valid point, but an error remained that caused the same items/movies to be recommended multiple times. The operations in line 80 produced a non-unique RDD, meaning that the same movies were present multiple times. This is solved by adding the .distinct() operation, which removes duplicate entries.
Step-by-step:
self.ratings_RDD contains all user ratings
.filter(lambda rating: not rating[0] == user_id) removes the ratings made by the specified user (intended to exclude movies they have already rated), where rating[0] refers to the user_id column
.map(lambda x: (user_id, x[1])) rewrites every remaining rating as a (user_id, movie_id) pair. This is where a movie can appear multiple times: a movie rated by several other users yields one identical pair per rating.
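The steps above can be reproduced without a Spark cluster. The following is a minimal plain-Python sketch, not the project's code: the sample ratings and the variable names (ratings, others, pairs, unique_pairs) are made up for illustration, and list operations stand in for the RDD calls. It shows how the map step produces duplicate pairs and how de-duplication removes them.

```python
# Plain-Python stand-in for the RDD pipeline; the sample data is hypothetical.
# Each rating is (user_id, movie_id, rating), mirroring self.ratings_RDD.
ratings = [
    (1, 40, 4.0),
    (2, 10, 5.0),
    (2, 20, 3.0),
    (3, 10, 2.0),
    (3, 30, 4.5),
]
user_id = 1

# Equivalent of .filter(lambda rating: not rating[0] == user_id)
others = [r for r in ratings if not r[0] == user_id]

# Equivalent of .map(lambda x: (user_id, x[1]))
# Movie 10 was rated by users 2 and 3, so (1, 10) appears twice here.
pairs = [(user_id, r[1]) for r in others]

# Equivalent of .distinct(); dict.fromkeys keeps first-seen order,
# whereas RDD.distinct() makes no ordering guarantee.
unique_pairs = list(dict.fromkeys(pairs))

print(pairs)         # [(1, 10), (1, 20), (1, 10), (1, 30)]
print(unique_pairs)  # [(1, 10), (1, 20), (1, 30)]
```

In PySpark terms, the chain described in this PR would read self.ratings_RDD.filter(...).map(...).distinct(), with the filter and map lambdas exactly as listed in the steps above.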