yaravind / recommender-systems

Completed exercises for Coursera Recommender Systems MOOC
https://www.coursera.org/specializations/recommender-systems

Programming Assignment 2 - Content Based Recommendations #2

Closed yaravind closed 11 years ago

yaravind commented 11 years ago

Overview

In this assignment, you will implement a content-based recommender as a LensKit recommender algorithm. We provide the skeleton of a program that takes user IDs on the command line and generates a list of 5 recommended movies, with scores, for each user. Your task is to implement the logic of the recommender itself. You will be assigned 5 user IDs for computing your graded output in the same manner as the first programming assignment.

There are 2 parts to this assignment, implementing two variants of a TF-IDF recommender.

Downloads and Resources

- Project template
- LensKit Example Project (not required, but useful for reference)
- LensKit documentation
- JavaDoc for included code
- Assignment inputs

There are also 3 videos you will likely find useful: the LensKit introduction, the example walkthrough, and the assignment video itself.

Notation

This assignment uses a bit more mathematical notation than previous ones. Here's the notation we are using:

- $\vec{u}$: The user's vector (in this assignment, the user profile vector).
- $\vec{i}$: The item vector.
- $I(u)$: The set of items rated by user $u$.
- $u_t$, $i_t$: User $u$'s or item $i$'s score for tag $t$.
- $r_{ui}$: User $u$'s rating for item $i$.
- $\mu_u$: The average of user $u$'s ratings.

Part 1: TF-IDF Recommender with Unweighted Profiles (50 points)

Start by downloading the project template. This is a Maven project; you can import it into your IDE directly (IntelliJ users can open the pom.xml file as a project; Eclipse users can import it as an ‘Existing Maven project’). At this point, you should be able to run the CBFMain class as a Java application; the whole project compiles and runs.

There are 3 things you need to implement to complete the first part of the assignment:

Compute item-tag vectors (the model)

For this task, you need to modify the model builder (TFIDFModelBuilder; your modifications go in the get() method) to compute the unit-normalized TF-IDF vector for each movie in the data set. We provide the skeleton of this; TODO comments indicate where you need to implement missing pieces. When this piece is done, the model should contain a mapping of item IDs to TF-IDF vectors, normalized to unit vectors, for each item.
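As a rough illustration of this step, here is a minimal sketch that uses plain Java maps rather than the template's LensKit vector types. The class name `TfIdfModelSketch`, the method `buildModel`, and the input shape (item ID to list of tag applications) are all hypothetical. It also assumes that term frequency is the raw count of a tag on an item and that IDF is log(N / df); the assignment text above does not spell out the exact variant, so treat that as an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the project template: plain maps stand in
// for LensKit's vector types.
final class TfIdfModelSketch {

    // itemTags maps an item ID to every tag application on it (tags may repeat).
    static Map<Long, Map<String, Double>> buildModel(Map<Long, List<String>> itemTags) {
        Map<String, Integer> docFreq = new HashMap<>();   // tag -> number of items carrying it
        Map<Long, Map<String, Double>> model = new HashMap<>();

        // Pass 1: raw term frequencies per item, plus document frequencies.
        for (Map.Entry<Long, List<String>> entry : itemTags.entrySet()) {
            Map<String, Double> counts = new HashMap<>();
            for (String tag : entry.getValue()) {
                counts.merge(tag, 1.0, Double::sum);
            }
            for (String tag : counts.keySet()) {
                docFreq.merge(tag, 1, Integer::sum);
            }
            model.put(entry.getKey(), counts);
        }

        // Pass 2: multiply by IDF (assumed to be log(N / df)), then unit-normalize.
        int itemCount = itemTags.size();
        for (Map<String, Double> vector : model.values()) {
            double sumOfSquares = 0.0;
            for (Map.Entry<String, Double> term : vector.entrySet()) {
                double idf = Math.log((double) itemCount / docFreq.get(term.getKey()));
                double weight = term.getValue() * idf;
                term.setValue(weight);
                sumOfSquares += weight * weight;
            }
            double norm = Math.sqrt(sumOfSquares);
            if (norm > 0.0) {   // leave all-zero vectors untouched to avoid dividing by zero
                for (Map.Entry<String, Double> term : vector.entrySet()) {
                    term.setValue(term.getValue() / norm);
                }
            }
        }
        return model;
    }
}
```

Under these assumptions, a tag that appears on every item gets an IDF of zero and drops out of the similarity computation, which is the usual motivation for the IDF factor.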

Build user profile for each query user

The makeUserVector(long) method of TFIDFItemScorer takes a user ID and produces a vector representing that user's profile. For Part 1, the profile should be the sum of the item-tag vectors of all items the user has rated positively (>= 3.5 stars). Complete this method.
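The unweighted profile can be sketched as follows, again with plain maps instead of LensKit types. The class `UnweightedProfileSketch` and its two-argument `makeUserVector` are hypothetical stand-ins: the real method takes only a user ID and looks up ratings through the template's own data access code, which is not shown here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: the user's ratings and the item model are passed in directly.
final class UnweightedProfileSketch {

    // Threshold taken from the assignment text: ratings >= 3.5 count as positive.
    static final double LIKE_THRESHOLD = 3.5;

    static Map<String, Double> makeUserVector(Map<Long, Double> ratings,
                                              Map<Long, Map<String, Double>> model) {
        Map<String, Double> profile = new HashMap<>();
        for (Map.Entry<Long, Double> rating : ratings.entrySet()) {
            if (rating.getValue() < LIKE_THRESHOLD) {
                continue;   // ignore items the user did not rate positively
            }
            Map<String, Double> itemVector = model.get(rating.getKey());
            if (itemVector == null) {
                continue;   // assumption: items without a tag vector contribute nothing
            }
            // Add the item's tag vector into the running profile, tag by tag.
            for (Map.Entry<String, Double> term : itemVector.entrySet()) {
                profile.merge(term.getKey(), term.getValue(), Double::sum);
            }
        }
        return profile;
    }
}
```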

Generate item scores for each user

The heart of the recommendation process in many LensKit recommenders is the score method of the item scorer, in this case TFIDFItemScorer. Modify this method to score each item by using cosine similarity: the score for an item is the cosine between that item's tag vector and the user's profile vector. Cosine similarity is defined as follows:

$$\cos(\vec{u}, \vec{i}) = \frac{\vec{u} \cdot \vec{i}}{\|\vec{u}\|_2 \, \|\vec{i}\|_2} = \frac{\sum_t u_t i_t}{\sqrt{\sum_t u_t^2} \, \sqrt{\sum_t i_t^2}}$$

Upload the output of your program on your assigned inputs as a text file. Get your assigned inputs by entering your Coursera ID in the assignment input distributor. In most shells, you can redirect the output of the program into a text file:

$ /bin/sh target/bin/run-cbf U1 U2 U3 >unweighted.txt

This works on Windows as well; just run target\bin\run-cbf.bat. Redirecting your output in this fashion captures only the program output; the logging output should still be displayed in your terminal.
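For the scoring step of Part 1, the cosine definition given earlier can be sketched with plain maps as shown here. `CosineScorerSketch`, `cosine`, `norm`, and `scoreItems` are hypothetical names and do not correspond to the LensKit score method's actual signature. Returning 0 when either vector has zero length is an assumption of this sketch, not something the assignment specifies.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: scores items against a user profile by cosine similarity.
final class CosineScorerSketch {

    // Euclidean (L2) norm of a sparse vector.
    static double norm(Map<String, Double> vector) {
        double sumOfSquares = 0.0;
        for (double weight : vector.values()) {
            sumOfSquares += weight * weight;
        }
        return Math.sqrt(sumOfSquares);
    }

    // Dot product over shared tags, divided by the product of the two norms.
    static double cosine(Map<String, Double> user, Map<String, Double> item) {
        double dot = 0.0;
        for (Map.Entry<String, Double> term : item.entrySet()) {
            Double userWeight = user.get(term.getKey());
            if (userWeight != null) {
                dot += userWeight * term.getValue();
            }
        }
        double denominator = norm(user) * norm(item);
        // Assumption: an empty or all-zero vector yields a score of 0.
        return denominator == 0.0 ? 0.0 : dot / denominator;
    }

    // Scores every item in the model against one user profile.
    static Map<Long, Double> scoreItems(Map<String, Double> profile,
                                        Map<Long, Map<String, Double>> model) {
        Map<Long, Double> scores = new HashMap<>();
        for (Map.Entry<Long, Map<String, Double>> item : model.entrySet()) {
            scores.put(item.getKey(), cosine(profile, item.getValue()));
        }
        return scores;
    }
}
```

Because the item vectors in the model are already unit-normalized, the item norm is 1 in practice, so only the profile norm affects the score; the sketch computes both norms anyway so that it matches the definition term by term.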

Example Output for Unweighted User Profile

The following example gives actual outputs for 5 user IDs in the data set. Use it to verify both your output format and your computation:

$ /bin/sh target/bin/run-cbf 4045 144 3855 1637 2919
recommendations for user 4045:
  11: 0.3596
  63: 0.2612
  807: 0.2363
  187: 0.2059
  2164: 0.1899
recommendations for user 144:
  11: 0.3715
  585: 0.2512
  38: 0.1908
  141: 0.1861
  807: 0.1748
recommendations for user 3855:
  1892: 0.4303
  1894: 0.2958
  63: 0.2226
  2164: 0.2119
  604: 0.1941
recommendations for user 1637:
  2164: 0.2272
  141: 0.2225
  745: 0.2067
  601: 0.1995
  807: 0.1846
recommendations for user 2919:
  11: 0.3659
  1891: 0.3278
  640: 0.1958
  424: 0.1840
  180: 0.1527

Part 2: Weighted User Profile (50 points)

For this part, modify your solution from Part 1 to compute weighted user profiles. In this variant, rather than just summing the vectors for all positively-rated items, compute a weighted sum of the item vectors for all items, with weights being based on the user's rating. Your solution should implement the following formula:

$$\vec{u} = \sum_{i \in I(u)} (r_{ui} - \mu_u) \, \vec{i}$$

Example Output for Weighted User Profile

The following example gives actual outputs for 5 user IDs in the data set. Use it to verify both your output format and your computation:

$ /bin/sh target/bin/run-cbf 4045 144 3855 1637 2919
recommendations for user 4045:
  807: 0.1932
  63: 0.1438
  187: 0.0947
  11: 0.0900
  641: 0.0471
recommendations for user 144:
  11: 0.1394
  585: 0.1229
  671: 0.1130
  672: 0.0878
  141: 0.0436
recommendations for user 3855:
  1892: 0.2243
  1894: 0.1465
  604: 0.1258
  462: 0.1050
  10020: 0.0898
recommendations for user 1637:
  393: 0.1976
  24: 0.1900
  2164: 0.1522
  601: 0.1334
  5503: 0.0992
recommendations for user 2919:
  180: 0.1454
  11: 0.1238
  1891: 0.1172
  424: 0.1074
  2501: 0.0973
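The weighted profile formula from Part 2 can be sketched as follows. As with any plain-map illustration of this assignment, `WeightedProfileSketch` and `makeWeightedUserVector` are hypothetical names. The sketch assumes that the mean is taken over all of the user's ratings and that rated items without a tag vector are skipped in the sum; neither detail is stated in the assignment text.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: builds a profile weighted by (rating - user mean).
final class WeightedProfileSketch {

    static Map<String, Double> makeWeightedUserVector(Map<Long, Double> ratings,
                                                      Map<Long, Map<String, Double>> model) {
        Map<String, Double> profile = new HashMap<>();
        if (ratings.isEmpty()) {
            return profile;   // no ratings, so no mean and an empty profile
        }

        // mu_u: the average of the user's ratings.
        double total = 0.0;
        for (double value : ratings.values()) {
            total += value;
        }
        double mean = total / ratings.size();

        // Sum (r_ui - mu_u) * item vector over every rated item.
        for (Map.Entry<Long, Double> rating : ratings.entrySet()) {
            Map<String, Double> itemVector = model.get(rating.getKey());
            if (itemVector == null) {
                continue;   // assumption: items without a tag vector are skipped
            }
            double weight = rating.getValue() - mean;
            for (Map.Entry<String, Double> term : itemVector.entrySet()) {
                profile.merge(term.getKey(), weight * term.getValue(), Double::sum);
            }
        }
        return profile;
    }
}
```

Unlike the unweighted variant, items rated below the user's mean contribute negative weights, so tags associated with disliked movies pull the profile away from similar items.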

Your inputs for Programming Assignment 2 are: