In this assignment, you will implement a content-based recommender as a LensKit recommender algorithm. We provide the skeleton of a program that takes user IDs on the command line and generates a list of 5 recommended movies, with scores, for each user. Your task is to implement the logic of the recommender itself. You will be assigned 5 user IDs for computing your graded output in the same manner as the first programming assignment.
This assignment has 2 parts, each implementing a variant of a TF-IDF recommender.
Downloads and Resources
Project template
LensKit Example Project (not required, but useful for reference)
LensKit documentation
JavaDoc for included code
Assignment inputs
There are also 3 videos you will likely find useful: the LensKit introduction, the example walkthrough, and the assignment video itself.
Notation
This assignment uses a bit more mathematical notation than previous ones. Here's the notation we are using:
$\vec{u}$
The user's vector (in this assignment, the user profile vector).
$\vec{i}$
The item vector.
$I(u)$
The set of items rated by user $u$.
$u_t$, $i_t$
User $u$'s or item $i$'s score for tag $t$.
$r_{ui}$
User $u$'s rating for item $i$.
$\mu_u$
The average of user $u$'s ratings.
Part 1: TF-IDF Recommender with Unweighted Profiles (50 points)
Start by downloading the project template. This is a Maven project; you can import it into your IDE directly (IntelliJ users can open the pom.xml file as a project; Eclipse users can import it as an ‘Existing Maven project’). At this point, you should be able to run the CBFMain class as a Java application; the whole project compiles and runs.
There are 3 things you need to implement to complete the first part of the assignment:
Compute item-tag vectors (the model)
For this task, modify the model builder (TFIDFModelBuilder; your changes go in the get() method) to compute the unit-normalized TF-IDF vector for each movie in the data set. We provide the skeleton; TODO comments mark the pieces you need to implement. When this piece is done, the model should map each item ID to its TF-IDF vector, normalized to unit length.
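As a rough illustration of the computation (not the LensKit API), the sketch below builds unit-normalized TF-IDF vectors over plain Java maps. It assumes TF is the number of times a tag was applied to a movie and IDF is log(N / df), where N is the number of movies and df is the number of movies carrying the tag. The class TfIdfSketch, its build method, and the input shape are hypothetical; the TODO comments in the template remain the authority on the exact definition.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the project template.
final class TfIdfSketch {
    /**
     * itemTags: item ID -> every tag application on that item (repeats count toward TF).
     * Returns item ID -> unit-normalized TF-IDF vector (tag -> weight).
     */
    static Map<Long, Map<String, Double>> build(Map<Long, List<String>> itemTags) {
        Map<Long, Map<String, Double>> vectors = new HashMap<>();
        Map<String, Integer> docFreq = new HashMap<>();

        // Pass 1: raw term frequencies per item, document frequency per tag.
        for (Map.Entry<Long, List<String>> e : itemTags.entrySet()) {
            Map<String, Double> tf = new HashMap<>();
            for (String tag : e.getValue()) {
                tf.merge(tag, 1.0, Double::sum);
            }
            for (String tag : tf.keySet()) {
                docFreq.merge(tag, 1, Integer::sum);
            }
            vectors.put(e.getKey(), tf);
        }

        // Pass 2: multiply by IDF (assumed log(N / df)), then scale to unit length.
        int n = itemTags.size();
        for (Map<String, Double> vec : vectors.values()) {
            double sumSq = 0.0;
            for (Map.Entry<String, Double> t : vec.entrySet()) {
                double idf = Math.log((double) n / docFreq.get(t.getKey()));
                t.setValue(t.getValue() * idf);
                sumSq += t.getValue() * t.getValue();
            }
            double norm = Math.sqrt(sumSq);
            if (norm > 0.0) { // an all-zero vector is left as is
                for (Map.Entry<String, Double> t : vec.entrySet()) {
                    t.setValue(t.getValue() / norm);
                }
            }
        }
        return vectors;
    }

    public static void main(String[] args) {
        Map<Long, List<String>> tags = new HashMap<>();
        tags.put(1L, Arrays.asList("space", "space", "robots"));
        tags.put(2L, Arrays.asList("robots", "comedy"));
        tags.put(3L, Arrays.asList("comedy"));
        System.out.println(build(tags));
    }
}
```

Note that under this IDF definition a tag applied to every movie gets weight zero, so it contributes nothing to any vector.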
Build user profile for each query user
The makeUserVector(long) method of TFIDFItemScorer takes a user ID and produces a vector representing that user's profile. For Part 1, the profile should be the sum of the item-tag vectors of all items the user has rated positively (>= 3.5 stars). Complete this method.
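The profile computation can be sketched over plain maps as follows. This is only an illustration under stated assumptions: the real makeUserVector(long) takes a user ID and looks up ratings and item vectors through the template's own classes, whereas the hypothetical UnweightedProfileSketch below receives both as maps.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the project template.
final class UnweightedProfileSketch {
    // Ratings at or above this value count as positive (from the assignment text).
    static final double LIKE_THRESHOLD = 3.5;

    /**
     * ratings: item ID -> the user's rating.
     * itemVectors: item ID -> TF-IDF vector (tag -> weight).
     * Returns the sum of the vectors of all positively rated items.
     */
    static Map<String, Double> makeUserVector(Map<Long, Double> ratings,
                                              Map<Long, Map<String, Double>> itemVectors) {
        Map<String, Double> profile = new HashMap<>();
        for (Map.Entry<Long, Double> rating : ratings.entrySet()) {
            if (rating.getValue() < LIKE_THRESHOLD) {
                continue; // not rated positively
            }
            Map<String, Double> itemVector = itemVectors.get(rating.getKey());
            if (itemVector == null) {
                continue; // assumption: skip items that have no vector in the model
            }
            for (Map.Entry<String, Double> tag : itemVector.entrySet()) {
                profile.merge(tag.getKey(), tag.getValue(), Double::sum);
            }
        }
        return profile;
    }

    public static void main(String[] args) {
        Map<Long, Map<String, Double>> items = new HashMap<>();
        Map<String, Double> v1 = new HashMap<>();
        v1.put("space", 1.0);
        items.put(1L, v1);
        Map<String, Double> v2 = new HashMap<>();
        v2.put("comedy", 1.0);
        items.put(2L, v2);

        Map<Long, Double> ratings = new HashMap<>();
        ratings.put(1L, 4.5); // liked, contributes to the profile
        ratings.put(2L, 2.0); // below the threshold, ignored
        System.out.println(makeUserVector(ratings, items));
    }
}
```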
Generate item scores for each user
The heart of the recommendation process in many LensKit recommenders is the score method of the item scorer, in this case TFIDFItemScorer. Modify this method to score each item by using cosine similarity: the score for an item is the cosine between that item's tag vector and the user's profile vector. Cosine similarity is defined as follows:
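$$\cos(\vec{u}, \vec{i}) = \frac{\vec{u} \cdot \vec{i}}{\|\vec{u}\|_2 \, \|\vec{i}\|_2} = \frac{\sum_t u_t i_t}{\sqrt{\sum_t u_t^2} \, \sqrt{\sum_t i_t^2}}$$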
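A minimal sketch of this cosine over sparse tag vectors stored as maps is shown here. The class CosineSketch and its methods are hypothetical and do not use LensKit's vector types; returning 0 when either vector has zero length is an assumption, not something the assignment specifies.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the project template.
final class CosineSketch {
    /** Cosine between two sparse vectors (tag -> weight). */
    static double cosine(Map<String, Double> u, Map<String, Double> i) {
        // Only tags present in both vectors contribute to the dot product.
        double dot = 0.0;
        for (Map.Entry<String, Double> e : u.entrySet()) {
            Double w = i.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        double denom = norm(u) * norm(i);
        // Assumption: define the score as 0 when either vector is all zeros.
        return denom == 0.0 ? 0.0 : dot / denom;
    }

    /** Euclidean (L2) norm of a sparse vector. */
    static double norm(Map<String, Double> v) {
        double sumSq = 0.0;
        for (double x : v.values()) {
            sumSq += x * x;
        }
        return Math.sqrt(sumSq);
    }

    public static void main(String[] args) {
        Map<String, Double> user = new HashMap<>();
        user.put("space", 1.6);
        user.put("robots", 0.8);
        Map<String, Double> item = new HashMap<>();
        item.put("space", 0.6);
        item.put("robots", 0.8);
        System.out.println(cosine(user, item));
    }
}
```

Because the item vectors in the model are already unit-normalized, the item norm in the denominator should be 1 for every item; computing it anyway keeps the sketch correct for arbitrary inputs.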
Upload the output of your program on your assigned inputs as a text file. Get your assigned inputs by entering your Coursera ID in the assignment input distributor. In most shells, you can redirect the output of the program into a text file:
$ /bin/sh target/bin/run-cbf U1 U2 U3 >unweighted.txt
This works on Windows as well; just run target\bin\run-cbf.bat instead. Redirecting in this fashion captures only the program output; the logging output should still be displayed in your terminal.
Example Output for Unweighted User Profile
The following example gives actual outputs for 5 user IDs in the data set. Use it to verify both your output format and your computation:
$ /bin/sh target/bin/run-cbf 4045 144 3855 1637 2919
recommendations for user 4045:
11: 0.3596
63: 0.2612
807: 0.2363
187: 0.2059
2164: 0.1899
recommendations for user 144:
11: 0.3715
585: 0.2512
38: 0.1908
141: 0.1861
807: 0.1748
recommendations for user 3855:
1892: 0.4303
1894: 0.2958
63: 0.2226
2164: 0.2119
604: 0.1941
recommendations for user 1637:
2164: 0.2272
141: 0.2225
745: 0.2067
601: 0.1995
807: 0.1846
recommendations for user 2919:
11: 0.3659
1891: 0.3278
640: 0.1958
424: 0.1840
180: 0.1527
Part 2: Weighted User Profile (50 points)
For this part, modify your solution from Part 1 to compute weighted user profiles. In this variant, rather than just summing the vectors of the positively-rated items, compute a weighted sum of the item vectors of all items the user has rated. Each item's weight is the user's rating for that item minus the user's average rating, so items rated below average pull the profile away from their tags. Your solution should implement the following formula:
$$\vec{u} = \sum_{i \in I(u)} (r_{ui} - \mu_u) \, \vec{i}$$
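The formula can be sketched over plain maps as follows. As an illustration only, the hypothetical WeightedProfileSketch below takes the user's ratings and the item vectors as maps instead of going through the template's classes; returning an empty profile for a user with no ratings is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the project template.
final class WeightedProfileSketch {
    /**
     * ratings: item ID -> the user's rating (the set I(u) with values r_ui).
     * itemVectors: item ID -> TF-IDF vector (tag -> weight).
     * Returns the sum over rated items of (r_ui - mu_u) times the item vector.
     */
    static Map<String, Double> makeUserVector(Map<Long, Double> ratings,
                                              Map<Long, Map<String, Double>> itemVectors) {
        Map<String, Double> profile = new HashMap<>();
        if (ratings.isEmpty()) {
            return profile; // assumption: no ratings means an empty profile
        }

        // mu_u: the mean of all of the user's ratings.
        double sum = 0.0;
        for (double r : ratings.values()) {
            sum += r;
        }
        double mean = sum / ratings.size();

        for (Map.Entry<Long, Double> rating : ratings.entrySet()) {
            Map<String, Double> itemVector = itemVectors.get(rating.getKey());
            if (itemVector == null) {
                continue; // assumption: skip items that have no vector in the model
            }
            double weight = rating.getValue() - mean; // negative for below-average items
            for (Map.Entry<String, Double> tag : itemVector.entrySet()) {
                profile.merge(tag.getKey(), weight * tag.getValue(), Double::sum);
            }
        }
        return profile;
    }

    public static void main(String[] args) {
        Map<Long, Map<String, Double>> items = new HashMap<>();
        Map<String, Double> v1 = new HashMap<>();
        v1.put("space", 1.0);
        items.put(1L, v1);
        Map<String, Double> v2 = new HashMap<>();
        v2.put("comedy", 1.0);
        items.put(2L, v2);

        Map<Long, Double> ratings = new HashMap<>();
        ratings.put(1L, 5.0);
        ratings.put(2L, 1.0);
        System.out.println(makeUserVector(ratings, items));
    }
}
```

Unlike the unweighted profile, this vector can contain negative entries, so cosine scores against it can also be negative.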
Example Output for Weighted User Profile
The following example gives actual outputs for 5 user IDs in the data set. Use it to verify both your output format and your computation:
$ /bin/sh target/bin/run-cbf 4045 144 3855 1637 2919
recommendations for user 4045:
807: 0.1932
63: 0.1438
187: 0.0947
11: 0.0900
641: 0.0471
recommendations for user 144:
11: 0.1394
585: 0.1229
671: 0.1130
672: 0.0878
141: 0.0436
recommendations for user 3855:
1892: 0.2243
1894: 0.1465
604: 0.1258
462: 0.1050
10020: 0.0898
recommendations for user 1637:
393: 0.1976
24: 0.1900
2164: 0.1522
601: 0.1334
5503: 0.0992
recommendations for user 2919:
180: 0.1454
11: 0.1238
1891: 0.1172
424: 0.1074
2501: 0.0973