ykdojo / personalized_search_challenge

Attempt on a Kaggle competition, Personalized Web Search Challenge, hosted by Yandex (http://www.kaggle.com/c/yandex-personalized-web-search-challenge)

Attempt on a Kaggle competition, Personalized Web Search Challenge (hosted by Yandex)

URL: http://www.kaggle.com/c/yandex-personalized-web-search-challenge

Deadline: Friday, January 10, 2014

Team Members

Ideas on our team name

File structure

About branches / pull requests

All code must be reviewed by at least one other person before it is merged into master. Make a branch, write code, test it, and send a pull request. Use short, descriptive names for branches.

Never work directly on master.

Tools

Notes on possible strategies (more on the wiki)

Two ways to look at this problem:

  1. Collaborative filtering (recommender) problem
  2. Personal history problem: look at the past clicks of each individual user.
    • The user is probably more (or less) likely to click pages they have already clicked and liked. => We need to test this.
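One rough way to start testing the repeat-click idea is to measure how often a click lands on a URL the same user has clicked before. The sketch below is only an illustration: the function name `repeat_click_rate` and the `(user_id, url_id)` input format are assumptions, not the competition's actual log format, and the clicks are assumed to be in chronological order.

```python
from collections import defaultdict


def repeat_click_rate(clicks):
    """Fraction of clicks that land on a URL the same user clicked earlier.

    `clicks` is an iterable of (user_id, url_id) pairs in chronological
    order. This input format is a simplifying assumption for illustration.
    """
    seen = defaultdict(set)  # user_id -> set of URLs clicked so far
    repeats = 0
    total = 0
    for user_id, url_id in clicks:
        total += 1
        if url_id in seen[user_id]:
            repeats += 1
        seen[user_id].add(url_id)
    return repeats / total if total else 0.0


# Toy data: user 1 re-clicks "a", user 2 re-clicks "a".
sample_clicks = [(1, "a"), (1, "b"), (1, "a"), (2, "a"), (2, "a"), (1, "c")]
rate = repeat_click_rate(sample_clicks)
```

A high rate would only suggest that repeat clicks are common. It does not by itself show that a previously clicked URL is more likely to be clicked when it is shown again, so a follow-up comparison against never-clicked URLs would still be needed.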

Our first strategy is based on 2. (Low-hanging fruit! Yay!)

Here is the paper that inspired this strategy: http://people.csail.mit.edu/teevan/work/publications/papers/wsdm11-pnav.pdf
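As a starting point for the click-history strategy, here is a minimal sketch of a re-ranker that moves URLs the user has clicked before to the top of the result list. This is a simple heuristic for illustration, not the method from the paper; `rerank_by_past_clicks` and the `past_click_counts` dictionary are hypothetical names.

```python
def rerank_by_past_clicks(ranked_urls, past_click_counts):
    """Re-rank one result list using a user's click history.

    ranked_urls: URLs in their original (non-personalized) order.
    past_click_counts: dict mapping URL -> number of past clicks by this
        user (hypothetical structure, built from the training logs).

    URLs with more past clicks come first. Python's sort is stable, so
    URLs with equal counts (including never-clicked ones) keep their
    original relative order.
    """
    return sorted(ranked_urls, key=lambda url: -past_click_counts.get(url, 0))


# Toy example: the user clicked "u3" twice and "u4" once in the past.
original = ["u1", "u2", "u3", "u4"]
history = {"u3": 2, "u4": 1}
personalized = rerank_by_past_clicks(original, history)
```

Whether this actually beats the default ranking is exactly what needs to be tested on held-out data.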

Some notes on the data

The train file is big (16 GB uncompressed).

We need to think about how to handle this. Perhaps use a database, like SQLite or MySQL? I (Yosuke) suspect we can try our first strategies on a randomly sampled subset of the data. How would we go about it?
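One possible way to get a random subset without loading the whole file is to stream it and keep each session with a fixed probability. The sketch below assumes that each line is tab-separated, that the first field identifies the session, and that all lines of a session are adjacent; these are assumptions to check against the data description. The name `sample_sessions` and the file names in the usage comment are hypothetical.

```python
import itertools
import random


def sample_sessions(lines, rate, seed=0):
    """Yield the lines of a random subset of sessions, one pass, low memory.

    Assumes each line is tab-separated, its first field is the session ID,
    and lines belonging to one session are contiguous (assumptions about
    the file layout, not verified here).
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible

    def session_key(line):
        return line.split("\t", 1)[0]

    for _, group in itertools.groupby(lines, key=session_key):
        # Keep or drop the whole session so no session is split.
        if rng.random() < rate:
            yield from group


# Usage sketch (hypothetical file names):
# with open("train", encoding="utf-8") as src, \
#         open("train_sample", "w", encoding="utf-8") as dst:
#     dst.writelines(sample_sessions(src, rate=0.01))
```

Sampling whole sessions keeps each session intact but scatters a user's history across kept and dropped sessions. Since our first strategy depends on per-user history, sampling by user instead may turn out to be the better choice.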

Train and test

In the competition, the first 27 days are used as training data, and the last 3 days as test data. (http://www.kaggle.com/c/yandex-personalized-web-search-challenge/data)

Perhaps we can test our model locally by using the first 24 days of the train data for training and the next 3 days for testing.
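The local split could look roughly like the sketch below. It assumes each record already carries a day number starting at 1; `split_by_day` and the `(day, payload)` record shape are hypothetical and would need to be adapted to the real log format.

```python
def split_by_day(records, train_days=24, test_days=3):
    """Split records into local train/test sets by day number.

    records: iterable of (day, payload) pairs, with days numbered from 1
        (an assumption; adjust if the logs number days differently).
    Days 1..train_days go to train, the following test_days days go to
    test, and anything later is ignored.
    """
    train, test = [], []
    last_test_day = train_days + test_days
    for day, payload in records:
        if day <= train_days:
            train.append(payload)
        elif day <= last_test_day:
            test.append(payload)
    return train, test


# Toy example: payloads stand in for whole sessions.
records = [(1, "s1"), (24, "s2"), (25, "s3"), (27, "s4"), (28, "s5")]
local_train, local_test = split_by_day(records)
```

Splitting by time rather than at random mirrors the competition setup, where the model only sees past behavior when ranking future queries.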