Added script that gets the relevance_means of skipped documents. Added script for global url distributions.

ykdojo / personalized_search_challenge

Attempt on a Kaggle competition, Personalized Web Search Challenge, hosted by Yandex (http://www.kaggle.com/c/yandex-personalized-web-search-challenge)


Added script that gets the relevance_means of skipped documents. Added script for global url distributions. #26

punit-haria closed 10 years ago

punit-haria commented 10 years ago

skipped_relevance_mean.py --> gets the relevance means of documents skipped one or more times (for each position between 1 and 10)

global_url_distribution.py --> plots the number of URLs that are repeated once, twice, etc. (globally)

session_parser.py --> added functions that return skipped documents
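
For reviewers, the counting step behind global_url_distribution.py could look roughly like the sketch below. It is not the script from this pull request: the function name `repetition_histogram` and the flat list of URL ids are assumptions made for illustration, and plotting is left out.

```python
from collections import Counter


def repetition_histogram(urls):
    """Map 'number of occurrences' -> 'how many distinct URLs occur that often'.

    `urls` is assumed to be a flat iterable of URL ids collected over all
    sessions (a hypothetical input format, not the repository's parser output).
    """
    occurrences = Counter(urls)            # url id -> times it was seen
    return Counter(occurrences.values())   # times seen -> number of urls


# Tiny made-up sample: 101 appears three times, 102 twice, 103 once.
sample_urls = [101, 102, 101, 103, 101, 102]
histogram = repetition_histogram(sample_urls)

for times_seen in sorted(histogram):
    print(times_seen, histogram[times_seen])
```

The resulting mapping can be handed to any plotting library as a bar chart, with the repetition count on the x axis.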

Sorry about the reformatting, my editor was being really picky.

ykdojo commented 10 years ago

I'll merge this for now, but can you make the changes that I suggested and send another pull request?

For the next step, can you compare the global means with the skipped means? Currently, the global means also include the skipped documents, so we need to remove their contribution to compare the two properly. We can do that using just the final sums and lengths, as in:

    sum = global_sum_rank_2 - skipped_sum_rank_2
    length = global_length_rank_2 - skipped_length_rank_2
    mean_rank_2 = sum / length
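
A minimal Python sketch of that subtraction, written as a reusable helper. The function name `mean_excluding_skipped` and the example numbers are made up for illustration. It assumes the per-rank sums and lengths have already been computed, and it avoids shadowing the built-in `sum`.

```python
def mean_excluding_skipped(global_sum, global_length, skipped_sum, skipped_length):
    """Mean relevance at one rank after removing the skipped documents.

    All four arguments are assumed to be totals for a single rank:
    the summed relevance and the number of documents, globally and
    for skipped documents only.
    """
    remaining_sum = global_sum - skipped_sum
    remaining_length = global_length - skipped_length
    if remaining_length <= 0:
        # Every document at this rank was skipped, so no mean is defined.
        return None
    return remaining_sum / remaining_length


# Made-up totals for rank 2: 100 documents overall, 40 of them skipped.
mean_rank_2 = mean_excluding_skipped(
    global_sum=30.0, global_length=100, skipped_sum=6.0, skipped_length=40
)
print(mean_rank_2)
```

Returning `None` when nothing is left is one possible choice. Raising an error would also be reasonable if an empty rank should never happen.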