Ranked Retrievals - Githubissues

"This is the biggest new requirement. Your main program must operate in two modes: Boolean query mode, and ranked query mode. In ranked query mode, you must process a query without any Boolean operators and return the top K = 10 documents satisfying the query. Use the 'term at a time' algorithm as discussed in class:

1. For each term $t$ in the query:
   (a) Calculate $w_{q,t} = \ln(1 + N/\mathrm{df}_t)$.
   (b) For each document $d$ in $t$'s postings list:
       i. Acquire an accumulator value $A_d$ (the design of this system is up to you).
       ii. Calculate $w_{d,t} = 1 + \ln(\mathrm{tf}_{t,d})$.
       iii. Increase $A_d$ by $w_{d,t} \times w_{q,t}$.
2. For each non-zero $A_d$, divide $A_d$ by $L_d$, where $L_d$ is read from the docWeights.bin file.
3. Select and return the top K = 10 documents by largest $A_d$ value. (Use a binary heap priority queue to select the largest results; do not sort the accumulators.)

Use 8-byte floating point numbers for all the calculations.

(print ranked retrieval results: Please print the title of each document returned from a ranked retrieval, as well as the final accumulator value for that document.)"
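
The quoted steps can be sketched roughly as follows. This is a minimal illustration in Python, not the project's implementation, and it rests on several assumptions the issue does not state: the in-memory postings shape (`term -> [(doc_id, tf), ...]`, one entry per document, so its length equals the document frequency), the function names `read_doc_weights` and `ranked_query`, and a docWeights.bin layout of consecutive big-endian 8-byte doubles indexed by document id. Python's `float` is an 8-byte IEEE 754 double, which matches the precision requirement.

```python
import heapq
import math
import struct
from typing import Dict, List, Tuple

Postings = Dict[str, List[Tuple[int, int]]]  # hypothetical shape: term -> [(doc_id, tf)]


def read_doc_weights(raw: bytes) -> List[float]:
    """Decode L_d values from the raw bytes of a weights file.

    ASSUMPTION: consecutive big-endian 8-byte doubles, one per document id.
    The issue does not specify the real docWeights.bin layout.
    """
    count = len(raw) // 8
    return list(struct.unpack(f">{count}d", raw[: count * 8]))


def ranked_query(
    terms: List[str],
    postings: Postings,
    doc_weights: List[float],
    corpus_size: int,
    k: int = 10,
) -> List[Tuple[int, float]]:
    """Term-at-a-time scoring; returns up to k (doc_id, score) pairs, best first."""
    accumulators: Dict[int, float] = {}
    for term in terms:
        plist = postings.get(term)
        if not plist:
            continue  # a term missing from the index contributes nothing
        # w_{q,t} = ln(1 + N / df_t); df_t is the postings list length here
        w_qt = math.log(1.0 + corpus_size / len(plist))
        for doc_id, tf in plist:
            # w_{d,t} = 1 + ln(tf_{t,d})
            w_dt = 1.0 + math.log(tf)
            accumulators[doc_id] = accumulators.get(doc_id, 0.0) + w_dt * w_qt

    # Divide each non-zero A_d by L_d.
    scored = (
        (a_d / doc_weights[doc_id], doc_id)
        for doc_id, a_d in accumulators.items()
        if a_d != 0.0
    )
    # heapq.nlargest keeps a bounded binary heap rather than sorting everything.
    top = heapq.nlargest(k, scored)
    return [(doc_id, score) for score, doc_id in top]
```

A caller would then map each returned document id to its title and print the title next to the final accumulator value, as the assignment text asks.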

sotheanithsok / Habeas

Ranked Retrievals #58