ucasir / NPRF

NPRF: A Neural Pseudo Relevance Feedback Framework for Ad-hoc Information Retrieval
Apache License 2.0

file #10

Closed (wangxinzhe123 closed this issue 2 years ago)

wangxinzhe123 commented 2 years ago

Hello, can you explain in detail what the following files (res_file, docnolist_file, df_file, topic_file) are? I have the disk4 and disk5 files and want to process them into the corresponding usable files. Thank you very much for your reply!

relevance_params = {
    'res_file': os.path.join(global_info_path, 'desc.res'),
    'qrels_file': os.path.join(global_info_path, 'qrels.clueweb09b.txt'),
    'docnolist_file': os.path.join(global_info_path, 'docnolist'),
    'output_file': os.path.join(global_info_path, 'relevance.clue.desc.fromres1000.pickle')
}

global_info_path = "/home/lcj/data/desc.disk12/features/global.info"
idf_params = {
    'df_file': os.path.join(global_info_path, 'disk12.dfcf.txt'),
    'topic_file': os.path.join(global_info_path, 'disk12.desc.porter.morefilter.txt'),
    'output_file': os.path.join(global_info_path, 'desc.idf.pickle'),
}

canjiali commented 2 years ago

Hi, below is the explanation:

res_file: the retrieval results from any search toolkit, such as Terrier or Anserini
docnolist_file: the docnos that appear in the res_file, one docno per row
df_file: pairs of a term and its document frequency, one pair per row
topic_file: the query files from TREC
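As a rough illustration of how a docnolist could be derived from a res_file, here is a minimal sketch. It assumes the res_file uses the standard six-column TREC run format (`qid Q0 docno rank score tag`). The function name `docnos_from_run` and the sample docnos are made up for this example and are not part of NPRF; whether NPRF expects a particular ordering of docnos is not stated in this thread.

```python
from typing import Iterable, List


def docnos_from_run(lines: Iterable[str]) -> List[str]:
    """Collect unique docnos from TREC run lines, in order of first appearance.

    Assumes each line looks like: qid Q0 docno rank score tag
    """
    seen = set()
    docnos = []
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        docno = fields[2]  # third column holds the document number
        if docno not in seen:
            seen.add(docno)
            docnos.append(docno)
    return docnos


# Illustrative run lines only; the docnos and scores are invented.
sample_run = [
    "301 Q0 FBIS3-10082 1 12.5 bm25",
    "301 Q0 FT934-5418 2 11.9 bm25",
    "302 Q0 FBIS3-10082 1 9.3 bm25",
]
docnolist = docnos_from_run(sample_run)

# One docno per row, as described for docnolist_file.
print("\n".join(docnolist))
```

With a real run file, you could pass an open file handle instead of `sample_run` and write the joined result to the docnolist path.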

Hope that helps.

wangxinzhe123 commented 2 years ago

Can you give me more details about df_file? What do "terms" mean? Can you give me an example of these pairs?

canjiali commented 2 years ago

For every term in the collection, you can get its document frequency, as in the code here: https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/index/IndexReaderUtils.java#L233
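To make "document frequency" concrete: it is the number of documents that contain a term at least once. The sketch below computes it over a tiny in-memory collection and formats one `term df` pair per row. The whitespace-separated layout is an assumption; the thread does not give the exact column format, and the file name `disk12.dfcf.txt` suggests the real file may also carry collection frequency, so it is worth checking the reader code in the repository. The function names and the toy documents are hypothetical. In practice the terms would come from your index and should be preprocessed (for example, stemmed) the same way as the topic file.

```python
from collections import Counter
from typing import Dict, Iterable, List


def document_frequencies(docs: Iterable[List[str]]) -> Dict[str, int]:
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for tokens in docs:
        # set() ensures a term is counted once per document,
        # no matter how often it repeats inside that document.
        df.update(set(tokens))
    return dict(df)


def format_df_lines(df: Dict[str, int]) -> List[str]:
    """Format one 'term df' pair per row (assumed layout, sorted by term)."""
    return [f"{term} {count}" for term, count in sorted(df.items())]


# Toy collection of already tokenized documents, for illustration only.
toy_docs = [
    ["neural", "rank", "model"],
    ["rank", "feedback", "rank"],
    ["neural", "feedback"],
]
df = document_frequencies(toy_docs)
df_lines = format_df_lines(df)
print("\n".join(df_lines))
```

Here "rank" appears twice in the second document but still contributes only one to its document frequency.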