
wangxinzhe123 commented 2 years ago

Hello, can you explain in detail what the following files (res_file, docnolist_file, df_file, topic_file) are? I have the disk4 and disk5 files and want to process them into the corresponding usable files. Thank you very much for your reply!

relevance_params = {
    'res_file': os.path.join(global_info_path, 'desc.res'),
    'qrels_file': os.path.join(global_info_path, 'qrels.clueweb09b.txt'),
    'docnolist_file': os.path.join(global_info_path, 'docnolist'),
    'output_file': os.path.join(global_info_path, 'relevance.clue.desc.fromres1000.pickle')
}

global_info_path = "/home/lcj/data/desc.disk12/features/global.info"
idf_params = {
    'df_file': os.path.join(global_info_path, 'disk12.dfcf.txt'),
    'topic_file': os.path.join(global_info_path, 'disk12.desc.porter.morefilter.txt'),
    'output_file': os.path.join(global_info_path, 'desc.idf.pickle'),
}

canjiali commented 2 years ago

Hi, below is the explanation:
res_file: the retrieval results from any search toolkit, such as Terrier or Anserini
docnolist_file: the docnos that appear in the res_file, one docno per row
df_file: pairs of a term and its document frequency, one pair per row
topic_file: the query file from TREC

Hope that helps.
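As a rough illustration of how a docnolist could be derived from a res_file, here is a minimal sketch. It assumes the res_file uses the standard six-column TREC run format (`qid Q0 docno rank score tag`), which is what Terrier and Anserini normally write. The function name `docnos_from_res` and the sample lines are hypothetical, and whether NPRF expects the docnos deduplicated or in a particular order is an assumption, not something stated in this thread.

```python
from collections import OrderedDict


def docnos_from_res(res_lines):
    """Collect unique docnos from TREC-style run lines, in first-seen order."""
    seen = OrderedDict()
    for line in res_lines:
        parts = line.split()
        if len(parts) < 6:
            continue  # skip blank or malformed lines
        seen[parts[2]] = None  # the third column holds the docno
    return list(seen)


# Hypothetical sample lines in TREC run format: qid Q0 docno rank score tag
sample_res = [
    "301 Q0 FBIS3-10082 1 12.5 run",
    "301 Q0 LA010189-0001 2 11.9 run",
    "302 Q0 FBIS3-10082 1 10.2 run",
]

docnos = docnos_from_res(sample_res)
print("\n".join(docnos))  # one docno per row, as in docnolist_file
```

For a real run, the same function can be fed an open file handle (`docnos_from_res(open(path))`) and the printed rows written to the docnolist path instead.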

wangxinzhe123 commented 2 years ago

Can you give me more details about df_file? What do terms mean? Can you give me an example of these pairs?

canjiali commented 2 years ago

For every term in the collection, you can get its document frequency, as in the code here: https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/index/IndexReaderUtils.java#L233
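To make the idea of term and document-frequency pairs concrete, here is a small sketch on a toy collection. The document frequency of a term is the number of documents that contain it at least once. The tokenized toy documents, the function name `document_frequencies`, and the space-separated output layout are all assumptions for illustration. The exact column layout NPRF expects in df_file is not stated in this thread, and a real collection would first need tokenizing (and possibly stemming) with the same settings as the index.

```python
from collections import Counter


def document_frequencies(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set() so repeats within a document count once
    return df


# Hypothetical toy collection: each document is already a list of terms
toy_docs = [
    ["oil", "price", "rise"],
    ["oil", "spill", "oil"],
    ["price", "index"],
]

df = document_frequencies(toy_docs)
for term in sorted(df):
    print(term, df[term])  # one "term df" pair per row
# index 1
# oil 2
# price 2
# rise 1
# spill 1
```

Note that "oil" occurs three times in total but has a document frequency of 2, because it appears in only two documents.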

ucasir / NPRF

file #10