wangxinzhe123 closed this issue 2 years ago
Hi, below is the explanation:
- res_file: the retrieval results from any search toolkit, such as Terrier or Anserini
- docnolist_file: the docnos that appear in the res_file, one docno per row
- df_file: pairs of terms and their document frequencies, one pair per row
- topic_file: the query files from TREC
Hope that helps.
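To make the relation between res_file and docnolist_file concrete, here is a minimal sketch. It assumes the res_file uses the usual six-column TREC run layout (query id, Q0, docno, rank, score, run tag); the function name `docnos_from_run` and the sample lines are made up for illustration and are not taken from this repository.

```python
def docnos_from_run(run_lines):
    """Collect the unique docnos from TREC-format run lines, in first-seen order."""
    seen = set()
    docnos = []
    for line in run_lines:
        parts = line.split()
        if len(parts) < 6:
            continue  # skip blank or malformed lines
        docno = parts[2]  # assumption: third column holds the docno
        if docno not in seen:
            seen.add(docno)
            docnos.append(docno)
    return docnos


# Hypothetical run lines, only to show the shape of the data.
sample_run = [
    "301 Q0 FBIS3-10082 1 12.5 bm25",
    "301 Q0 FT921-6603 2 11.9 bm25",
    "302 Q0 FBIS3-10082 1 9.7 bm25",
]

docnolist = docnos_from_run(sample_run)
print("\n".join(docnolist))  # one docno per row
```

Whether the downstream scripts care about the order of the docnos is not stated in this thread, so first-seen order here is only a convenient choice.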
Can you give me more details about the df_file? What do "terms" mean here? Can you give me an example of these pairs?
For every term in the collection, you can get its document frequency (the number of documents that contain the term), as in the code here: https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/index/IndexReaderUtils.java#L233
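As a rough illustration of what such term and document-frequency pairs could look like, the sketch below counts document frequencies over a tiny made-up collection instead of a real index. The space-separated "term df" layout, the toy documents, and the helper name `document_frequencies` are assumptions; the exact format the scripts expect should be checked against the code that reads the df_file.

```python
from collections import Counter


def document_frequencies(docs):
    """Map each term to the number of documents that contain it."""
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # count each term at most once per document
    return df


# Hypothetical, already tokenized documents.
toy_docs = {
    "D1": ["retrieval", "model", "retrieval"],
    "D2": ["neural", "retrieval"],
    "D3": ["neural", "ranking"],
}

df = document_frequencies(toy_docs)
df_lines = [f"{term} {count}" for term, count in sorted(df.items())]
print("\n".join(df_lines))  # one "term df" pair per row
```

Note that "retrieval" occurs twice in D1 but still contributes only 1 to its document frequency. The terms in the df_file presumably need the same preprocessing (for example stemming) as the terms in the topic_file, otherwise lookups would not match.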
Hello, can you explain in detail what the following files (res_file, docnolist_file, df_file, topic_file) are? I have the disk4 and disk5 files and want to process them into the corresponding usable files. Thank you very much for your reply!
relevance_params = {
    'res_file': os.path.join(global_info_path, 'desc.res'),
    'qrels_file': os.path.join(global_info_path, 'qrels.clueweb09b.txt'),
    'docnolist_file': os.path.join(global_info_path, 'docnolist'),
    'output_file': os.path.join(global_info_path, 'relevance.clue.desc.fromres1000.pickle')
}
global_info_path = "/home/lcj/data/desc.disk12/features/global.info"
idf_params = {
    'df_file': os.path.join(global_info_path, 'disk12.dfcf.txt'),
    'topic_file': os.path.join(global_info_path, 'disk12.desc.porter.morefilter.txt'),
    'output_file': os.path.join(global_info_path, 'desc.idf.pickle'),
}