Closed giangnguyen2412 closed 5 years ago
The "parent_path" is the parent directory of your data, not your home directory. You can ignore it as long as your data paths are configured correctly. Have you prepared the data for training? If not, please find the relevant functions in the "utils" folder and generate the data first.
    relevance_params = {
        'res_file': '',
        'qrels_file': '',
        'docnolist_file': '',
        'output_file': ''
    }
    create_relevance(**relevance_params)

    topk_params = {
        'df_file': '',      # the data format of each line is: term \t df \t cf (cf is not used)
        'corpus_file': '',  # the data format of each line is: docno \t doclen \t term1 term2 ...
        'output_file': '',
        'nb_docs': ,
        'topk': ,
    }
    topk_term(**topk_params)

    doc_idf_params = {
        'relevance_file': '',
        'df_file': '',
        'document_file': '',  # the output file from above function
        'output_file': 'topk.idf.pkl',
        'rerank_topk': 60,
        'doc_topk_term': 30,
        'nb_doc': 25205179
    }
    parse_idf_for_document(**doc_idf_params)

    kernel_mu_list = kernal_mus(11, True)
    kernel_sigma_list = kernel_sigmas(11, 0.5, True)
    sim_params = {
        'relevance_file': '',
        'topic_file': '',
        'corpus_file': '',
        'topk_corpus_file': '',
        'embedding_file': '',
        'stop_file': '',  # not used actually
        'sim_output_path': '',
        'kernel_output_path': '',
        'kernel_mu_list': kernel_mu_list,
        'kernel_sigma_list': kernel_sigma_list,
        'topk_supervised': 40,
        'd2d': True,
        'test': False
    }

    hist_params = {
        'relevance_file': '',
        'text_max_len': ,
        'hist_size': ,
        'sim_path': ,
        'hist_path': ,
        'd2d': True
    }

    sim_mat_and_kernel_d2d(sim_params)
    hist_d2d(hist_params)
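For anyone unsure what the 'df_file' and 'corpus_file' inputs look like, here is a minimal sketch that reads the two line formats described in the comments above (term \t df \t cf, and docno \t doclen \t term1 term2 ...). The function names `read_df_file` and `read_corpus_file` and the toy data are hypothetical and are not part of this repo; the sketch only illustrates the stated formats and is not how the "utils" functions necessarily parse them.

```python
import io


def read_df_file(handle):
    """Read 'term<TAB>df<TAB>cf' lines into {term: df}. The cf column is ignored."""
    df = {}
    for line in handle:
        fields = [f.strip() for f in line.rstrip("\n").split("\t")]
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed lines
        df[fields[0]] = int(fields[1])
    return df


def read_corpus_file(handle):
    """Read 'docno<TAB>doclen<TAB>term1 term2 ...' lines into {docno: (doclen, terms)}."""
    corpus = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t", 2)
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        docno, doclen, text = (f.strip() for f in fields)
        corpus[docno] = (int(doclen), text.split())
    return corpus


# Toy data, invented for illustration only.
df_sample = io.StringIO("retrieval\t120\t340\nneural\t45\t80\n")
corpus_sample = io.StringIO("D1\t3\tneural retrieval model\n")

df = read_df_file(df_sample)
corpus = read_corpus_file(corpus_sample)
print(df)
print(corpus)
```

Running a check like this on the first few lines of your own files is a quick way to confirm they match the expected layout before calling the preparation functions.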
Excuse me! Could you please explain in detail how to prepare the data for training? As you can see in README.MD, you only mention the utils.py file and some functions. I don't think I or others can understand and follow that well enough to run training for your model.
I think your instructions will really help.
Thank you.
Do you mean data preparation like this? https://github.com/NTMC-Community/MatchZoo
Two quick questions: Do you have the TREC robust04 or disk12 data? Have you retrieved a result file for those queries and downloaded the qrels files from the TREC website?
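For readers new to these two files: in the standard TREC layout, a qrels line is "qid iteration docno relevance" and a result (run) line is "qid Q0 docno rank score tag", both whitespace-separated. Below is a minimal sketch that reads each. The function names and the toy lines are hypothetical, and whether `create_relevance` expects exactly these layouts for 'qrels_file' and 'res_file' is an assumption, not something confirmed in this thread.

```python
import io
from collections import defaultdict


def read_qrels(handle):
    """Read TREC qrels lines 'qid iteration docno rel' into {qid: {docno: rel}}."""
    qrels = defaultdict(dict)
    for line in handle:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        qid, _iteration, docno, rel = parts
        qrels[qid][docno] = int(rel)
    return qrels


def read_run(handle):
    """Read TREC run lines 'qid Q0 docno rank score tag' into {qid: [(rank, docno, score)]}."""
    run = defaultdict(list)
    for line in handle:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        qid, _q0, docno, rank, score, _tag = parts
        run[qid].append((int(rank), docno, float(score)))
    for qid in run:
        run[qid].sort()  # order each query's documents by rank
    return run


# Toy lines, invented for illustration only.
qrels_sample = io.StringIO("301 0 DOC-A 1\n301 0 DOC-B 0\n")
run_sample = io.StringIO(
    "301 Q0 DOC-B 2 11.5 baseline\n"
    "301 Q0 DOC-A 1 12.3 baseline\n"
)

qrels = read_qrels(qrels_sample)
run = read_run(run_sample)
print(dict(qrels))
print(dict(run))
```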
1) No, I don't have them. Do I need to download them? 2) No, I did not. It's my first time running an IR model.
Sorry if these are silly questions.
Please get familiar with IR first, e.g. indexing, retrieval, and evaluation. Running some traditional (non-neural) IR experiments will also help you. This repo is not meant for learning IR from scratch.
OK, I will try running it again and ask you later. Thanks for your help.
Hi again,
I am running the training command: python nprf_drmm.py --fold 5 1. As I understand it, I am then supposed to modify the config file model/nprf_drmm_config.py, but how should I configure it? I changed the variable parent_path to parent_path = '/home/dexter/NPRF/model' (home directory), but it doesn't work.
Could you please help me out?
Thanks.