The "nan" values you are seeing are mostly likely because the eval loop could not load any example from dataset_dev.tf. Do your .tf files have the following features: "query_ids", "doc_ids", "label"?
Just for a sanity check, if you use the .tf files provided in the README page, do you see the same error?
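Not part of the repo, but a quick way to check this is to dump the feature keys of the first few records; a minimal sketch, assuming the TF 1.x API that dl4marco-bert targets:

```python
# Minimal sketch to verify a .tf file contains the features the eval loop
# expects: "query_ids", "doc_ids", "label". Assumes TensorFlow 1.x.
import tensorflow as tf

def print_tfrecord_features(path, num_records=3):
    for i, raw in enumerate(tf.python_io.tf_record_iterator(path)):
        if i >= num_records:
            break
        example = tf.train.Example()
        example.ParseFromString(raw)
        print("record %d features: %s"
              % (i, sorted(example.features.feature.keys())))

print_tfrecord_features("dataset_dev.tf")
```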
I tried the evaluation part alone on the actual dataset that has been provided. I ran it on a MacBook Pro; after some 12 hours it printed just one line of evaluation, which had some numbers, so it ran well there. I stopped that process later since it was not necessary for me. I converted my own dataset with the convert_msmarco_to_tfrecord.py file, and I checked the changes I made to that file for converting my data to the .tf format.
In the qrels file I have this data (one entry per line): `100 0 1 1` and `101 0 2 2`.
And in dev_dataset_path I have set the path to a file that has data in the following format: `query_id \t doc_id \t query \t paragraph_related_to_that_query`
Is this format correct? Or should I replace the 'paragraph_related_to_that_query' with the whole document text?
Also, I have duplicated the query multiple times with different paragraphs.
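A minimal sketch (not from the repo; the helper name and sample IDs are made up) of how such a tab-separated run file could be written, with the same query_id repeated once per candidate paragraph:

```python
# Hypothetical helper for writing a dev run file in the format
# query_id \t doc_id \t query \t paragraph, one candidate per line.
import csv

def write_dev_run(path, queries, candidates):
    """queries: {query_id: query_text}
    candidates: {query_id: [(doc_id, paragraph_text), ...]}"""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        for qid, qtext in queries.items():
            for doc_id, paragraph in candidates[qid]:
                # The query is repeated for every candidate paragraph.
                writer.writerow([qid, doc_id, qtext, paragraph])

write_dev_run(
    "dev_run.tsv",
    queries={"100": "example query text"},
    candidates={"100": [("1", "a paragraph related to the query ..."),
                        ("2", "an unrelated paragraph ...")]},
)
```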
Hi @Arjunsankarlal, did you see this error in the code block?
/Users/arjun/Projects/Github/dl4marco-bert/run_msmarco.py:445: RuntimeWarning: invalid value encountered in true_divide
In the code, example_idx is initialized to 0. Then there is an if statement: if len(results) == FLAGS.num_eval_docs, then example_idx += 1; otherwise nothing happens.
So if len(results) never equals num_eval_docs, example_idx stays 0, and all_metrics /= example_idx triggers that warning because of the division by zero, leaving NaN metrics.
Finally, the reason len(results) never reaches num_eval_docs is that your eval data has fewer than FLAGS.num_eval_docs documents per query.
So just add some eval docs and everything will be good.
All the details can be found at line 363 and below.
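To make the failure mode concrete, here is a simplified, self-contained paraphrase (not the exact code from run_msmarco.py) of that accumulate-then-divide pattern:

```python
# Simplified paraphrase of the eval aggregation around line 363 of
# run_msmarco.py; variable names mirror the discussion above.
import numpy as np

num_eval_docs = 1000            # stands in for FLAGS.num_eval_docs
all_metrics = np.zeros(3)       # accumulated metrics (placeholder size)
example_idx = 0                 # number of fully scored queries

# Pretend the eval set produced only 4 candidate docs for its single query,
# i.e. fewer than num_eval_docs.
grouped_results = [[0.9, 0.1, 0.3, 0.2]]

for results in grouped_results:
    if len(results) == num_eval_docs:          # only complete groups count
        all_metrics += np.ones(3)              # stand-in for real metrics
        example_idx += 1
    # incomplete groups are silently skipped

# With no complete group, example_idx is still 0, so this 0/0 division emits
# "RuntimeWarning: invalid value encountered in true_divide" and yields NaN.
all_metrics /= example_idx
print(all_metrics)   # -> [nan nan nan]
```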
There is a README that tells us how to create the original MS MARCO ranking data.
https://github.com/dfcf93/MSMARCO/blob/master/Ranking/README.md
Please read this and you will know how to create your own data.
Hey @frankabc, thanks for the reply. It worked: I added a few docs (plus a few fake documents), modified the params to the same count, and regenerated the data.
I am trying to make predictions on my own dataset with the downloaded pretrained model. I have changed the params to point to the corresponding file locations on my machine. I am running `run_msmarco.py` with `train=False`, evaluating only with the `dev` set which I have created on my own. In `data_dir` I have the tfrecord-processed files `dataset_dev.tf` and `query_doc_ids_dev.txt`. The params after the modifications are as follows. So now when I run this, it completes without any error. At the end of the logs I see this:
Also, going through the logs in detail, I found this in them:
I initially had a different output_dir (the model_dir assignment is done inside main()); after looking at this warning I changed the path to the current one. Yet it still shows the same warning, and the execution completes without any errors.
The `BERT_Large_trained_on_MSMARCO` folder has the three files that were downloaded with the pretrained model from the link in the README.md. I need help figuring this out. Kindly comment if any other details are required.
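One quick way to rule out a checkpoint-path problem is to check that TensorFlow can actually read the downloaded files; a small sketch, assuming a standard TF 1.x checkpoint layout and the directory name used in this thread:

```python
# Sketch to confirm the pretrained BERT_Large_trained_on_MSMARCO checkpoint is
# readable before pointing run_msmarco.py's output/init paths at it.
import tensorflow as tf

ckpt_dir = "BERT_Large_trained_on_MSMARCO"   # directory name from this thread
ckpt = tf.train.latest_checkpoint(ckpt_dir)  # requires a "checkpoint" index file
if ckpt is None:
    # If the download has no "checkpoint" file, pass the model.ckpt-* prefix
    # (without the .data-*/.index suffix) directly instead.
    print("no checkpoint index found in", ckpt_dir)
else:
    # List a few variables to confirm the checkpoint loads.
    for name, shape in tf.train.list_variables(ckpt)[:5]:
        print(name, shape)
```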