nyu-dl / dl4marco-bert

BSD 3-Clause "New" or "Revised" License

Testing the model with the run function #10

Status: Closed (valavakilian closed this issue 5 years ago)

valavakilian commented 5 years ago

Hi and thank you for sharing your project. I'm trying to use your pre-trained model to run inference. Currently, I'm just trying to see how to use the project so I want to see if I can run inference on the treccar files. I tried using the run files but I have some trouble understanding where I should input the treccar data and the trained model. Can you please help me a bit with how to do a simple inference? Thank you for your help

rodrigonogueira4 commented 5 years ago

I think the easiest way is to use the Colab: https://colab.sandbox.google.com/drive/1uIXKkxkEbwe2Z6-tGmbbH10ptwd2Tr0u

You should update DATA_DIR with the path to the folder containing the following files (which can be found at https://drive.google.com/open?id=16tk7HmLaqvU0oIO5L_H8elwqKn2cJUzG): dataset_dev.tf, dataset_test.tf, dataset_train.tf, dev.qrels, dev.run, dev.topics, test.qrels, test.run, test.topics, train.qrels, train.run, train.topics
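If you run locally, a quick sanity check of DATA_DIR before launching can save time. The following is only a minimal sketch, not part of the repository: the helper name `missing_files` is hypothetical, and the file list is simply copied from the names above.

```python
import os

# File names copied from the list above (TREC-CAR data folder).
EXPECTED_FILES = [
    "dataset_dev.tf", "dataset_test.tf", "dataset_train.tf",
    "dev.qrels", "dev.run", "dev.topics",
    "test.qrels", "test.run", "test.topics",
    "train.qrels", "train.run", "train.topics",
]


def missing_files(data_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in data_dir.

    Hypothetical helper for illustration; the repository does not ship it.
    """
    return [
        name for name in expected
        if not os.path.isfile(os.path.join(data_dir, name))
    ]
```

Calling `missing_files(DATA_DIR)` returns an empty list when every listed file is present, and otherwise the names that still need to be downloaded.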

And you should update INIT_CHECKPOINT with the path to the trained model checkpoint, which can be downloaded at https://drive.google.com/open?id=1fzcL2nzUJMUd0w4J5JIeASSrN4uHlSqP

I hope this helps.

valavakilian commented 5 years ago

Hi, and thanks for your help. I can't use Colab right now, so I am running the code locally. I have set USE_TPU and DO_TRAIN to False as well. I have also added a path to a JSON config file for uncased BERT large (of course, I gave BERT_CONFIG_FILE the path of the folder). I have two problems. First, I have downloaded the treccar folder and given DATA_DIR the correct path, but I run into the following error when running the code:

tensorflow.python.framework.errors_impl.NotFoundError: /home/jwhite/valaTemp/CoLabRanker/treccar/query_doc_ids_dev.txt; No such file or directory

I don't know, but I think the file should be created at runtime. Is it possible to fix this problem so that I can run it locally? Second, I am not sure how to reference INIT_CHECKPOINT. I have downloaded it and referenced the folder containing the three files: model.ckpt-100000.data-00000-of-00001, model.ckpt-100000.index, and model.ckpt-100000.meta. But I'm not sure if that is correct. Sorry if the questions are basic, and thanks a lot for your help.

rodrigonogueira4 commented 5 years ago

The files I've sent are for TREC-CAR and should be used with run_treccar.py. If you are using MS MARCO, please download the files that are described in the MS MARCO section of the README.md file.

Regarding the checkpoint, you should set it like this: INIT_CHECKPOINT="model.ckpt-100000"

TensorFlow will automatically look for the extensions ".data-00000-of-00001", ".index", and ".meta".
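To make the prefix convention concrete, here is a minimal sketch in plain Python (no TensorFlow calls) that expands a checkpoint prefix into the three file names mentioned above and reports any that are missing. The helper names are hypothetical, and the single-shard ".data-00000-of-00001" suffix is assumed only because it matches the files in this thread.

```python
import os

# Suffixes taken from the three files named in this thread.
# Assumption: the checkpoint has a single data shard.
CHECKPOINT_SUFFIXES = (".data-00000-of-00001", ".index", ".meta")


def checkpoint_files(prefix):
    """Expand a checkpoint prefix into the file paths expected on disk."""
    return [prefix + suffix for suffix in CHECKPOINT_SUFFIXES]


def missing_checkpoint_files(prefix):
    """Return the expected checkpoint files that do not exist on disk."""
    return [path for path in checkpoint_files(prefix) if not os.path.isfile(path)]
```

For example, with the checkpoint stored in some folder `ckpt_dir` (a placeholder), INIT_CHECKPOINT would be `os.path.join(ckpt_dir, "model.ckpt-100000")`, that is, the folder plus the shared prefix, not the folder alone and not any single file.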

valavakilian commented 5 years ago

Thank you very much. I was able to run the functions successfully. I can see that the inference for treccar takes a long time so I'm currently working on inputting custom paragraphs and questions. Thank you for your help and amazing repository.

guotong1988 commented 5 years ago

Thank you