Hi Victoria,

Thanks for trying out our code. Could you point me to the evaluation script you used? Unlike DeepPath, we train a single model for all the relations and hence use a single graph. However, to keep the evaluation correct, we remove the edges corresponding to the query triple. For example, given the query triple John_Doe --works_for--> Google, when MINERVA starts to walk from the node John_Doe, it is not allowed to take the works_for edge to reach Google. (ref: https://github.com/shehzaadzd/MINERVA/blob/master/code/data/grapher.py#L56)
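For illustration, here is a minimal sketch of that masking step with hypothetical names (the actual logic is at the grapher.py line linked above):

```python
# Sketch of query-edge masking (hypothetical names; see grapher.py for the
# real implementation). For a query (e1, r, e2), the direct edge e1 --r--> e2
# is hidden so the agent must find an alternative path to the answer.

def mask_query_edge(outgoing_edges, query_relation, answer, pad_relation, pad_entity):
    """outgoing_edges: (relation, target_entity) pairs available at e1."""
    masked = []
    for relation, target in outgoing_edges:
        if relation == query_relation and target == answer:
            # Replace the query edge with padding rather than deleting it,
            # so the action array keeps a fixed shape.
            masked.append((pad_relation, pad_entity))
        else:
            masked.append((relation, target))
    return masked

# For the query John_Doe --works_for--> Google:
edges = [("works_for", "Google"), ("lives_in", "Mountain_View")]
print(mask_query_edge(edges, "works_for", "Google", "PAD_R", "PAD_E"))
# [('PAD_R', 'PAD_E'), ('lives_in', 'Mountain_View')]
```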
Hi Shehzaad,
I generated the results by running the scripts in the repo, so yes, I think the grapher should be the same as yours. It uses the graph named "graph.txt" in the data_preprocessed folder, which contains 304,434 facts. I noticed that this is the number of edges in the full graph (154,213 x 2) minus the test edges (3,992).
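As a quick sanity check of that count (assuming graph.txt stores one tab-separated triple per line):

```python
# Count the facts in graph.txt and compare with the expected edge count.
with open("datasets/data_preprocessed/nell/graph.txt") as f:
    n_edges = sum(1 for line in f if line.strip())

print(n_edges)            # expected: 304434
print(154213 * 2 - 3992)  # 308426 - 3992 = 304434
```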
I did the evaluation by extracting the prediction scores MINERVA writes to the test_beam folder and passing them to the same evaluation script used by DeepPath.
Here is my evaluation script:
https://gist.github.com/todpole3/51e1704b5efd85cb67b9ee0c95e0b028
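In case the gist goes stale, here is a minimal sketch of the DeepPath-style MAP it computes (not the exact script; I'm assuming scored (candidate, label) pairs per query):

```python
# Sketch of DeepPath-style MAP. Each query contributes a list of
# (score, is_correct) candidate pairs; higher scores rank first.

def average_precision(candidates):
    ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, is_correct) in enumerate(ranked, start=1):
        if is_correct:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(queries):
    return sum(average_precision(q) for q in queries) / len(queries)

# One query whose first- and third-ranked candidates are correct:
print(mean_average_precision([[(0.9, True), (0.8, False), (0.3, True)]]))
# AP = (1/1 + 2/3) / 2 ≈ 0.8333
```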
@shehzaadzd Any updates on this one?
Hi Victoria, Using the script you shared and the answers I got from our model, I was able to replicate our scores. I have modified your script to use my logs, and I'm sharing my answers (link) with you to check. I was not able to reproduce the results you got, as I didn't have the pickle files that you used. I hope this helps.
Thanks very much. I will check the difference and get back to you.
@shehzaadzd @todpole3, I'm encountering the same problem. What is the test_prediction_path data format? Is there a script to parse the results in test_beam? Thanks a lot!
Hi Po-Sen, I've uploaded the answers generated by our model on each NELL task (link). I'm adding the code to print these answers to our main repo and will push it soon. Hope this helps. Sorry for the delayed response.
Thanks a lot! Looking forward to the script.
One more thing related to the setup: if I want to train on one relation only, as in DeepPath (instead of jointly), should I use only one of the relations in train.txt? For example:

`grep concept:worksfor train.txt > train_worksfor.txt`

Thanks!
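(For reference, a minimal Python equivalent of that grep filter; this assumes tab-separated e1/relation/e2 triples and the file names above, so treat it as a sketch rather than the repo's own tooling.)

```python
# Sketch: keep only the concept:worksfor triples from train.txt.
# Assumes one tab-separated triple per line: e1 <tab> relation <tab> e2.
with open("train.txt") as src, open("train_worksfor.txt", "w") as dst:
    for line in src:
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 3 and parts[1] == "concept:worksfor":
            dst.write(line)
```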
@shehzaadzd, any update for the script?
@posenhuang - We apologize for the late reply from our side. To train on one relation, you can train on the individual graphs (exactly as in DeepPath). For example, the data for concept:worksfor would be in here. Also, the script is the one @shehzaadzd linked a couple of answers above (link). You can also use the same script here. Also wanted to check with @todpole3 -- are you still facing any issues?
Thanks, I haven't been experimenting with this dataset lately. Will check and let you know.
Thanks very much for the nice code! I reproduced the experiments on the NELL dataset. When I train separate models for each NELL task using the default config files (use_entity_embeddings=1 for all tasks), I can reproduce almost all of the per-task results. But if I train a single model for all tasks, the results are significantly lower than the reported ones:

athleteplaysinleague:
MINERVA MAP: 0.7619328833895763 (381 queries evaluated)
worksfor:
MINERVA MAP: 0.7451084742652438 (421 queries evaluated)
organizationhiredperson:
MINERVA MAP: 0.8572348007748 (349 queries evaluated)
athleteplayssport:
MINERVA MAP: 0.9160862354892205 (603 queries evaluated)
teamplayssport:
MINERVA MAP: 0.7702593537414967 (112 queries evaluated)
personborninlocation:
MINERVA MAP: 0.7865804604146573 (193 queries evaluated)
athletehomestadium:
MINERVA MAP: 0.5223731630448047 (201 queries evaluated)
organizationheadquarteredincity:
MINERVA MAP: 0.9133535585342815 (249 queries evaluated)
athleteplaysforteam:
MINERVA MAP: 0.6284948040761995 (387 queries evaluated)

The config file I used is below:
data_input_dir="datasets/data_preprocessed/nell/"
vocab_dir="datasets/data_preprocessed/nell/vocab"
total_iterations=3000
path_length=3 //according to the appendix
hidden_size=100 //according to Experimental Details, section 2.3, the hidden size is 400 (in the code, hidden_size = 4 * self.hidden_size, so I set the parameter to 100)
embedding_size=100 //according to Experimental Details, section 2.3, the embedding size is 200 (in the code: self.entity_embedding_placeholder = tf.placeholder(tf.float32, [self.entity_vocab_size, 2 * self.embedding_size]), so I set the parameter to 100)
batch_size=128 //default value
beta=0.05 //according to the appendix
Lambda=0.02 //according to the appendix
use_entity_embeddings=1
train_entity_embeddings=1
train_relation_embeddings=1
base_output_dir="output/nell/worksfor"
load_model=0
model_load_dir="/home/sdhuliawala/logs/RL-PathRNN/nnnn/45de_3_0.06_10_0.0/model/model.ckpt"
nell_evaluation=1
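In other words, the effective dimensions these settings imply (my own arithmetic based on the comments above, not code from the repo) are:

```python
# Effective widths implied by the config above (use_entity_embeddings=1).
embedding_size = 100  # config value
hidden_size = 100     # config value

effective_embedding = 2 * embedding_size  # 200, matching section 2.3
effective_hidden = 4 * hidden_size        # 400, matching section 2.3
print(effective_embedding, effective_hidden)  # 200 400
```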
Even though the answers you provided seem correct, I still want to make sure I set the hyperparameters to the correct values for your single model trained on all relations. Thanks a lot!
Hi Lee, Thanks for trying out our code! Can you train the single model on the data in datasets/data_preprocessed/nell-995? The datasets/data_preprocessed/nell data was changed a bit for the purpose of link prediction: it has fewer training examples, since we changed it to create a proper validation set. Tell me if you still have issues reproducing the results. -S
Thanks for telling me this. I will run the model on the nell-995 dataset and check the results!
Hi @shehzaadzd, I have run the experiment on the nell-995 dataset with the config file below; here are my results (the numbers in parentheses are the reported results):

athleteplaysinleague:
MINERVA MAP: 0.7824126150897805 (381 queries evaluated)
worksfor:
MINERVA MAP: 0.7689410483947302 (421 queries evaluated) (0.825)
organizationhiredperson:
MINERVA MAP: 0.8717628574212938 (349 queries evaluated) (0.851)
athleteplayssport:
MINERVA MAP: 0.9177169707020453 (603 queries evaluated) (0.985)
teamplayssport:
MINERVA MAP: 0.6906675170068028 (112 queries evaluated) (0.846)
personborninlocation:
MINERVA MAP: 0.7665333946422028 (193 queries evaluated) (0.793)
athletehomestadium:
MINERVA MAP: 0.5319267658819898 (201 queries evaluated) (0.895)
organizationheadquarteredincity:
MINERVA MAP: 0.9453257474341812 (249 queries evaluated) (0.946)
athleteplaysforteam:
MINERVA MAP: 0.6555836139169473 (387 queries evaluated) (0.824)
config file:
LSTM_Layer=1
data_input_dir="datasets/data_preprocessed/nell/"
vocab_dir="datasets/data_preprocessed/nell/vocab"
total_iterations=3000
path_length=3 //according to the appendix
hidden_size=100 //according to Experimental Details, section 2.3, the hidden size is 400 (in the code, hidden_size = 4 * self.hidden_size, so I set the parameter to 100)
embedding_size=100 //according to Experimental Details, section 2.3, the embedding size is 200 (in the code: self.entity_embedding_placeholder = tf.placeholder(tf.float32, [self.entity_vocab_size, 2 * self.embedding_size]), so I set the parameter to 100)
batch_size=128 //default value
beta=0.05 //according to the appendix
Lambda=0.02 //according to the appendix
use_entity_embeddings=1
train_entity_embeddings=1
train_relation_embeddings=1
base_output_dir="output/nell/worksfor"
load_model=0
model_load_dir="/home/sdhuliawala/logs/RL-PathRNN/nnnn/45de_3_0.06_10_0.0/model/model.ckpt"
nell_evaluation=1
I also tried setting the embedding size and hidden size to 50; the results are below:
athleteplaysinleague:
MINERVA MAP: 0.7700987187207659 (381 queries evaluated)
worksfor:
MINERVA MAP: 0.7844730816821078 (421 queries evaluated)
organizationhiredperson:
MINERVA MAP: 0.8710068284974013 (349 queries evaluated)
athleteplayssport:
MINERVA MAP: 0.9182974018794915 (603 queries evaluated)
teamplayssport:
MINERVA MAP: 0.7468537414965987 (112 queries evaluated)
personborninlocation:
MINERVA MAP: 0.7555456489394312 (193 queries evaluated)
athletehomestadium:
MINERVA MAP: 0.5220393343527672 (201 queries evaluated)
organizationheadquarteredincity:
MINERVA MAP: 0.915163829922866 (249 queries evaluated)
athleteplaysforteam:
MINERVA MAP: 0.6305270311084265 (387 queries evaluated)
config file:
LSTM_Layer=1
data_input_dir="datasets/data_preprocessed/nell/"
vocab_dir="datasets/data_preprocessed/nell/vocab"
total_iterations=3000
path_length=3 //according to the appendix
hidden_size=50 //varied from 100 (see note above)
embedding_size=50 //varied from 100 (see note above)
batch_size=128 //default value
beta=0.05 //according to the appendix
Lambda=0.02 //according to the appendix
use_entity_embeddings=1
train_entity_embeddings=1
train_relation_embeddings=1
base_output_dir="output/nell/worksfor"
load_model=0
model_load_dir="/home/sdhuliawala/logs/RL-PathRNN/nnnn/45de_3_0.06_10_0.0/model/model.ckpt"
nell_evaluation=1
Finally, I set the number of LSTM layers to 3, according to your paper; the results:
athleteplaysinleague:
MINERVA MAP: 0.7820689080531601 (381 queries evaluated)
worksfor:
MINERVA MAP: 0.7692186541355187 (421 queries evaluated)
organizationhiredperson:
MINERVA MAP: 0.865689742796194 (349 queries evaluated)
athleteplayssport:
MINERVA MAP: 0.9081260364842456 (603 queries evaluated)
teamplayssport:
MINERVA MAP: 0.6653698979591837 (112 queries evaluated)
personborninlocation:
MINERVA MAP: 0.7679808821000531 (193 queries evaluated)
athletehomestadium:
MINERVA MAP: 0.5379048121585435 (201 queries evaluated)
organizationheadquarteredincity:
MINERVA MAP: 0.9487218716134379 (249 queries evaluated)
athleteplaysforteam:
MINERVA MAP: 0.6487594662013266 (387 queries evaluated)
However, none of these results match the numbers in the paper, even though I believe I set the hyperparameters exactly according to the paper and its appendix. Is my config file using the optimal parameters from your experiments? Could you help me reproduce the results? Thanks a lot!
Thanks very much for releasing the code along with the paper. It definitely makes reproducing the experiments a lot easier. I've been playing with the codebase and have some questions on reproducing the NELL-995 experiments.
The codebase does not contain the configuration file for the NELL-995 experiments, nor does it contain the evaluation scripts for computing MAP. (Maybe they were missed from the release?) I used the hyperparameters reported in "Experimental Details, section 2.3" and appendix section 8.1 of the paper, which results in the following configuration file:
I ran train & test as specified in the README and evaluated the decoding results using the MAP computation script released with the DeepPath paper. (I assumed the experiment setup is exactly the same as in the DeepPath paper, since you compared head-to-head with them.)
However, the MAP results I obtained this way are significantly lower than the reported results.
I tried a few variations of the embedding dimensions and also tried freezing the entity embeddings, yet none of the trials produced numbers close to the results tabulated in the MINERVA paper.
Would you please clarify the experiment setup for computing MAP? I want to make sure I set the hyperparameters to the correct values. Also, the DeepPath paper used a relation-dependent underlying graph per relation during inference. Did you also vary the graph per relation, or did you use a single base graph for all relations, as you did for the other datasets?
Many thanks.