malllabiisc / EmbedKGQA

ACL 2020: Improving Multi-hop Question Answering over Knowledge Graphs using Knowledge Base Embeddings
Apache License 2.0

How to pretrain ComplEx for MetaQA_half? #109

Closed: lihuiliullh closed this 2 years ago

lihuiliullh commented 2 years ago

May I know how you pre-trained the MetaQA data? Did you use the code in the directory "train_embeddings" to learn the embeddings? If so, can you share the command for running main.py? If not, how did you generate bn0.npy, bn1.npy, bn2.npy, E.npy, and R.npy?

lihuiliullh commented 2 years ago

I read the issue https://github.com/malllabiisc/EmbedKGQA/issues/41.

Do you use the following command to train the model?

python3 main.py --dataset MetaQA_half --num_iterations 500 --batch_size 256 \
                --lr 0.0005 --dr 1.0 --edim 200 --rdim 200 --input_dropout 0.2 \
                --hidden_dropout1 0.3 --hidden_dropout2 0.3 --label_smoothing 0.1 \
                --valid_steps 10 --model ComplEx \
                --loss_type BCE --do_batch_norm 1 --l3_reg 0.001 \
                --outfile /scratch/embeddings

Do you only use train.txt in MetaQA_half to train ComplEx?
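
For anyone following along: a minimal sketch of loading such a triple file for embedding training. The tab-separated (head, relation, tail) format is an assumption here; check the actual delimiter used in the MetaQA_half data files shipped with the repo.

# Hypothetical sketch: load KG triples from train.txt for embedding training.
# Assumes one tab-separated (head, relation, tail) triple per line; verify
# the real delimiter against the repo's data files.
def load_triples(path):
    triples = []
    with open(path, "r") as f:
        for line in f:
            parts = line.strip().split("\t")
            if len(parts) == 3:
                triples.append(tuple(parts))
    return triples

triples = load_triples("data/MetaQA_half/train.txt")
print(f"{len(triples)} training triples")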

lihuiliullh commented 2 years ago

Here is the accuracy I get using main.py in train_embeddings on MetaQA_half:

CUDA_VISIBLE_DEVICES=3 python main.py --dataset MetaQA --num_iterations 500 --batch_size 256 \
                                      --lr 0.0005 --dr 1.0 --edim 200 --rdim 200 --input_dropout 0.2 \
                                      --hidden_dropout1 0.3 --hidden_dropout2 0.3 --label_smoothing 0.1 \
                                      --valid_steps 10 --model ComplEx \
                                      --loss_type BCE --do_batch_norm 0 --l3_reg 0.001

Hits@10: 0.173125
Hits@3: 0.0995
Hits@1: 0.04325
Mean rank: 9449.934
Mean reciprocal rank: 0.08621291153867376
Best valid: [0.08832985424199054, 9482.2985, 0.178375, 0.09875, 0.046625]
Best test: [0.08618774827374057, 9425.521875, 0.1725, 0.100125, 0.043375]

The accuracy is very low. Is there a problem?

apoorvumang commented 2 years ago

Yes, this is very low. Let me try it and get back to you.

apoorvumang commented 2 years ago

Did you try with batch norm?

lihuiliullh commented 2 years ago

According to the code, passing "--do_batch_norm 0" sets do_batch_norm = False. So, I guess I didn't use batch norm.
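
As context for readers, an integer on/off flag like this is typically parsed with type=int and then treated as a boolean. A sketch of the usual pattern (the authoritative argument handling is in train_embeddings/main.py and may differ):

# Sketch of the common pattern for an integer on/off flag; the actual
# parsing lives in train_embeddings/main.py.
import argparse

parser = argparse.ArgumentParser()
# type=int (not type=bool) matters here: bool("0") would be True,
# because any non-empty string is truthy in Python.
parser.add_argument("--do_batch_norm", type=int, default=1)
args = parser.parse_args(["--do_batch_norm", "0"])

do_batch_norm = bool(args.do_batch_norm)  # False in this case
print(do_batch_norm)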

apoorvumang commented 2 years ago

What I meant was: did you try running with "--do_batch_norm 1"?

lihuiliullh commented 2 years ago

Yes, I ran the code with "--do_batch_norm 1". The Hits@1 is about 0.07.

apoorvumang commented 2 years ago

I ran this

CUDA_VISIBLE_DEVICES=2 python main.py --dataset MetaQA_half --num_iterations 1000 --batch_size 256 \
                                       --lr 0.005 --dr 1.0 --edim 200 --rdim 200 --input_dropout 0.2 \
                                       --hidden_dropout1 0.2 --hidden_dropout2 0.3 --label_smoothing 0.1 \
                                       --valid_steps 10 --model ComplEx \
                                       --loss_type BCE --do_batch_norm 1 --l3_reg 0.0

and got: [screenshot of training metrics; image not preserved]

These embeddings should be OK-ish, I think, for the downstream application. To save the trained files, you will have to uncomment the following line (and please check the code to see the output file location):

https://github.com/malllabiisc/EmbedKGQA/blob/b2a33674a0a6653745e55a3da53cc2d7e00c372d/train_embeddings/main.py#L249
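
For anyone landing here with the original question about bn0.npy, bn1.npy, bn2.npy, E.npy, and R.npy: once that line is uncommented, the trained parameters get written out as .npy files. A rough sketch of what such an export could look like; the attribute names (model.E, model.R, model.bn0, ...) are assumptions mirroring the file names from the question, and the authoritative logic is the commented-out call linked above.

# Hypothetical sketch of exporting trained ComplEx parameters to .npy files.
# model.E / model.R are assumed to be nn.Embedding layers and bn0/bn1/bn2
# torch.nn.BatchNorm1d layers; the real export code is in
# train_embeddings/main.py.
import numpy as np

def save_embedding_files(model, out_dir):
    np.save(f"{out_dir}/E.npy", model.E.weight.detach().cpu().numpy())
    np.save(f"{out_dir}/R.npy", model.R.weight.detach().cpu().numpy())
    for name in ("bn0", "bn1", "bn2"):
        bn = getattr(model, name)
        # Stack the learnable affine parameters and the running statistics
        # into one (4, num_features) array per batch-norm layer.
        params = np.stack([
            bn.weight.detach().cpu().numpy(),
            bn.bias.detach().cpu().numpy(),
            bn.running_mean.cpu().numpy(),
            bn.running_var.cpu().numpy(),
        ])
        np.save(f"{out_dir}/{name}.npy", params)

# Usage: save_embedding_files(trained_model, "/scratch/embeddings")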