malllabiisc / EmbedKGQA

ACL 2020: Improving Multi-hop Question Answering over Knowledge Graphs using Knowledge Base Embeddings
Apache License 2.0
414 stars · 96 forks

Can you provide your WebQSP KG embedding file? #39

Open ToneLi opened 3 years ago

ToneLi commented 3 years ago

I found that your WebQSP KG embedding file is not provided; can you supply it? When I run your embedding training code on your KG (the one for WebQSP), I get a CUDA out of memory error. Can you supply the relevant embedding file, just like for MetaQA?

ShuangNYU commented 3 years ago

> I found that your WebQSP KG embedding file is not provided; can you supply it? When I run your embedding training code on your KG (the one for WebQSP), I get a CUDA out of memory error. Can you supply the relevant embedding file, just like for MetaQA?

I encountered the same error about out of memory. Have you fixed this problem?

apoorvumang commented 3 years ago

Can you tell me the exact command you executed, along with your GPU configuration? @ShuangNYU @ToneLi

ShuangNYU commented 3 years ago

I used the command below:

    python3 main.py --dataset fbwq_full --num_iterations 1500 --batch_size 256 \
        --lr 0.0005 --dr 1.0 --edim 200 --rdim 200 --input_dropout 0.2 \
        --hidden_dropout1 0.3 --hidden_dropout2 0.3 --label_smoothing 0.1 \
        --valid_steps 10 --model ComplEx \
        --loss_type BCE --do_batch_norm 1 --l3_reg 0.001 \
        --outfile /scratch/ComplEx_fbwq_half

The error is:

    Number of training data points: 11560492
    Entities: 1886683
    Relations: 1144
    Model is ComplEx
    Starting training...
    Traceback (most recent call last):
      File "main.py", line 327, in <module>
        experiment.train_and_eval()
      File "main.py", line 230, in train_and_eval
        loss.backward()
      File "/home/sg5963/.local/lib/python3.6/site-packages/torch/tensor.py", line 118, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph)
      File "/home/sg5963/.local/lib/python3.6/site-packages/torch/autograd/__init__.py", line 93, in backward
        allow_unreachable=True)  # allow_unreachable flag
    RuntimeError: CUDA out of memory. Tried to allocate 1.41 GiB (GPU 0; 11.17 GiB total capacity; 9.62 GiB already allocated; 823.31 MiB free; 429.74 MiB cached)

Interestingly, I am training with TuckER now and it has been running for 5 hours...
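Editor's note: the training log above shows 1,886,683 entities, which makes 1-vs-all scoring memory-hungry: each batch materializes a `batch_size × num_entities` float32 score matrix, and backprop keeps several such tensors alive at once. A back-of-the-envelope sketch (my own estimate; the helper name is made up, not from the repo):

```python
# Rough memory estimate for 1-vs-all KG-embedding training, assuming
# scores are float32 and the loss is computed over all entities at once.
def score_matrix_gib(batch_size: int, num_entities: int, bytes_per_float: int = 4) -> float:
    """GiB needed for one batch_size x num_entities score tensor."""
    return batch_size * num_entities * bytes_per_float / 2**30

entities = 1_886_683  # entity count from the training log above
for bs in (256, 64, 32):
    print(f"batch {bs:>3}: {score_matrix_gib(bs, entities):.2f} GiB per score tensor")
```

At batch size 256 a single score tensor is already close to 1.8 GiB, the same order as the failed allocations in the tracebacks; halving the batch size halves it.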

ToneLi commented 3 years ago

I ran the experiment on fbwq_half and hit the same problem as ShuangNYU, CUDA out of memory, even with batch_size set to 64. When I use TuckER instead, the error goes away.

apoorvumang commented 3 years ago

Please set the batch size to 32 so the out-of-memory error doesn't happen. Also, I'm not sure how training with TuckER will work out, since the pre-trained embeddings are for ComplEx, not TuckER.
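Editor's note: if a batch size of 32 hurts convergence, one standard workaround (a general technique, not something this repo implements) is gradient accumulation: compute gradients over several micro-batches and average them before stepping, which reproduces the large-batch gradient at the small-batch memory cost. A minimal NumPy sketch on a linear least-squares loss:

```python
import numpy as np

def grad_mse(w, X, y):
    """Gradient of the mean squared error 0.5*||Xw - y||^2 / n w.r.t. w."""
    return X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))   # one "full" batch of 256 examples
y = rng.normal(size=256)
w = np.zeros(8)

full_grad = grad_mse(w, X, y)

# Accumulate over 8 micro-batches of 32, weighting each by its size.
acc = np.zeros(8)
for i in range(0, 256, 32):
    acc += grad_mse(w, X[i:i+32], y[i:i+32]) * 32
acc /= 256

# Same gradient as the full batch, at 1/8 the peak activation memory.
assert np.allclose(full_grad, acc)
```

The same idea applies to the ComplEx training loop: call `loss.backward()` per micro-batch (gradients sum automatically in PyTorch) and only step the optimizer every N micro-batches.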

ToneLi commented 3 years ago

Can you tell me what your GPU server is? I trained the half WebQSP KG; it consumed 24059 MiB of GPU memory and a full day but did not produce a result. Also, I'd like to know whether ShuangNYU got a result?

apoorvumang commented 3 years ago

@ToneLi We trained on a single 1080Ti with 12 GB memory. I will let you know the exact command and exact output tomorrow.

ToneLi commented 3 years ago

Thanks!!

ShuangNYU commented 3 years ago

> Can you tell me what your GPU server is? I trained the half WebQSP KG; it consumed 24059 MiB of GPU memory and a full day but did not produce a result. Also, I'd like to know whether ShuangNYU got a result?

I haven't got the results yet either. I ran it on our school's cluster.

    RuntimeError: CUDA out of memory. Tried to allocate 1.80 GiB (GPU 0; 15.90 GiB total capacity; 11.19 GiB already allocated; 1.80 GiB free; 2.14 GiB cached)

ToneLi commented 3 years ago

I rewrote the TuckER source code to add ComplEx to it, and this code runs. I tested it on the half WebQSP KG with 6 epochs; the results are Hits@10: 0.4956, Hits@3: 0.3396, Hits@1: 0.21595, Mean rank: 99493.86215. I will test with more epochs next. If needed, you can contact me.
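Editor's note: for reference, the scoring function such a rewrite needs to add is the standard ComplEx score, Re(⟨e_s, w_r, conj(e_o)⟩). A minimal NumPy sketch (variable names are mine, not the repo's; in the repo the 200-dimensional real and imaginary halves are stored separately per the `--edim 200` flag):

```python
import numpy as np

def complex_score(e_s, w_r, e_o):
    """ComplEx triple score: Re(sum_k e_s[k] * w_r[k] * conj(e_o[k])).

    e_s, w_r, e_o are complex-valued embedding vectors.
    """
    return np.real(np.sum(e_s * w_r * np.conj(e_o)))

rng = np.random.default_rng(1)
d = 200  # embedding dimension, matching --edim 200 above
e_s = rng.normal(size=d) + 1j * rng.normal(size=d)
w_r = rng.normal(size=d) + 1j * rng.normal(size=d)
e_o = rng.normal(size=d) + 1j * rng.normal(size=d)
print(complex_score(e_s, w_r, e_o))
```

A useful sanity check when porting: swapping subject and object while conjugating the relation embedding leaves the score unchanged, which is how ComplEx models inverse relations.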

ToneLi commented 3 years ago

@apoorvumang, I'd like to know your knowledge-embedding results on WebQSP, because my code has now been running for five days at 400 epochs, and so far the results are not good.