The value of not_update in the retrieve_KB function is args.do_train. The retrieve_via_frontier function is called inside retrieve_KB, and the value of not_update is passed along to it. But args.do_train is 1 during training, while the not_update that retrieve_via_frontier needs is False. After adding not_update = False at the top of retrieve_via_frontier, I can compute the loss successfully. You can try it.
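A minimal, self-contained sketch of the issue (the body below is a simplified stand-in, not the repo's actual signature or retrieval logic, which live in ConversationKBQA_Runner.py): because args.do_train is an int, passing it through as not_update makes the flag truthy during training, so overriding it to False at the top of the function lets the retrieval (and therefore the loss) run.

```python
# Simplified stand-in for the real function; not the repo's actual code.
def retrieve_via_frontier(frontier, not_update):
    # Workaround described above: the caller passes not_update=args.do_train,
    # which is 1 (truthy) while training, so force it to False here.
    not_update = False
    if not_update:
        return []  # retrieval skipped -> empty subgraph, loss stays 0
    return [f"candidate_path_for_{frontier}"]  # placeholder for the 1-hop retrieval

# With the override in place, retrieval runs even when the caller passes 1.
print(retrieve_via_frontier("entity_0", not_update=1))
```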
Thank you very much for your reply, I will try it first.
Sorry to bother you again.
I followed your suggestion and, in the retrieve_KB function, added not_update = False before calling retrieve_via_frontier, and the following error occurred:
```
Traceback (most recent call last):
  File "G:/python_projects/ConversationalKBQA-master/mycode/ConversationKBQA_Runner.py", line 1011, in
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1200]] is at version 3; expected version 2 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
```
When I delete the not_update = False, no error is reported, but all metrics are 0. Do you know why? Looking forward to your reply, thank you very much.
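For context on that RuntimeError (a generic illustration, not the repo's code): autograd stores a version counter for every tensor it saves for the backward pass, and an in-place op that bumps the counter after the save produces exactly this message.

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x * 2
loss = (y ** 2).sum()   # the pow backward saves y at its current version
y.add_(1)               # in-place edit bumps y's version after it was saved

try:
    loss.backward()
except RuntimeError as e:
    print(e)            # "... has been modified by an inplace operation ..."
```

Presumably, forcing not_update = False makes an extra code path run that mutates the saved [1200]-sized tensor in place; a common remedy is to .clone() the tensor before the in-place update, or to replace the in-place op with an out-of-place one.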
@Bigchen8013 Add not_update = False in the retrieve_via_frontier function instead of the retrieve_KB function. Try it; it works for me here.
@Bigchen8013 The main problem is that the subgraph candidate paths are not retrieved in retrieve_via_frontier, and the SQL_1hop_interaction method inside it is never executed. You can try debugging that, for example with a check like the one sketched below.
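A small, hypothetical debugging helper (debug_candidates and its argument are made-up names; retrieve_via_frontier and SQL_1hop_interaction are the repo's) that can be dropped in right after the retrieval step to confirm whether any candidate paths come back:

```python
import logging

logger = logging.getLogger(__name__)

def debug_candidates(candidate_paths):
    # Call this right after the SQL_1hop_interaction step inside retrieve_via_frontier.
    if not candidate_paths:
        logger.warning("retrieve_via_frontier: no candidate paths retrieved; "
                       "SQL_1hop_interaction may not have been reached")
    else:
        logger.info("retrieve_via_frontier: %d candidate paths retrieved",
                    len(candidate_paths))
    return candidate_paths
```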
Hi
When I train as you described, all the metrics are still 0, as shown below,
```
Test: 100%|██████████| 10/10 [00:00<00:00, 10024.63it/s]
06/03/2022 10:21:54 - INFO - __main__ - ***** Eval results (99)*****
06/03/2022 10:21:54 - INFO - __main__ - dev reward=(0.0, 0.0)
06/03/2022 10:21:54 - INFO - __main__ - dev te reward=(0.0, 0.0)
06/03/2022 10:21:54 - INFO - __main__ - test reward=(0.0, 0.0)
06/03/2022 10:21:54 - INFO - __main__ - training loss=0.0
06/03/2022 10:21:54 - INFO - __main__ - training reward=(0.0, 0.0)
06/03/2022 10:21:54 - INFO - __main__ - training te loss=0.0
06/03/2022 10:21:54 - INFO - __main__ - training te reward=(0.0, 0.0)
Epoch: 100%|██████████| 100/100 [03:53<00:00, 2.33s/it]
```
The training also ends within a few minutes, and the files in the cache are almost empty. Do you have a solution?