knazeri / edge-connect

EdgeConnect: Structure Guided Image Inpainting using Edge Prediction, ICCVW 2019 https://arxiv.org/abs/1901.00212
http://openaccess.thecvf.com/content_ICCVW_2019/html/AIM/Nazeri_EdgeConnect_Structure_Guided_Image_Inpainting_using_Edge_Prediction_ICCVW_2019_paper.html

When I tried to start training, I got an error: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 512, 4, 4]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True). #188

Open TT-mouse opened 1 year ago

TT-mouse commented 1 year ago

Thank you very much for your contribution. Your article helps me a lot. As mentioned in the title, I encountered an error at the beginning of training. The detailed error information is as follows:

    Traceback (most recent call last):
      File "E:/our code/edge-connect-master/train.py", line 2, in <module>
        main(mode=1)
      File "E:\our code\edge-connect-master\main.py", line 56, in main
        model.train()
      File "E:\our code\edge-connect-master\src\edge_connect.py", line 178, in train
        self.inpaint_model.backward(i_gen_loss, i_dis_loss)
      File "E:\our code\edge-connect-master\src\models.py", line 259, in backward
        gen_loss.backward()
      File "D:\Anaconda\envs\CTSDG\lib\site-packages\torch\_tensor.py", line 307, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "D:\Anaconda\envs\CTSDG\lib\site-packages\torch\autograd\__init__.py", line 156, in backward
        allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 512, 4, 4]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

Have you ever encountered this error in training? I would appreciate it if you could tell me how to solve it!
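For anyone hitting this, the hint at the end of the traceback can be followed directly to locate the offending in-place operation. A minimal sketch (torch.autograd.set_detect_anomaly is standard PyTorch; where exactly to call it in this repo's train.py is an assumption):

import torch

# Enable anomaly detection before training starts. Autograd then records
# the forward-pass stack trace of every operation and, when a backward()
# fails, reports which forward operation produced the failing gradient,
# instead of only raising inside backward().
torch.autograd.set_detect_anomaly(True)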

talyeho commented 1 year ago

Hello, the following solution solved the problem for me: https://stackoverflow.com/questions/71793678/i-am-running-into-a-gradient-computation-inplace-error
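For reference, the linked answer boils down to moving every optimizer.step() after all the backward() calls. A self-contained sketch of the pattern with toy networks (names invented for illustration; these are not the EdgeConnect models):

import torch
import torch.nn as nn

# Toy generator/discriminator standing in for the EdgeConnect models.
gen = nn.Linear(8, 8)
dis = nn.Linear(8, 1)
gen_optimizer = torch.optim.Adam(gen.parameters())
dis_optimizer = torch.optim.Adam(dis.parameters())

x = torch.randn(4, 8)
fake = gen(x)
dis_loss = dis(fake.detach()).mean()  # discriminator loss on detached fake
gen_loss = -dis(fake).mean()          # generator loss flows through dis's weights

# Every backward() runs before any step(): optimizer.step() updates its
# parameters in place, and a later backward() that still needs those
# parameters is what raises the "modified by an inplace operation" error.
dis_loss.backward()
gen_loss.backward()
dis_optimizer.step()
gen_optimizer.step()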

TT-mouse commented 1 year ago

Hello, thank you for your help. I tried to modify the code according to the link you provided, but the same error still occurs.I've seen people say it's a pytorch version issue, but I want to fix that by not demoting the pytorch version.Can you help me?

talyeho commented 1 year ago

Did you move the optimizer steps to after the backward calls?

TT-mouse commented 1 year ago

Did you move the optimizer steps to after the backward calls?

Yes, I changed the code in the backward pass as follows, but it didn't work. [two screenshots of the modified code]

talyeho commented 1 year ago

Hello, and I apologize for the delay. This was the issue for me; I believe you should also check if dis_loss and gen_loss are not None before using them.

Ghost0405 commented 1 year ago
def backward(self, gen_loss=None, dis_loss=None):
    # run each backward() before its own optimizer.step(); stepping first
    # mutates weights in place and invalidates the other loss's graph
    if gen_loss is not None:
        gen_loss.backward()
    self.gen_optimizer.step()

    if dis_loss is not None:
        dis_loss.backward()
    self.dis_optimizer.step()

You can try it this way. Although I trained with this change, the results were not ideal, and I don't know why; I look forward to further discussion.

TT-mouse commented 1 year ago

Hello, and I apologize for the delay. This was the issue for me; I believe you should also check if dis_loss and gen_loss are not None before using them.

Thanks for your help, I have solved the problem.

TT-mouse commented 1 year ago
def backward(self, gen_loss=None, dis_loss=None):
    if gen_loss is not None:
        gen_loss.backward()
    self.gen_optimizer.step()

    if dis_loss is not None:
        dis_loss.backward()
    self.dis_optimizer.step()

You can try it this way. Although I trained with this change, the results were not ideal, and I don't know why; I look forward to further discussion.

In which mode did your retrained model perform poorly?

wizaaaard commented 1 year ago

Hello, and I apologize for the delay. This was the issue for me; I believe you should also check if dis_loss and gen_loss are not None before using them.

Thanks for your help, I have solved the problem.

Could you please tell me how you solved this problem? Thank you very much!

TT-mouse commented 1 year ago

Hello, and I apologize for the delay. This was the issue for me; I believe you should also check if dis_loss and gen_loss are not None before using them.

Thanks for your help, I have solved the problem.

Could you please tell me how you solved this problem? Thank you very much!

Hello, the backward of both stage networks needs to be modified; earlier I had only changed the backward of the stage-two network. This might help you!
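Concretely (a sketch, not a verified patch): src/models.py defines a backward method on both stage models, EdgeModel and InpaintingModel in this repo, and the reordered version quoted earlier in this thread goes into both:

# src/models.py -- apply the same replacement body in BOTH
# EdgeModel.backward (stage one) and InpaintingModel.backward (stage two)
def backward(self, gen_loss=None, dis_loss=None):
    # each loss is backpropagated before its optimizer steps
    if gen_loss is not None:
        gen_loss.backward()
    self.gen_optimizer.step()

    if dis_loss is not None:
        dis_loss.backward()
    self.dis_optimizer.step()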

Ghost0405 commented 1 year ago
def backward(self, gen_loss=None, dis_loss=None):
    if gen_loss is not None:
        gen_loss.backward()
    self.gen_optimizer.step()

    if dis_loss is not None:
        dis_loss.backward()
    self.dis_optimizer.step()

You can try it this way. Although I trained with this change, the results were not ideal, and I don't know why; I look forward to further discussion.

In which mode did your retrained model perform poorly?

Sorry, I am only seeing your reply now. My problem occurs with MODEL = 1; although 540,000 training iterations have been performed, many of the results are still blurry.

Ghost0405 commented 1 year ago

Hello, and I apologize for the delay. This was the issue for me; I believe you should also check if dis_loss and gen_loss are not None before using them.

Thanks for your help, I have solved the problem.

Could you please tell me how you solved this problem? Thank you very much!

Hello, the backward of both stage networks needs to be modified; earlier I had only changed the backward of the stage-two network. This might help you!

Thanks for your reply. I only use the first stage of EdgeConnect, so I don't think it's a big problem if the second stage is not modified.

manlupanshan commented 1 year ago

Hello! I changed models.py as follows:

def backward(self, gen_loss=None, dis_loss=None):
    # both losses are backpropagated before either optimizer steps;
    # retain_graph=True keeps shared graph buffers alive for the second backward
    if dis_loss is not None:
        dis_loss.backward(retain_graph=True)
    if gen_loss is not None:
        gen_loss.backward()
    self.dis_optimizer.step()
    self.gen_optimizer.step()

def backward(self, gen_loss=None, dis_loss=None):
    # same reordering for the other stage's model, without the None checks
    dis_loss.backward(retain_graph=True)
    gen_loss.backward()
    self.dis_optimizer.step()
    self.gen_optimizer.step()

I ran stage 3 of EdgeConnect for 20 epochs on PyTorch 1.7, and the outputs look fine as far as I can see. So you can try it.
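A side note on the retain_graph=True above (my reading, not something this repo documents): it is only needed when the first backward() would free graph buffers that the second backward() still has to traverse, i.e. when the two losses share part of the forward graph. A minimal illustration:

import torch
import torch.nn as nn

net = nn.Linear(4, 4)
shared = net(torch.randn(2, 4))     # one forward pass feeding two losses

loss_a = shared.sum()
loss_b = (shared ** 2).sum()

loss_a.backward(retain_graph=True)  # keep the shared graph buffers alive;
loss_b.backward()                   # without retain_graph the second call
                                    # fails with "Trying to backward through
                                    # the graph a second time"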

CindyzhangKexin commented 1 year ago

Hello! I changed models.py as follows:

def backward(self, gen_loss=None, dis_loss=None):
    if dis_loss is not None:
        dis_loss.backward(retain_graph=True)
    if gen_loss is not None:
        gen_loss.backward()
    self.dis_optimizer.step()
    self.gen_optimizer.step()

def backward(self, gen_loss=None, dis_loss=None):
    dis_loss.backward(retain_graph=True)
    gen_loss.backward()
    self.dis_optimizer.step()
    self.gen_optimizer.step()

I ran stage 3 of EdgeConnect for 20 epochs on PyTorch 1.7, and the outputs look fine as far as I can see. So you can try it.

Hello, I made the same changes, but my results were very poor, with metrics far from those reported in the original paper.

manlupanshan commented 1 year ago

Hello! I changed models.py as follows:

def backward(self, gen_loss=None, dis_loss=None):
    if dis_loss is not None:
        dis_loss.backward(retain_graph=True)
    if gen_loss is not None:
        gen_loss.backward()
    self.dis_optimizer.step()
    self.gen_optimizer.step()

def backward(self, gen_loss=None, dis_loss=None):
    dis_loss.backward(retain_graph=True)
    gen_loss.backward()
    self.dis_optimizer.step()
    self.gen_optimizer.step()

I ran stage 3 of EdgeConnect for 20 epochs on PyTorch 1.7, and the outputs look fine as far as I can see. So you can try it.

Hello, I made the same changes, but my results were very poor, with metrics far from those reported in the original paper.

It's hard to say why this is happening. I trained stage 1 and stage 3 again, and the network produced decent results. backward() should run before optimizer.step(), which is why I changed backward(self, gen_loss=None, dis_loss=None) this way. You may need to rethink why it does not work for you, or see the solution at https://stackoverflow.com/questions/71793678/i-am-running-into-a-gradient-computation-inplace-error. It may help you. Bye.
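For completeness, a minimal standalone repro of the failure mode this whole thread is about (toy Linear layers, not the EdgeConnect models), showing why stepping an optimizer before the other loss's backward() breaks:

import torch
import torch.nn as nn

gen = nn.Linear(4, 4)
dis = nn.Linear(4, 1)
dis_opt = torch.optim.SGD(dis.parameters(), lr=0.1)

fake = gen(torch.randn(2, 4))
dis_loss = dis(fake.detach()).mean()  # discriminator loss on detached fake
gen_loss = -dis(fake).mean()          # generator loss still needs dis's weights

dis_loss.backward()
dis_opt.step()        # updates dis.weight in place (version bump) ...
gen_loss.backward()   # ... raising "one of the variables needed for gradient
                      # computation has been modified by an inplace operation",
                      # exactly as in this issue

# The fix discussed above: call gen_loss.backward() before dis_opt.step().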