RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 512, 4, 4]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

CyrilShch commented 4 years ago

Model configurations:

MODE: 1 # 1: train, 2: test, 3: eval MODEL: 1 # 1: edge model, 2: inpaint model, 3: edge-inpaint model, 4: joint model MASK: 3 # 1: random block, 2: half, 3: external, 4: (external, random block), 5: (external, random block, half) EDGE: 1 # 1: canny, 2: external NMS: 1 # 0: no non-max-suppression, 1: applies non-max-suppression on the external edges by multiplying by Canny SEED: 10 # random seed GPU: [0] # list of gpu ids DEBUG: 0 # turns on debugging mode VERBOSE: 0 # turns on verbose mode in the output console

TRAIN_FLIST: ./datasets/places2_train.flist VAL_FLIST: ./datasets/places2_val.flist TEST_FLIST: ./datasets/places2_test.flist

TRAIN_EDGE_FLIST: ./datasets/places2_edges_train.flist VAL_EDGE_FLIST: ./datasets/places2_edges_val.flist TEST_EDGE_FLIST: ./datasets/places2_edges_test.flist

TRAIN_MASK_FLIST: ./datasets/masks_train.flist VAL_MASK_FLIST: ./datasets/masks_val.flist TEST_MASK_FLIST: ./datasets/masks_test.flist

LR: 0.0001 # learning rate D2G_LR: 0.1 # discriminator/generator learning rate ratio BETA1: 0.0 # adam optimizer beta1 BETA2: 0.9 # adam optimizer beta2 BATCH_SIZE: 8 # input batch size for training INPUT_SIZE: 256 # input image size for training 0 for original size SIGMA: 2 # standard deviation of the Gaussian filter used in Canny edge detector (0: random, -1: no edge) MAX_ITERS: 2e6 # maximum number of iterations to train the model

EDGE_THRESHOLD: 0.5 # edge detection threshold L1_LOSS_WEIGHT: 1 # l1 loss weight FM_LOSS_WEIGHT: 10 # feature-matching loss weight STYLE_LOSS_WEIGHT: 250 # style loss weight CONTENT_LOSS_WEIGHT: 0.1 # perceptual loss weight INPAINT_ADV_LOSS_WEIGHT: 0.1 # adversarial loss weight

GAN_LOSS: nsgan # nsgan | lsgan | hinge GAN_POOL_SIZE: 0 # fake images pool size

SAVE_INTERVAL: 1000 # how many iterations to wait before saving model (0: never) SAMPLE_INTERVAL: 1000 # how many iterations to wait before sampling (0: never) SAMPLE_SIZE: 12 # number of images to sample EVAL_INTERVAL: 0 # how many iterations to wait before model evaluation (0: never) LOG_INTERVAL: 10 # how many iterations to wait before logging training status (0: never)

start training...

Training epoch: 1 /pytorch/torch/csrc/utils/tensor_numpy.cpp:141: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. /pytorch/torch/csrc/utils/tensor_numpy.cpp:141: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. /pytorch/torch/csrc/utils/tensor_numpy.cpp:141: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. /pytorch/torch/csrc/utils/tensor_numpy.cpp:141: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. Traceback (most recent call last): File "/content/edge-connect/train.py", line 2, in main(mode=1) File "/content/edge-connect/main.py", line 56, in main model.train() File "/content/edge-connect/src/edge_connect.py", line 115, in train self.edge_model.backward(gen_loss, dis_loss) File "/content/edge-connect/src/models.py", line 149, in backward gen_loss.backward() File "/usr/local/lib/python3.6/dist-packages/torch/tensor.py", line 198, in backward torch.autograd.backward(self, gradient, retain_graph, create_graph) File "/usr/local/lib/python3.6/dist-packages/torch/autograd/init.py", line 100, in backward allow_unreachable=True) # allow_unreachable flag RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 512, 4, 4]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

Did anyone face with that problem? Thank you in advance

CyrilShch commented 4 years ago

solved by downgrading pytorch to 1.1.0

tadeasmtech commented 3 years ago

i solved the issue by switching two code sections in models.py, class EdgeModel:

def backward(self, gen_loss=None, dis_loss=None):
        if gen_loss is not None:  # gen_loss first, not the dis_loss
            gen_loss.backward()
        self.gen_optimizer.step()

        if dis_loss is not None:
            dis_loss.backward()
        self.dis_optimizer.step()

still dunno why it'd made a difference

hjf1997 commented 3 years ago

i solved the issue by switching two code sections in models.py, class EdgeModel:

def backward(self, gen_loss=None, dis_loss=None):
        if gen_loss is not None:  # gen_loss first, not the dis_loss
            gen_loss.backward()
        self.gen_optimizer.step()

        if dis_loss is not None:
            dis_loss.backward()
        self.dis_optimizer.step()

still dunno why it'd made a difference

Thanks for the answer~~ It helps me a lot

kdh4672 commented 3 years ago

i solved the issue by switching two code sections in models.py, class EdgeModel:

def backward(self, gen_loss=None, dis_loss=None):
        if gen_loss is not None:  # gen_loss first, not the dis_loss
            gen_loss.backward()
        self.gen_optimizer.step()

        if dis_loss is not None:
            dis_loss.backward()
        self.dis_optimizer.step()

still dunno why it'd made a difference

Wonderfull!

knazeri / edge-connect

Model configurations: