knazeri / edge-connect

EdgeConnect: Structure Guided Image Inpainting using Edge Prediction, ICCV 2019 https://arxiv.org/abs/1901.00212
http://openaccess.thecvf.com/content_ICCVW_2019/html/AIM/Nazeri_EdgeConnect_Structure_Guided_Image_Inpainting_using_Edge_Prediction_ICCVW_2019_paper.html

Training on Google Colab immediately stops #180

Open szymek1 opened 2 years ago

szymek1 commented 2 years ago

Hello, I keep running into this issue: whichever of the available models I try to train, training stops immediately after it starts. I set up the environment, which I think should be fine.

I set the batch size to 1 because I thought the batch size might be too big, as I have only 1 GPU. My guess is that on the Colab VM the NVIDIA drivers, CUDA and cuDNN are much newer than what was used back in 2019. Nevertheless, here is my configuration as well as the output. Please help me, guys!

MODE: 1             # 1: train, 2: test, 3: eval
MODEL: 2            # 1: edge model, 2: inpaint model, 3: edge-inpaint model, 4: joint model
MASK: 3             # 1: random block, 2: half, 3: external, 4: (external, random block), 5: (external, random block, half)
EDGE: 1             # 1: canny, 2: external
NMS: 1              # 0: no non-max-suppression, 1: applies non-max-suppression on the external edges by multiplying by Canny
SEED: 10            # random seed
GPU: [0]            # list of gpu ids
DEBUG: 0            # turns on debugging mode
VERBOSE: 1          # turns on verbose mode in the output console

TRAIN_FLIST: xxxx
VAL_FLIST: xxxx
TEST_FLIST: xxxx

TRAIN_EDGE_FLIST: ./datasets/places2_edges_train.flist
VAL_EDGE_FLIST: ./datasets/places2_edges_val.flist
TEST_EDGE_FLIST: ./datasets/places2_edges_test.flist

TRAIN_MASK_FLIST: xxxx
VAL_MASK_FLIST: xxxx
TEST_MASK_FLIST: xxxx

LR: 0.001                     # learning rate
D2G_LR: 0.1                   # discriminator/generator learning rate ratio
BETA1: 0.0                    # adam optimizer beta1
BETA2: 0.9                    # adam optimizer beta2
BATCH_SIZE: 1                 # input batch size for training
INPUT_SIZE: 256               # input image size for training, 256 for original size
SIGMA: 2                      # standard deviation of the Gaussian filter used in Canny edge detector (0: random, -1: no edge)
MAX_ITERS: 2                  # maximum number of iterations to train the model

EDGE_THRESHOLD: 0.5           # edge detection threshold
L1_LOSS_WEIGHT: 1             # l1 loss weight
FM_LOSS_WEIGHT: 10            # feature-matching loss weight
STYLE_LOSS_WEIGHT: 250        # style loss weight
CONTENT_LOSS_WEIGHT: 0.1      # perceptual loss weight
INPAINT_ADV_LOSS_WEIGHT: 0.1  # adversarial loss weight

GAN_LOSS: nsgan               # nsgan | lsgan | hinge
GAN_POOL_SIZE: 0              # fake images pool size

SAVE_INTERVAL: 2              # how many iterations to wait before saving model (0: never)
SAMPLE_INTERVAL: 2            # how many iterations to wait before sampling (0: never)
SAMPLE_SIZE: 24               # number of images to sample
EVAL_INTERVAL: 2              # how many iterations to wait before model evaluation (0: never)
LOG_INTERVAL: 1               # how many iterations to wait before logging training status (0: never)


start training...

Training epoch: 1

End training....
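
For reference, the effect of the iteration cap on its own can be sketched as below. This is a simplified stand-in, not the repository's actual training loop: `run_training` and its arguments are hypothetical, and it assumes the loop stops as soon as the iteration counter reaches `MAX_ITERS`, with the counter possibly starting from a value restored from a checkpoint.

```python
def run_training(num_batches, max_iters, start_iteration=0):
    """Simulate a capped training loop and return the lines it would log.

    Hypothetical sketch: assumes training ends once the iteration counter
    reaches max_iters. start_iteration models a counter restored from a
    previously saved checkpoint (an assumption, not confirmed by the log).
    """
    if num_batches <= 0:
        raise ValueError("need at least one batch per epoch")

    log = ["start training..."]
    iteration = start_iteration
    epoch = 0
    keep_training = True

    while keep_training:
        epoch += 1
        log.append(f"Training epoch: {epoch}")
        for _ in range(num_batches):
            iteration += 1
            log.append(f"iter: {iteration}")
            if iteration >= max_iters:
                # Cap reached: leave the batch loop and the epoch loop.
                keep_training = False
                break

    log.append("End training....")
    return log


if __name__ == "__main__":
    # MAX_ITERS: 2 as in the config above; 100 batches is an arbitrary choice.
    print("\n".join(run_training(num_batches=100, max_iters=2)))
```

Under these assumptions, a run with `MAX_ITERS: 2` finishes inside the first epoch after two batches, and after a single batch if the restored counter is already at or above the cap, so raising `MAX_ITERS` may be worth trying first.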

Gh1874 commented 2 years ago

Have you solved this problem? I think issue #54 describes the same problem as yours, so you can check it out.

BTW, I'm trying to train on my own dataset as well, and I'm confused about the edge.flist (i.e. the one you used in your config). I'm not sure which data I should use in each training stage. Could you please share some tips on it?
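
On the flist question, a rough sketch of building such a file for a custom dataset is below. It assumes an flist is simply a text file with one image path per line; `write_flist` and `IMAGE_EXTENSIONS` are illustrative names, not part of the repository. Going by the config comments above, `EDGE: 1` selects Canny edges, so the `*_EDGE_FLIST` entries presumably only matter with `EDGE: 2` (external).

```python
import os

# Assumed set of image file types; adjust to match the dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def write_flist(image_dir, output_path):
    """Collect image paths under image_dir and write them one per line.

    Returns the sorted list of paths that was written.
    """
    paths = []
    for root, _dirs, files in os.walk(image_dir):
        for name in files:
            extension = os.path.splitext(name)[1].lower()
            if extension in IMAGE_EXTENSIONS:
                paths.append(os.path.join(root, name))

    paths.sort()  # stable ordering so repeated runs give the same file

    with open(output_path, "w") as flist:
        for path in paths:
            flist.write(path + "\n")
    return paths
```

Checking that the returned list is non-empty before pointing `TRAIN_FLIST` at the output file is a cheap way to rule out an empty training set.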