Train other datasets - Githubissues

knazeri / edge-connect

EdgeConnect: Structure Guided Image Inpainting using Edge Prediction, ICCV 2019 https://arxiv.org/abs/1901.00212

http://openaccess.thecvf.com/content_ICCVW_2019/html/AIM/Nazeri_EdgeConnect_Structure_Guided_Image_Inpainting_using_Edge_Prediction_ICCVW_2019_paper.html

Train other datasets #79

Open superior1993 opened 5 years ago

superior1993 commented 5 years ago

Hello, I have a few questions. I want to train on Pascal VOC 2012. How do I generate the corresponding pascal_edges_val.flist file? Is it possible to train on Pascal VOC 2012 at all, and what changes would be needed? Thanks.

knazeri commented 5 years ago

@superior1993 No changes are required. Just create the training, test, and validation sets with the scripts/flist.py script. Make sure to follow the training instructions and, whenever possible, start from the pre-trained weights. The edges.flist file is only used when you rely on an external library to create edges. If you use the Canny edge detector (the default), you don't need to generate the corresponding edge files.
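For readers unfamiliar with .flist files: they are plain text files listing image paths. The following is a minimal sketch of how such a list could be built, not the repository's scripts/flist.py itself. The function name `write_flist`, the extension set, and the one-path-per-line layout are assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def write_flist(image_dir, output_path):
    """Recursively collect image paths under image_dir and write one per line.

    Hypothetical helper for illustration; returns the sorted list of paths.
    """
    paths = sorted(
        str(p)
        for p in Path(image_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )
    out = Path(output_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("\n".join(paths) + ("\n" if paths else ""))
    return paths
```

For Pascal VOC 2012 this might be called as `write_flist("VOCdevkit/VOC2012/JPEGImages", "datasets/pascal_train.flist")`, where both paths are assumptions about a local layout. Splitting the images into training, validation, and test lists is left to the caller.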

superior1993 commented 5 years ago

Thank you very much for your excellent work. I have run into a problem with training. I changed 'MODEL': 4 and 'MAX_ITERS': 500000 in config.py, but when I run python train.py --model 4 --checkpoint ./checkpoints/pascal in the terminal, the configuration it prints does not match my changes, and the program fails with RuntimeError: CUDA out of memory.

knazeri commented 5 years ago

@superior1993 I think your batch size of 15 is too large for your GPU memory. Try reducing the batch size.

superior1993 commented 5 years ago

In fact, I had already changed batch_size to 1 in config.py and still get the error. Why does the configuration printed in the terminal differ from my changes? I trained the first-stage edge model with a batch_size of 8 and the second-stage inpaint model with a batch_size of 16. Will that affect the third-stage joint model? Thank you.

superior1993 commented 5 years ago

I modified config.yml and the error no longer appears. I also found another problem: with model=4, I need to change the call on line 256 of models.py to dis_loss.backward(retain_graph = True); otherwise I get RuntimeError: Trying to backward through the graph a second time. Finally, I have a few more questions. If I have run models 1 and 2 with train.py, do I need to run models 3 and 4 as well? I am not clear on what models 3 and 4 actually do. Could you please explain? Thank you.
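The fix reported here suggests that training reads its settings from the config.yml in the checkpoints directory, so edits to config.py have no effect. Below is a rough sketch for checking what that file contains before launching a run. It uses only the standard library and assumes flat `KEY: value` lines. The helper names and the `BATCH_SIZE` key are assumptions; `MODEL` and `MAX_ITERS` come from the thread.

```python
from pathlib import Path


def read_flat_config(path):
    """Parse top-level `KEY: value` lines; comments and blank lines are skipped.

    Illustrative only: not a real YAML parser, values are kept as strings.
    """
    values = {}
    for line in Path(path).read_text().splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        values[key.strip()] = value.strip()
    return values


def check_overrides(config_path, expected):
    """Return {key: (found, wanted)} for every setting that differs."""
    found = read_flat_config(config_path)
    return {
        key: (found.get(key), str(wanted))
        for key, wanted in expected.items()
        if found.get(key) != str(wanted)
    }
```

Calling `check_overrides("./checkpoints/pascal/config.yml", {"MODEL": 4, "BATCH_SIZE": 1, "MAX_ITERS": 500000})` would list each setting whose value in the file differs from what was intended. The comparison is textual, so a value written as `5e5` would be reported as different from `500000`.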

12ycli commented 4 years ago

I ran into the same problem as you, @superior1993. With model=4, I need to change the call on line 256 of models.py to dis_loss.backward(retain_graph = True); otherwise I get RuntimeError: Trying to backward through the graph a second time. After I changed the batch size to 2, I got RuntimeError: reduce failed to synchronize: device-side assert triggered.
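The "backward through the graph a second time" error reported in this thread can be reproduced in isolation. The sketch below is a toy example, not the code from models.py. Two losses share part of a computation graph, and the first backward call frees the buffers the second one needs unless `retain_graph=True` is passed. The function name `backward_twice` is hypothetical.

```python
import torch


def backward_twice(retain_first):
    """Backpropagate two losses that share one intermediate result."""
    x = torch.ones(3, requires_grad=True)
    shared = (x * x).sum()  # x * x saves x for its backward pass
    loss_a = 2 * shared
    loss_b = 3 * shared
    # Without retain_graph=True, the saved tensors of the shared part
    # are freed here and the next backward call raises RuntimeError.
    loss_a.backward(retain_graph=retain_first)
    loss_b.backward()
    return x.grad  # gradients from both calls accumulate
```

`backward_twice(True)` completes, while `backward_twice(False)` raises the RuntimeError quoted above. Whether `retain_graph=True` is the right fix for the joint model, rather than detaching a tensor or summing the losses before a single backward call, is not settled in this thread. The device-side assert is a separate error that this sketch does not cover.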

cpatrickalves commented 4 years ago

> I modified config.yml and the error no longer appears. I also found another problem: with model=4, I need to change the call on line 256 of models.py to dis_loss.backward(retain_graph = True); otherwise I get RuntimeError: Trying to backward through the graph a second time. Finally, I have a few more questions. If I have run models 1 and 2 with train.py, do I need to run models 3 and 4 as well? I am not clear on what models 3 and 4 actually do. Could you please explain? Thank you.

@knazeri would you mind explaining the differences between models 3 and 4? When should I use 3, and when 4? @12ycli have you found an explanation?