Open superior1993 opened 5 years ago
@superior1993 No changes are required. Just create the training, test, and validation file lists with the scripts/flist.py script. Make sure to follow the training instructions and, whenever possible, start from the pre-trained weights.
The edges.flist is only used when you create edges with an external library. If you use the Canny edge detector (the default), you don't need to generate the corresponding edge files.
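For readers unsure what a file list is here, the following is a minimal sketch of the idea: walk an image directory and write one image path per line. It is not the repository's scripts/flist.py; the function name `build_flist` and the extension set are assumptions made for illustration.

```python
import os

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def build_flist(image_dir, output_path):
    """Write every image path under image_dir to output_path, one per line."""
    paths = []
    for root, _dirs, files in os.walk(image_dir):
        for name in files:
            # Compare extensions case-insensitively so ".JPG" also matches.
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                paths.append(os.path.join(root, name))
    paths.sort()  # stable ordering makes repeated runs reproducible
    with open(output_path, "w") as handle:
        for path in paths:
            handle.write(path + "\n")
    return paths
```

Calling it once per split, for example on a hypothetical Pascal VOC validation image folder, would produce one list file per split.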
Thank you very much for your excellent work. I have a problem with training. I set 'MODEL': 4 and 'MAX_ITERS': 500000 in config.py. But when I run python train.py --model 4 --checkpoint ./checkpoints/pascal in the terminal, the configuration it prints does not match my changes, and the program fails with RuntimeError: CUDA out of memory.
@superior1993 I think your batch size of 15 is too large for your GPU memory. Try reducing the batch size.
In fact, I had already changed batch_size to 1 in config.py and it still reports the error. Why is the configuration printed in the terminal inconsistent with my changes? I trained the first-stage edge model with batch_size 8 and the second-stage inpaint model with batch_size 16. Will that affect the third-stage joint model? Thank you.
I modified config.yml and the error no longer appears. I did find another error, though: with model=4, I had to change the call at line 256 of models.py to dis_loss.backward(retain_graph=True); otherwise I get RuntimeError: Trying to backward through the graph a second time. Finally, I have some questions. If I have run models 1 and 2 with train.py, do I still need to run models 3 and 4? I am not very clear on what models 3 and 4 do. Could you please explain? Thank you.
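The config.yml fix above suggests that values read from the YAML file take priority over the defaults edited in config.py. A minimal sketch of that kind of lookup follows, assuming a file-first, defaults-second order; the class and the dictionary contents are hypothetical stand-ins, not the repository's code.

```python
# Hypothetical defaults, standing in for values edited in config.py.
DEFAULT_CONFIG = {"MODEL": 4, "BATCH_SIZE": 1, "MAX_ITERS": 500000}


class Config:
    """Settings lookup where file values take priority over code defaults."""

    def __init__(self, file_values):
        # file_values stands in for the parsed contents of config.yml.
        self._file_values = dict(file_values)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        if name in self._file_values:
            return self._file_values[name]
        if name in DEFAULT_CONFIG:
            return DEFAULT_CONFIG[name]
        raise AttributeError(name)


config = Config({"BATCH_SIZE": 15})
print(config.BATCH_SIZE)  # comes from the file values, not the edited default
print(config.MAX_ITERS)   # not in the file, so the default is used
```

Under this assumption, editing only the defaults would leave the printed configuration unchanged for any key the file already defines.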
I have run into the same problem as you, @superior1993. With model=4, I had to change the call at line 256 of models.py to dis_loss.backward(retain_graph=True); otherwise I get RuntimeError: Trying to backward through the graph a second time. After I changed the batch size to 2, I got RuntimeError: reduce failed to synchronize: device-side assert triggered.
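For context on the retain_graph change, here is a small PyTorch sketch of the general situation: two losses share part of a computation graph, so the first backward call must keep the graph alive for the second. This is not the code from models.py; the tensors and the reuse of the names dis_loss and gen_loss are assumptions for illustration only.

```python
import torch


def backward_twice(retain_graph):
    """Backpropagate two losses that share an intermediate tensor."""
    torch.manual_seed(0)
    x = torch.randn(4, 3)
    w = torch.randn(3, 3, requires_grad=True)

    # Both losses depend on this shared intermediate result.
    shared = torch.tanh(x @ w)
    dis_loss = shared.pow(2).mean()
    gen_loss = shared.abs().mean()

    # Without retain_graph=True, the buffers of the shared part are freed
    # here, and the second backward call raises a RuntimeError.
    dis_loss.backward(retain_graph=retain_graph)
    gen_loss.backward()
    return w.grad


grad = backward_twice(retain_graph=True)
print(grad.shape)
```

Keeping the graph alive also keeps its intermediate buffers in memory until the second backward call, which is worth bearing in mind when GPU memory is already tight.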
@knazeri would you mind explaining the differences between models 3 and 4? When should I use each? @12ycli have you found an explanation?
Hello, I have some questions. I want to train on Pascal VOC 2012. How do I generate the corresponding pascal_edges_val.flist file? Can I train on Pascal VOC 2012 at all, and what changes are needed? Thanks.