On the same topic: I've been trying to decide how to interpret this section from Shetty et al.'s paper (emphasis added):
> To obtain self-supervision to the in-painter we mask random rectangular patches $m_r$ from the input and ask $G_I$ to reconstruct these patches. We minimize the L1 loss and the perceptual loss [25] between the in-painted image and the input as follows:
>
> $$L_{\mathrm{recon}}(G_I) = \lVert G_I(\bar{m}_r \cdot x) - x \rVert_1 + \sum_k \lVert \phi_k(G_I(\bar{m}_r \cdot x)) - \phi_k(x) \rVert_1 \qquad (6)$$
>
> **Mask buffer.** The masks generated by $G_M(x, c_t)$ can be of arbitrary shape and hence the in-painter should be able to fill in arbitrary holes in the image. We find that the in-painter trained only on random rectangular masks performs poorly on masks generated by $G_M$. However, we cannot simply train the in-painter with reconstruction loss in (6) on masks generated by $G_M$.... **We overcome this by storing generated masks from previous batches in a mask buffer and randomly applying them on images from the current batch.**
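For my own reference, here's roughly what I think equation (6) translates to in code (a minimal sketch, assuming a PyTorch inpainter; `vgg_features` is my placeholder for whatever returns the list of feature maps $\phi_k$):

```python
import torch

def recon_loss(inpainter, vgg_features, x, mask):
    """Sketch of equation (6): L1 plus perceptual loss on the in-painted image.

    inpainter is G_I; vgg_features(x) is assumed to return a list of
    feature maps (the phi_k); mask is 1 inside the hole, so
    (1 - mask) * x corresponds to the masked input in the paper.
    """
    y = inpainter((1.0 - mask) * x)
    loss = torch.mean(torch.abs(y - x))              # ||G_I(m̄_r·x) − x||_1
    for fy, fx in zip(vgg_features(y), vgg_features(x)):
        loss = loss + torch.mean(torch.abs(fy - fx)) # Σ_k ||φ_k(·) − φ_k(x)||_1
    return loss
```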
Are they training the inpainter on random rectangular masks and stored masks from the buffer together, or on one and then the other? I can't tell.
I'm finding (again, in limited tests on synthetic data) that training seems more stable if I pretrain the inpainter for a couple of epochs on random rectangles, then use a mask buffer during the end-to-end training loop (rough sketch below).
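Here's the kind of buffer I mean (a minimal sketch; the `maxlen` value and the uniform sampling are my choices, not from the paper):

```python
import random
import torch

class MaskBuffer:
    """Store masks produced by G_M in earlier batches and randomly
    re-apply them to images in the current batch."""

    def __init__(self, maxlen=1000):
        self.masks = []
        self.maxlen = maxlen

    def add(self, batch_of_masks):
        # detach so no gradients flow back into G_M through stored masks
        self.masks.extend(m for m in batch_of_masks.detach().cpu())
        self.masks = self.masks[-self.maxlen:]

    def sample(self, n):
        # draw n stored masks uniformly, with replacement so this works
        # even while the buffer is still filling up
        return torch.stack([random.choice(self.masks) for _ in range(n)])
```

In the end-to-end loop I call `add()` on each batch of masks coming out of GM, and `sample()` whenever I take an inpainter reconstruction step.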
The more I think about this, the more convinced I am that inpainter training should probably happen only on class-0 images (those not containing objects to be removed). That way the decoder never gets a chance to learn to paint them.
This should be an easy thing to try out.
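Something like this is what I have in mind, building on the `recon_loss` sketch above (treating `labels == 0` as "no target object present" is my assumption about how the batch is labeled):

```python
def inpainter_recon_step(inpainter, vgg_features, x, masks, labels):
    """Compute the reconstruction loss only on class-0 images, so the
    inpainter never sees (and never learns to redraw) the target objects."""
    keep = labels == 0
    if not keep.any():
        return None  # no class-0 images in this batch; skip the step
    return recon_loss(inpainter, vgg_features, x[keep], masks[keep])
```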