Hi, I don't quite understand the inpainting training framework.
Does only the decoder need to be trained?
What does "mock_image_embed" mean below? Does this embedding need to be generated from CLIP and the diffusion prior model?
inpainted_images = decoder.sample(
    image_embed = mock_image_embed,
    inpaint_image = inpaint_image,  # just pass in the inpaint image
    inpaint_mask = inpaint_mask     # and the mask
)