raywzy / ICT

High-Fidelity Pluralistic Image Completion with Transformers (ICCV 2021)

Why not end-to-end network? #22

Closed Monalissaa closed 2 years ago

Monalissaa commented 2 years ago

Thank you for proposing this nice idea of using a Transformer as prior information! But why not train an end-to-end network instead? Is it because the results were not as good?

raywzy commented 2 years ago

Great question! I think there are two reasons:

  1. The transformer output is diverse; we do not have a ground-truth target for each sampled low-resolution image.
  2. The sampling process is non-differentiable (see the sketch after this list).
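
A minimal sketch of the second point, assuming a PyTorch-style pipeline (this is not the authors' code; `logits`, `probs`, and `token` are hypothetical names used only for illustration):

```python
import torch

# Transformer logits over a token vocabulary (illustrative shape and values).
logits = torch.randn(1, 512, requires_grad=True)
probs = torch.softmax(logits, dim=-1)  # still differentiable w.r.t. logits

# Drawing a discrete token index is a non-differentiable operation:
token = torch.multinomial(probs, num_samples=1)  # integer index, no grad_fn

print(token.requires_grad)  # False: the sampled index carries no gradient
# Any loss computed downstream from `token` cannot backpropagate into
# `logits`, so the network could not be trained end-to-end through this
# sampling step without relaxations such as Gumbel-Softmax or REINFORCE.
```
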
Monalissaa commented 2 years ago

Understood, thanks again.