pomelyu / paper-reading-notes


2023 [SIGGRAPH] (DragGAN) Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold #8

Open pomelyu opened 1 year ago

pomelyu commented 1 year ago

Introduction

This paper presents an optimization-based approach to interactively edit images generated by a pretrained unconditional GAN. Specifically, the user selects control points on the image, and the approach moves these points to target locations. The user can also specify regions to keep untouched with a mask.


Method


1. Motion Supervision: move the control points toward the target positions


2. Point Tracking: figure out the current control point location


3. Repeat the above two steps until the control points are close to the target positions.

Highlight

Limitation

Comments

[^1]: UserControllableLT: User-Controllable Latent Transformer for StyleGAN Image Layout Editing
[^2]: RAFT: Recurrent All-Pairs Field Transforms for Optical Flow
[^3]: PIPS: Particle Video Revisited: Tracking Through Occlusions Using Point Trajectories

alexxzibit commented 1 year ago

Introduction

This paper presents an optimization-based approach to interactively edit images generated by a pretrained unconditional GAN. Specifically, the user selects control points on the image, and the approach moves these points to target locations. The user can also specify regions to keep untouched with a mask.

Method

1. Motion Supervision: move the control points toward the target positions

  • pi: control points; ti: target points; qi: points in a small region around pi; di: the normalized (unit) direction vector from pi to ti
  • F(q): the feature vector at point q, taken from the 6th layer of the StyleGAN feature maps
  • (?) Optimizing this loss moves each pi a small step toward pi + di
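The loss above can be sketched in a few lines; this is a minimal NumPy illustration (the function names are mine, not from the paper), computing the L1 term over a square patch around pi. In the actual method this loss is backpropagated into the latent code w, and the mask-based reconstruction term for unedited regions is omitted here.

```python
import numpy as np

def bilinear_sample(F, y, x):
    """Sample feature map F of shape (C, H, W) at fractional (y, x)."""
    C, H, W = F.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * F[:, y0, x0] + (1 - wy) * wx * F[:, y0, x1]
            + wy * (1 - wx) * F[:, y1, x0] + wy * wx * F[:, y1, x1])

def motion_supervision_loss(F, p, t, r1=3):
    """L1 difference between the patch around p and the same patch shifted
    one unit step toward t. In the paper, F(q) is detached so gradients
    (w.r.t. the latent w) flow only through the shifted samples F(q + d)."""
    d = (t - p) / (np.linalg.norm(t - p) + 1e-8)  # normalized step direction di
    loss = 0.0
    for dy in range(-r1, r1 + 1):          # q ranges over a (2*r1+1)^2 patch
        for dx in range(-r1, r1 + 1):
            q = p + np.array([dy, dx], dtype=float)
            loss += np.abs(bilinear_sample(F, *q) - bilinear_sample(F, *(q + d))).sum()
    return loss
```

Minimizing this loss over w makes the features at q + d reproduce those currently at q, which effectively drags the content at pi toward ti.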

2. Point Tracking: figure out the current control point location

  • Find the current control points by feature similarity: a nearest-neighbor search on the feature map
  • Generic optical-flow / point-tracking networks (RAFT1, PIPS2) could be used instead, but they lead to worse results
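The nearest-neighbor search can be sketched as follows (a self-contained illustration; names are mine): the tracked point is the position in a small neighborhood of pi whose current feature vector is closest to the feature of the original control point.

```python
import numpy as np

def track_point(F, f_ref, p, r2=6):
    """Re-locate control point p by nearest-neighbor search in feature space.

    F: current feature map (C, H, W); f_ref: feature vector of the original
    control point; p: integer (y, x). Returns the position in the r2
    neighborhood of p whose feature is closest to f_ref in L1 distance."""
    C, H, W = F.shape
    y, x = p
    best, best_q = np.inf, tuple(p)
    for qy in range(max(0, y - r2), min(H, y + r2 + 1)):
        for qx in range(max(0, x - r2), min(W, x + r2 + 1)):
            dist = np.abs(F[:, qy, qx] - f_ref).sum()
            if dist < best:
                best, best_q = dist, (qy, qx)
    return best_q
```

Because the StyleGAN features are discriminative, this simple search tracks the control point reliably without a separate tracking network.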

3. Repeat the above two steps until the control points are close to the target positions.
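The overall control flow can be written down schematically. In this toy skeleton both steps are stubbed with a direct unit move, purely to show the loop structure; in the real method the motion step is one gradient update of the latent w and tracking re-locates p on the regenerated feature map.

```python
import numpy as np

def drag_edit(p, t, step=1.0, max_iter=200, tol=1.0):
    """Toy skeleton of the DragGAN loop: alternate a motion step and a
    tracking step until the control point is within tol pixels of the
    target. Everything here is a stand-in, not the paper's implementation."""
    p, t = np.asarray(p, float), np.asarray(t, float)
    for _ in range(max_iter):
        if np.linalg.norm(t - p) < tol:
            break                              # close enough: stop editing
        d = (t - p) / np.linalg.norm(t - p)    # stand-in for motion supervision
        p = p + step * d
        # (point tracking would re-estimate p from the new features here)
    return p
```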

Highlight

  • a general method that works with any GAN
  • the movement is more accurate compared with the similar work UserControllableLT3
  • can be applied to real images through GAN inversion

Limitation

  • slow, due to the optimization-based approach
  • no explicit constraint to preserve appearance, although a mask can be used to keep regions unchanged

Comments

Footnotes

  1. RAFT - Recurrent All Pairs Field Transforms for Optical Flow
  2. PIPS - Particle Video Revisited: Tracking Through Occlusions Using Point Trajectories
  3. UserControllableLT: User-Controllable Latent Transformer for StyleGAN Image Layout Editing
