magic-research / InstaDrag

Experiencing lightning fast (~1s) and accurate drag-based image editing

Some questions about the paper #6

Closed by TomSuen 2 weeks ago

TomSuen commented 1 month ago

Hi, I have a new question about this passage in the paper: "Once obtaining the handle and target point maps, we encode them into embedding via a point embedding network, which is composed of 12 layers of convolution and SiLU activation. This network outputs embedding at four different resolutions, corresponding to the four different resolutions of SD UNet activation maps."

  1. What are the exact dimensions (channel widths, kernel sizes, strides) of this network?
  2. How are its parameters initialized?
  3. Are P_hdl and P_tgt passed through the network separately?
  4. At which layers are the four embeddings output, respectively?

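
To make question 4 concrete, here is a minimal sketch of one layout that would be consistent with the quoted description. Everything in it is an assumption rather than the authors' design: the 512x512 point-map input, the 3x3 kernels with padding 1, the alternating stride-1/stride-2 pattern, and the tap positions (`STRIDES` and `TAP_AFTER` are hypothetical names). It only tracks spatial sizes, assuming the SD UNet activation maps are 64, 32, 16 and 8 pixels wide for a 512x512 image.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Standard output-size formula for a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical layout (NOT from the paper): six (stride-1, stride-2) pairs,
# i.e. 12 conv layers, each assumed to be followed by a SiLU activation.
STRIDES = [1, 2] * 6

# Hypothetical tap positions: layers whose output would be handed to the UNet.
TAP_AFTER = {6, 8, 10, 12}


def plan(input_size=512):
    """Return a list of (layer_index, stride, output_size, is_tap) tuples."""
    size = input_size
    rows = []
    for index, stride in enumerate(STRIDES, start=1):
        size = conv_out(size, stride=stride)
        rows.append((index, stride, size, index in TAP_AFTER))
    return rows


if __name__ == "__main__":
    for index, stride, size, is_tap in plan():
        marker = "  <- embedding output" if is_tap else ""
        print(f"layer {index:2d}  stride {stride}  {size}x{size}{marker}")
```

Under these assumptions the taps after layers 6, 8, 10 and 12 have sizes 64, 32, 16 and 8. Is that roughly how the real network is arranged, or are the outputs taken elsewhere?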
liewjunhao commented 2 weeks ago

Thank you for your patience. Code and pre-trained models are now available here. Please note that this project is now called LightningDrag and this repo will be removed soon.

TomSuen commented 2 weeks ago

> Thank you for your patience. Code and pre-trained models are now available here. Please note that this project is now called LightningDrag and this repo will be removed soon.

That's good news for me!