Hi, I understand OMG extracts attention maps (as in prompt-to-prompt [1]) and character masks (via a zero-shot segmentor) from the first-stage image to realize customized generation. I wonder: could we combine OMG with null-text inversion [2] to edit real images, just as prompt-to-prompt is combined with null-text inversion?
[1] Prompt-to-Prompt Image Editing with Cross Attention Control
[2] Null-text Inversion for Editing Real Images using Guided Diffusion Models
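Concretely, the combination I have in mind might look like the sketch below. All function names and return values are hypothetical placeholders that only illustrate the proposed dataflow (invert the real image, then run OMG's two-stage attention/mask pipeline on the reconstruction), not actual OMG or diffusers APIs:

```python
def null_text_inversion(real_image, source_prompt):
    """DDIM-invert the real image and optimize per-timestep null-text
    embeddings so resampling reconstructs it (placeholder, per [2])."""
    inverted_latents = {"image": real_image, "prompt": source_prompt}
    null_embeddings = ["null_t%d" % t for t in range(3)]  # one per step
    return inverted_latents, null_embeddings


def first_stage_record(inverted_latents, null_embeddings, source_prompt):
    """Reconstruct from the inversion while recording cross-attention
    maps (prompt-to-prompt style, per [1]) and extracting character
    masks with a zero-shot segmentor (placeholder)."""
    attention_maps = {"token_to_map": source_prompt}
    character_masks = {"character_0": "mask"}
    return attention_maps, character_masks


def second_stage_edit(inverted_latents, null_embeddings, target_prompt,
                      attention_maps, character_masks):
    """Resample with personalized concept weights, injecting the stored
    attention maps and blending latents inside the character masks; the
    optimized null embeddings would keep unedited regions faithful to
    the real image (placeholder)."""
    return {
        "edited_image": (target_prompt, inverted_latents["image"]),
        "background_preserved": bool(null_embeddings),
    }


# Proposed order: invert -> reconstruct and record -> masked customized edit.
latents, nulls = null_text_inversion("photo.png", "a man in a park")
attn, masks = first_stage_record(latents, nulls, "a man in a park")
result = second_stage_edit(latents, nulls, "a <sks> man in a park",
                           attn, masks)
```

The key question is whether the attention maps and masks recorded during reconstruction of an inverted real image are reliable enough to drive OMG's second stage, the way they are when the first-stage image is itself generated.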