Hi, thanks for your contribution to this awesome work! We also study this topic, leveraging LLM planning to generate images of complex scenes. Our paper is Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following (CVPR 2024), with open-sourced code. Furthermore, we have extended it to modify generated images through progressive adjustments or chat-based editing (also discussed in this repo). We hope our exploration can help further develop the editing function in this repository, e.g. maintaining ID consistency and understanding complex editing instructions with an LLM.
Our framework is illustrated as follows: