ironjr / StreamMultiDiffusion

Official code for the paper "StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control."
https://jaerinlee.com/research/streammultidiffusion
MIT License
508 stars, 40 forks

RPG support since it's using regional prompting? #12

Open andupotorac opened 1 month ago

andupotorac commented 1 month ago

This is a nice update to StreamDiffusion. I'm wondering whether it could support the RPG-DiffusionMaster improvements, which enable high-fidelity images from prompts. Right now there is a trade-off: RPG runs at a lower speed, while StreamMultiDiffusion runs at a higher speed but gives lower-quality output in terms of prompt fidelity.

andupotorac commented 1 month ago

The reason I'm asking is that I read in your docs that you solved the regional prompting issue by blending the regions with quantized masks. The same approach is likely to fix the issue with RPG too, since they also rely on regional prompting (together with a visual LLM) to better align the output with the prompts.

ironjr commented 1 month ago

Sorry for the late reply. I have just read the RPG paper. Thanks for the nice suggestion. I will run their code, try adopting their pipeline into mine, and see if it works. But this will take some time!

andupotorac commented 1 month ago

Thanks! We can try to do it ourselves, ask you questions here and there if you want, and then submit an update to your code if required. Would that be OK?

ironjr commented 1 month ago

Issues and pull requests are always welcome :)