songweige / rich-text-to-image

Rich-Text-to-Image Generation
https://rich-text-to-image.github.io/
MIT License

Could we create an image condition by extending this idea? #19

Open se7enth opened 11 months ago

se7enth commented 11 months ago

Rich-text markup is a user-friendly approach. I wonder if we could use a similar method to help SD understand spatial relationships in images. In simple terms, the idea is to annotate elements in 3D software to match points, then generate an image in a specific format in which the annotations and their spatial positions are transformed into image-area masks. This would be somewhat similar to ControlNet segmentation, but I believe it would be much more user-friendly and capable of handling complex spatial-relationship definitions. We could describe spatial relationships by laying out simple basic geometry, or by making quick annotations on an existing model (as a material, say). A Blender plugin would probably be needed to output the image conditions.
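
To make the proposal a bit more concrete, here is a minimal sketch of the "annotations plus spatial position to image-area masks" step. Everything in it is an assumption for illustration, not part of this repository, ControlNet, or Blender: the names `project_points` and `build_label_mask` are hypothetical, the camera is a simple pinhole model looking down the +z axis, and each annotated object is approximated by the bounding box of its projected points rather than its true silhouette.

```python
import numpy as np

def project_points(points_3d, focal, width, height):
    """Pinhole projection of camera-space points (z > 0) to pixel coordinates."""
    pts = np.asarray(points_3d, dtype=float)
    u = focal * pts[:, 0] / pts[:, 2] + width / 2.0
    v = focal * pts[:, 1] / pts[:, 2] + height / 2.0
    return np.stack([u, v], axis=1)

def build_label_mask(objects, focal=100.0, width=64, height=64):
    """Rasterize annotated objects into a single label mask.

    `objects` is a list of (label, points_3d) pairs. Each object is drawn as
    the bounding box of its projected points (a crude stand-in for a real
    render). Objects are drawn far-to-near so nearer ones overwrite farther
    ones. Returns (mask, labels): mask[y, x] is 0 for background, or i + 1
    for the object named labels[i].
    """
    mask = np.zeros((height, width), dtype=np.uint8)
    labels = [name for name, _ in objects]
    # Sort indices by mean depth, farthest first.
    depth = [np.mean(np.asarray(pts, dtype=float)[:, 2]) for _, pts in objects]
    order = sorted(range(len(objects)), key=lambda i: -depth[i])
    for i in order:
        uv = project_points(objects[i][1], focal, width, height)
        x0, y0 = np.floor(uv.min(axis=0)).astype(int)
        x1, y1 = np.ceil(uv.max(axis=0)).astype(int)
        # Clip the box to the image bounds before filling.
        x0, y0 = max(x0, 0), max(y0, 0)
        x1, y1 = min(x1, width), min(y1, height)
        mask[y0:y1, x0:x1] = i + 1
    return mask, labels

# Two annotated objects: a cup standing in front of a table.
scene = [
    ("table", [(-1.0, 0.0, 4.0), (1.0, 1.0, 4.0)]),
    ("cup", [(-0.2, -0.5, 2.0), (0.2, 0.1, 2.0)]),
]
mask, labels = build_label_mask(scene)
```

The resulting `mask` could then be colorized per label and passed along with the per-label text as a segmentation-style condition. A real Blender plugin would presumably render true object silhouettes instead of bounding boxes.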