showlab / VisorGPT

[NeurIPS 2023] Customize spatial layouts for conditional image synthesis models, e.g., ControlNet, using GPT
MIT License

Does VisorGPT support generating multiple instances of different sizes in one image? #4

Open zhangh0920 opened 1 year ago

zhangh0920 commented 1 year ago

Can VisorGPT generate objects of different sizes in the same image, for example one large building with many small windows?

Sierkinhane commented 1 year ago

Thank you for your interest. VisorGPT can generate objects of different sizes. The size flag (small, medium, large) indicates the average area of all instances in one sample, so it describes the sample as a whole rather than each instance. Because the training data covers only a limited set of annotated classes (not all open-world objects), the current model may not handle open-world cases that involve novel classes. This work primarily validates that visual priors can be learned through generative pre-training. We are actively working on extending VisorGPT to open-world scenarios.
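To illustrate what a per-sample size flag based on average area means, here is a minimal sketch. It is not VisorGPT's code: the function names, the `(x_min, y_min, x_max, y_max)` box format, and the area thresholds (32² and 96² pixels, borrowed from the COCO small/medium/large convention) are all assumptions made for illustration. The thread does not state which thresholds VisorGPT uses.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]

# Assumed thresholds, taken from the COCO size convention.
# The thresholds VisorGPT actually uses are not given in this thread.
SMALL_MAX_AREA = 32 ** 2
MEDIUM_MAX_AREA = 96 ** 2


def box_area(box: Box) -> float:
    """Area of one box; degenerate boxes count as zero."""
    x_min, y_min, x_max, y_max = box
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)


def size_flag(boxes: List[Box]) -> str:
    """Hypothetical helper: bucket a sample by the mean area of its boxes."""
    if not boxes:
        raise ValueError("need at least one box")
    mean_area = sum(box_area(b) for b in boxes) / len(boxes)
    if mean_area < SMALL_MAX_AREA:
        return "small"
    if mean_area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


# One large building plus twenty small windows in the same sample.
building: Box = (0, 0, 400, 400)
windows: List[Box] = [(10 * i, 10, 10 * i + 8, 18) for i in range(20)]
sample = [building] + windows

print(size_flag([building]))  # flag for the building alone
print(size_flag(windows))     # flag for the windows alone
print(size_flag(sample))      # flag for the mixed sample
```

Under these assumed thresholds, the mixed sample gets a single flag determined by the mean area, even though it contains one very large box and many very small ones. That is why a single flag does not, by itself, prevent instances of different sizes from appearing in one layout.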