Open zhangh0920 opened 1 year ago
Thank you for your interest. VisorGPT can generate objects of different sizes. The flag (small, medium, large) indicates the average area of all instances in one sample, so individual instances within a sample can still differ in size. Since the training data covers only a limited set of annotated classes (not all open-world objects), the current model may not be able to handle some open-world situations, such as novel classes. This work primarily validates that visual priors can be learned through generative pre-training. We are actively working on enhancing VisorGPT so that it can handle open-world scenarios.
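To illustrate what a per-sample flag based on average instance area means, here is a minimal sketch. It is not VisorGPT's actual code: the function names, the `(x_min, y_min, x_max, y_max)` box format, and the thresholds are assumptions for illustration. The thresholds borrow COCO's small/medium/large area boundaries (32² and 96² pixels), which may differ from what VisorGPT uses.

```python
from statistics import mean

# Assumed thresholds (COCO's area convention, in pixels^2); VisorGPT may differ.
SMALL_MAX_AREA = 32 ** 2
MEDIUM_MAX_AREA = 96 ** 2


def box_area(box):
    """Area of a hypothetical (x_min, y_min, x_max, y_max) box."""
    x_min, y_min, x_max, y_max = box
    return max(0, x_max - x_min) * max(0, y_max - y_min)


def size_flag(boxes):
    """Return 'small', 'medium', or 'large' from the mean area of all boxes."""
    if not boxes:
        raise ValueError("a sample needs at least one instance")
    avg_area = mean([box_area(b) for b in boxes])
    if avg_area < SMALL_MAX_AREA:
        return "small"
    if avg_area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


# One 400x400 building plus eight 20x20 windows in the same sample.
building = (0, 0, 400, 400)
windows = [(10 + 40 * i, 10, 30 + 40 * i, 30) for i in range(8)]

print(size_flag(windows))               # small
print(size_flag([building] + windows))  # large
```

Under these assumptions, a sample that mixes one large building with many small windows is flagged as large because the building dominates the average, even though most instances in it are small.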
Can VisorGPT generate objects of different sizes in the same image, such as a large building with lots of small windows?