quic / ai-hub-models

The Qualcomm® AI Hub Models are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) and ready to deploy on Qualcomm® devices.
https://aihub.qualcomm.com
BSD 3-Clause "New" or "Revised" License

[Feature Request] 16:9 ratio input image for controlnet #25

Open kristoftunner opened 3 months ago

kristoftunner commented 3 months ago

I would like to use the ControlNet model pipeline (https://github.com/quic/ai-hub-models/blob/main/qai_hub_models/models/controlnet_quantized/README.md) with a 16:9 input image, but the pipeline's image resolution is currently fixed at 512x512.

Describe the solution you'd like: A variable input/output image size (WxH) would be the best solution, but a model with a different fixed resolution, for example 960x540, would also work.

Describe alternatives you've considered: I considered resizing the 16:9 input image, but unfortunately that distorts the image.
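To make the distortion concrete, here is a minimal sketch of the arithmetic, assuming a 960x540 source and the pipeline's fixed 512x512 input. It compares the per-axis scale factors of a plain resize with a letterbox (uniform scale plus padding) layout. Both function names are hypothetical and nothing here is part of the ai-hub-models API; whether the ControlNet pipeline produces good results on a padded input is untested.

```python
def stretch_factors(src_w, src_h, dst_w=512, dst_h=512):
    """Per-axis scale factors of a plain resize.

    Unequal factors mean the image is stretched along one axis.
    """
    return dst_w / src_w, dst_h / src_h


def letterbox_geometry(src_w, src_h, dst_w=512, dst_h=512):
    """Hypothetical helper: uniform scale and padding that fit the
    source inside the target without changing its aspect ratio."""
    # Use the smaller factor so both dimensions fit inside the target.
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w = round(src_w * scale)
    new_h = round(src_h * scale)
    # Split the leftover space as evenly as possible on each side.
    pad_x = dst_w - new_w
    pad_y = dst_h - new_h
    left, top = pad_x // 2, pad_y // 2
    return {
        "scale": scale,
        "size": (new_w, new_h),
        "pad": (left, top, pad_x - left, pad_y - top),  # left, top, right, bottom
    }


if __name__ == "__main__":
    print(stretch_factors(960, 540))
    print(letterbox_geometry(960, 540))
```

For a 960x540 input, the plain resize scales the width by about 0.53 and the height by about 0.95, which is the distortion described above. The letterbox layout instead yields a 512x288 image with 112 pixels of padding above and below; the same padding would have to be cropped from the 512x512 output to recover a 16:9 result.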

mestrona-3 commented 3 months ago

Hi @kristoftunner, thank you for the feature request! We've filed this internally and will comment here when we have an update to share.

kristoftunner commented 2 months ago

@mestrona-3 any updates on this feature?

bhushan23 commented 2 months ago

@kristoftunner These are pre-compiled models, and since the output is 512x512, resizing it to 16:9 will indeed distort the image.

We have been heads down on some of the other models and will get to releasing PyTorch recipes for Stable Diffusion and ControlNet, but we don't have an ETA at the moment. We will post an update on this thread in the next few weeks.

kristoftunner commented 2 months ago

Thanks @bhushan23!