Incredible work on this research!
I noticed that the InstructPix2Pix (ControlNet version) result through X-Adapter looks like it followed the instruction much better than base 1.5 did (and produced a significantly better image). Is this generally true, or was that a cherry-picked example?
Do you have more examples comparing InstructPix2Pix through your adapter vs. base 1.5?