microsoft / i-Code

MIT License
1.68k stars 161 forks source link

Img2Img Broken? #116

Closed jacklishufan closed 9 months ago

jacklishufan commented 1 year ago

Hi thanks for the great work. While this is not explored in paper, but since the conditions are aligned, in principle it should be possible to perform img2img task to encode an image and decode the latent to a similar image (at least in some aspect)? However simple experiments show this fails

image

I was expecting it to behave like DALLE-2's UNCLIP decoder. Any idea why this does not work? Are you applying some special attention map so that one modality do not attend to inputs of same modality?

zinengtang commented 1 year ago

I think this might be the issue for transformer version. What version you are currently using. Did you run pip install -r requirement.txt?