Thank you for your awesome work.
But when I ran test.py, I hit an error:
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 1024 but got size 768 for tensor number 1 in the list.
It seems the output shapes of the CLIP encoder and the VAE encoder don't match.
How can I fix it?
```python
def forward(self, clip, vae):
    # clip: (1, 257, 1024)
    vae = self.pool(vae)                          # (1, 4, 80, 64) -> (1, 4, 40, 32)
    vae = rearrange(vae, 'b c h w -> b c (h w)')  # (1, 4, 40, 32) -> (1, 4, 1280)
    vae = self.vae2clip(vae)                      # (1, 4, 768)
    # Concatenating them fails here:
    concat = torch.cat((clip, vae), 1)
```
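For context, here is a minimal reproduction of the mismatch and one possible fix, under the assumption that `vae2clip` is a linear projection (the `nn.Linear` below is hypothetical, not the repo's actual layer): `torch.cat` along dim 1 requires all other dimensions to agree, so the VAE tokens would need width 1024 to match the CLIP tokens, not 768.

```python
import torch
import torch.nn as nn

# Shapes taken from the snippet above.
clip_tokens = torch.randn(1, 257, 1024)  # CLIP encoder output
vae_tokens = torch.randn(1, 4, 768)      # VAE tokens after vae2clip

# Reproduces the reported error: last dims differ (1024 vs 768).
try:
    torch.cat((clip_tokens, vae_tokens), dim=1)
except RuntimeError as e:
    print("cat failed:", e)

# One possible fix (an assumption, not necessarily the authors' intent):
# project the flattened VAE features to the CLIP width, 1024 instead of 768.
vae2clip = nn.Linear(1280, 1024)         # hypothetical layer; in_features = 40 * 32
vae_fixed = vae2clip(torch.randn(1, 4, 1280))
concat = torch.cat((clip_tokens, vae_fixed), dim=1)
print(concat.shape)  # torch.Size([1, 261, 1024])
```

With matching widths, the concatenation yields 257 + 4 = 261 tokens of dimension 1024.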