INFO:root:Loaded coca_ViT-L-14 model config.
INFO:root:Loading pretrained coca_ViT-L-14 weights (mscoco_finetuned_laion2B-s13B-b90k).
a close - up of the face of a siamese cat . modern disney styleclean, high-resolution, 8k
[Tiled VAE]: input_size: torch.Size([1, 3, 768, 1296]), tile_size: 1024, padding: 32
[Tiled VAE]: split to 1x2 = 2 tiles. Optimal tile size 640x704, original tile size 1024x1024
[Tiled VAE]: Executing Encoder Task Queue: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 182/182 [00:01<00:00, 92.15it/s]
[Tiled VAE]: Done in 2.099s, max VRAM alloc 4155.224 MB
0%| | 0/20 [00:00<?, ?it/s]
mat1 and mat2 shapes cannot be multiplied (154x768 and 1280x320)
Is seems the featuer number of the text embedding is inconsistent with that of the unet. This happens mostly due to you used a mismatched text encoder or unet.