yangxy / PASD

[ECCV2024] Pixel-Aware Stable Diffusion for Realistic Image Super-Resolution and Personalized Stylization
Apache License 2.0

Why do these things happen when I use Disney style transformations #57

Open Sillyyk opened 3 months ago

Sillyyk commented 3 months ago

INFO:root:Loaded coca_ViT-L-14 model config.
INFO:root:Loading pretrained coca_ViT-L-14 weights (mscoco_finetuned_laion2B-s13B-b90k).
a close - up of the face of a siamese cat . modern disney styleclean, high-resolution, 8k
[Tiled VAE]: input_size: torch.Size([1, 3, 768, 1296]), tile_size: 1024, padding: 32
[Tiled VAE]: split to 1x2 = 2 tiles. Optimal tile size 640x704, original tile size 1024x1024
[Tiled VAE]: Executing Encoder Task Queue: 100%|██████████| 182/182 [00:01<00:00, 92.15it/s]
[Tiled VAE]: Done in 2.099s, max VRAM alloc 4155.224 MB
0%| | 0/20 [00:00<?, ?it/s]
mat1 and mat2 shapes cannot be multiplied (154x768 and 1280x320)

yangxy commented 2 months ago

It seems the feature dimension of the text embedding is inconsistent with the one the UNet expects. This usually happens because a mismatched text encoder or UNet was used.
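To make the error message easier to read, here is a minimal sketch of the shape check behind it. The helper `describe_mismatch` is hypothetical and not part of PASD. It treats `154x768` as the text embedding (rows by features) and `1280x320` as the transposed weight of a linear projection (input features by output features), which is how PyTorch reports shapes for a linear layer. The 154 rows are probably two 77-token prompts stacked for classifier-free guidance, but that is an assumption.

```python
def describe_mismatch(embed_shape, weight_t_shape):
    """Explain a 'mat1 and mat2 shapes cannot be multiplied' error.

    embed_shape:    (rows, features) of the text embedding, e.g. (154, 768)
    weight_t_shape: (in_features, out_features) of the projection,
                    e.g. (1280, 320)
    """
    rows, features = embed_shape
    in_features, out_features = weight_t_shape
    # A matrix product needs the inner dimensions to agree.
    if features == in_features:
        return f"ok: result shape is ({rows}, {out_features})"
    return (
        f"mismatch: text embedding has {features} features, "
        f"but the layer expects {in_features}"
    )


# Shapes taken from the error in this issue.
print(describe_mismatch((154, 768), (1280, 320)))
# Same embedding against a layer that accepts 768 input features.
print(describe_mismatch((154, 768), (768, 320)))
```

In diffusers-style pipelines the two numbers to compare are usually `text_encoder.config.hidden_size` and `unet.config.cross_attention_dim`. Whether the checkpoints loaded here expose those attributes depends on how they were loaded, so treat that as a starting point for checking, not a confirmed diagnosis.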