Open Millu opened 9 months ago
Some SDXL models strongly recommend (almost require) setting clip skip, so this would be a very handy addition to the ui/graph
@hipsterusername @psychedelicious Should this be reopened since the changes that implemented this have been reverted?
I’ll reopen it, but it’s not obvious we will implement this given that the results for clip skip + sdxl seem non-optimal
Hi @hipsterusername! What do you mean by "results for clip skip + sdxl seem non-optimal"? Pony XL, one of the most popular SDXL checkpoints at the moment, explicitly requires clip skip: "Make sure you load this model with clip skip 2 (or -2 in some software), otherwise you will be getting low quality blobs."
As tested by multiple members of the community, this is seen as already correctly configured by default in Invoke, and no exposed SDXL clip skip setting is needed. The instructions in the UI aren’t for Invoke, and our clip skip setting is offset by one relative to other UIs.
> As tested by multiple members of the community,

Is there a written report with images I can read somewhere?

> this is seen as already correctly configured by default in Invoke, and no exposed SDXL clip skip setting is needed.

You mean that, contradicting all the model trainers recommending specific clip skip values, the Invoke community considers the hardcoded Invoke clip skip to be the universally optimal setting?

> The instructions in the UI aren’t for Invoke

I don't understand. What are the instructions for?

> and our clip skip setting is offset by one relative to other UIs.

So you mean there is a hardcoded clip skip of 1 (or -1 relative to the last layer value) in Invoke, whereas in the other UIs the default is the model config's original value?

All in all, regardless, not exposing the clip skip setting in the UI for the user to decide doesn't seem to me like good user experience design, especially since there is so much variability in the recommendations, since all the more popular UIs support the user's choice in the matter, and since this is an artistic tool.
> As tested by multiple members of the community,
>
> Is there a written report with images I can read somewhere?
You can see a couple of extended threads in Discord.
> this is seen as already correctly configured by default in Invoke, and no exposed SDXL clip skip setting is needed.
>
> You mean that, contradicting all the model trainers recommending specific clip skip values, the invoke community considers the hardcoded invoke clip skip to be the universally optimal setting?
Yes.
> The instructions in the UI aren’t for Invoke
>
> I don't understand. What are the instructions for?
They are typically instructions regarding a specific setting in A1111 or Comfy, depending on the instructions. There isn't a standardized set of terms, settings, or useful values.
> and our clip skip setting is offset by one relative to other UIs.
>
> So you mean there is a hardcoded clip skip of 1 (or -1 relative to the last layer value) in invoke whereas in the other UIs, the default is the model config's original value?
It's more that Invoke starts counting at a different layer, inherited from how the Diffusers library handles it. But yes, effectively the number of skipped layers differs between Invoke and other systems.
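As a rough illustration of that off-by-one, here's a sketch. The function name and exact mapping are my own assumptions, based on A1111 counting from 1 (clip skip 1 = use the final layer) while diffusers' `clip_skip` counts layers actually skipped (1 = use the pre-final layer):

```python
def a1111_to_diffusers_clip_skip(a1111_clip_skip: int) -> int:
    """Convert an A1111-style 'Clip skip' value to a diffusers-style one.

    A1111 counts from 1: clip skip 1 = use the final CLIP layer,
    clip skip 2 = use the penultimate layer, and so on. Diffusers'
    `clip_skip` counts layers actually skipped: 1 = penultimate layer.
    Hence the off-by-one between UIs. (Hypothetical helper, not Invoke's
    actual code.)
    """
    if a1111_clip_skip < 1:
        raise ValueError("A1111 clip skip starts at 1")
    return a1111_clip_skip - 1

# Pony XL's recommended "clip skip 2" would be clip_skip=1 in diffusers terms.
print(a1111_to_diffusers_clip_skip(2))  # 1
```

So a model card saying "clip skip 2" and a diffusers-based UI exposing "1" may be asking for the same thing, which is exactly the confusion being discussed here.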
> All in all, regardless, not exposing the clip skip setting in the UI for the user to decide, especially since there is so much variability in the recommendations, especially since all the more popular UIs are supporting the user's choice in the matter, especially for an artistic tool, doesn't seem to me like good user experience design.
Your feedback is noted - If you can provide evidence that you find consistently superior results with SDXL Clip Skip (which can be manually created in Workflows for testing), we'd be happy to explore adding it in. As for now, we've found little to no evidence it's widely useful in the SDXL fine-tune landscape, and as a result, would be clutter at best.
Our stance on UX design is that we only add useful features that will be durable in the long-term and worth maintaining, based on the professional creatives using the product.
> If you can provide evidence that you find consistently superior results with SDXL Clip Skip (which can be manually created in Workflows for testing)

If I can provide evidence, what's next?
I cannot test Pony Diffusion XL v6 in invoke because it crashes (server value or something...). But it doesn't matter because I can't change the clip skip setting either.
So here is the workflow configuration I have in ComfyUI:
force-fp16: true
model: ponyDiffusionV6XL_v6StartWithThisOne.safetensors
vae: sdxl_vae.safetensors (from the Pony Diffusion XL v6 download page on Civitai)
prompt: score_9, (rating_safe), 2d, art, cat, female, warriors cats accurate, sit pose, feral, siamese cat, fluffy, blue eyes
width: 1024
height: 1024
seed: 1127646817
steps: 25
cfg: 7.5
sampler_name: euler_ancestral
scheduler: karras
denoise: 1.0
I'm going to let you guess which of the two images below is clip skip -1 and clip skip -2....
I wanted to try again in Invoke to get the exact error message I was getting and it's indeed "Server error, Value error" and in the terminal, I see this:
[2024-06-03 22:33:48,109]::[InvokeAI]::ERROR --> Error while invoking session 37070faf-f2c8-4a50-bea2-5176ec71eb40, invocation 235f88c0-a4cb-4b82-92d3-61118f1e04bb (l2i):
Cannot load <class 'diffusers.models.autoencoders.autoencoder_kl.AutoencoderKL'> from /home/user/Downloads/sd-models/ponyDiffusionV6XL-vae because the following keys are missing:
decoder.conv_norm_out.weight, encoder.down_blocks.1.resnets.0.norm2.weight, encoder.down_blocks.1.resnets.0.norm1.bias, decoder.up_blocks.2.resnets.0.norm2.weight, decoder.up_blocks.2.resnets.2.conv1.bias, decoder.up_blocks.1.resnets.1.conv2.weight, decoder.up_blocks.2.resnets.2.conv1.weight, decoder.mid_block.attentions.0.group_norm.bias, encoder.down_blocks.3.resnets.0.conv1.bias, decoder.up_blocks.2.resnets.1.norm2.weight, decoder.mid_block.resnets.0.norm1.bias, decoder.up_blocks.3.resnets.2.norm2.bias, decoder.mid_block.attentions.0.to_v.weight, decoder.up_blocks.0.resnets.0.conv1.weight, encoder.mid_block.attentions.0.to_out.0.bias, encoder.down_blocks.2.downsamplers.0.conv.bias, encoder.mid_block.resnets.1.norm2.bias, encoder.down_blocks.2.resnets.0.norm1.weight, encoder.mid_block.attentions.0.to_q.weight, decoder.mid_block.resnets.1.conv2.weight, encoder.down_blocks.0.resnets.0.conv1.bias, decoder.up_blocks.0.resnets.2.norm1.bias, encoder.mid_block.resnets.0.conv1.bias, decoder.up_blocks.2.resnets.1.conv2.weight, decoder.up_blocks.3.resnets.0.conv2.bias, decoder.mid_block.resnets.0.conv2.bias, decoder.mid_block.attentions.0.to_q.weight, decoder.up_blocks.1.resnets.2.conv2.bias, decoder.mid_block.attentions.0.to_k.weight, decoder.up_blocks.2.resnets.1.norm1.bias, encoder.mid_block.resnets.1.conv1.weight, decoder.up_blocks.3.resnets.2.norm1.bias, encoder.down_blocks.2.resnets.0.norm1.bias, decoder.up_blocks.0.resnets.0.norm2.bias, decoder.up_blocks.0.resnets.1.conv1.weight, decoder.up_blocks.0.resnets.2.norm2.weight, decoder.up_blocks.2.resnets.0.conv2.weight, encoder.down_blocks.0.resnets.0.conv2.bias, encoder.down_blocks.2.resnets.1.norm1.weight, decoder.up_blocks.0.resnets.2.conv2.bias, encoder.mid_block.resnets.0.norm1.weight, encoder.down_blocks.2.resnets.1.conv2.bias, encoder.down_blocks.1.resnets.1.conv1.bias, decoder.up_blocks.3.resnets.1.conv2.weight, encoder.mid_block.resnets.1.norm1.bias, encoder.down_blocks.3.resnets.0.norm1.bias, 
encoder.down_blocks.2.resnets.0.conv1.weight, encoder.down_blocks.3.resnets.0.norm1.weight, decoder.up_blocks.0.resnets.1.norm2.weight, decoder.up_blocks.2.upsamplers.0.conv.bias, encoder.down_blocks.0.downsamplers.0.conv.weight, encoder.conv_norm_out.bias, decoder.up_blocks.1.resnets.0.norm1.bias, decoder.up_blocks.1.resnets.0.norm1.weight, encoder.mid_block.resnets.0.norm2.bias, decoder.up_blocks.3.resnets.1.conv1.weight, decoder.up_blocks.1.resnets.2.conv2.weight, decoder.up_blocks.3.resnets.2.conv2.bias, encoder.down_blocks.1.resnets.1.conv2.weight, encoder.down_blocks.2.resnets.1.norm2.bias, encoder.down_blocks.2.resnets.1.norm1.bias, decoder.up_blocks.1.resnets.1.norm1.bias, decoder.up_blocks.2.resnets.0.conv_shortcut.weight, decoder.mid_block.resnets.1.conv1.bias, decoder.up_blocks.0.resnets.0.conv1.bias, decoder.up_blocks.2.resnets.1.conv2.bias, decoder.up_blocks.0.resnets.2.conv2.weight, decoder.up_blocks.1.resnets.2.norm1.weight, decoder.up_blocks.2.resnets.2.norm2.weight, decoder.up_blocks.1.resnets.1.norm2.weight, encoder.mid_block.attentions.0.to_v.weight, encoder.down_blocks.3.resnets.1.norm2.bias, decoder.up_blocks.3.resnets.0.conv_shortcut.bias, decoder.mid_block.resnets.0.conv2.weight, decoder.up_blocks.0.resnets.2.conv1.weight, decoder.up_blocks.2.resnets.0.norm1.weight, decoder.mid_block.resnets.1.conv2.bias, encoder.down_blocks.3.resnets.0.conv2.bias, decoder.conv_norm_out.bias, decoder.up_blocks.0.upsamplers.0.conv.weight, encoder.down_blocks.1.resnets.0.conv1.bias, encoder.down_blocks.1.resnets.0.conv_shortcut.weight, encoder.down_blocks.1.resnets.0.norm1.weight, decoder.up_blocks.0.resnets.1.norm2.bias, encoder.down_blocks.2.resnets.1.conv1.weight, encoder.mid_block.resnets.0.conv2.weight, encoder.mid_block.attentions.0.to_q.bias, decoder.up_blocks.1.resnets.0.conv2.bias, decoder.mid_block.resnets.0.conv1.bias, decoder.mid_block.attentions.0.to_out.0.bias, decoder.up_blocks.3.resnets.0.norm2.weight, encoder.down_blocks.0.resnets.1.conv2.bias, 
encoder.down_blocks.0.resnets.1.conv2.weight, encoder.down_blocks.2.resnets.0.conv1.bias, decoder.up_blocks.1.resnets.2.conv1.weight, decoder.up_blocks.2.upsamplers.0.conv.weight, encoder.mid_block.resnets.1.conv2.bias, decoder.mid_block.attentions.0.to_k.bias, encoder.conv_norm_out.weight, decoder.up_blocks.2.resnets.2.norm1.weight, decoder.up_blocks.3.resnets.2.norm1.weight, encoder.down_blocks.3.resnets.1.conv2.bias, decoder.up_blocks.0.resnets.1.norm1.bias, decoder.up_blocks.0.resnets.1.conv2.bias, decoder.up_blocks.0.resnets.2.norm2.bias, decoder.mid_block.attentions.0.to_out.0.weight, decoder.up_blocks.0.resnets.0.conv2.bias, decoder.mid_block.attentions.0.to_q.bias, encoder.down_blocks.0.resnets.1.conv1.bias, decoder.up_blocks.0.resnets.1.conv1.bias, decoder.up_blocks.3.resnets.2.conv1.weight, encoder.mid_block.resnets.0.conv2.bias, decoder.up_blocks.0.resnets.0.norm1.bias, encoder.down_blocks.1.resnets.0.conv_shortcut.bias, decoder.mid_block.resnets.0.norm2.bias, decoder.mid_block.resnets.1.norm1.bias, decoder.up_blocks.1.resnets.0.conv2.weight, decoder.up_blocks.2.resnets.2.norm2.bias, encoder.down_blocks.0.resnets.1.norm2.bias, decoder.mid_block.resnets.0.norm2.weight, encoder.down_blocks.1.resnets.1.norm1.bias, encoder.mid_block.resnets.1.conv1.bias, decoder.up_blocks.3.resnets.1.norm2.bias, decoder.up_blocks.0.resnets.2.conv1.bias, decoder.up_blocks.1.resnets.2.conv1.bias, encoder.mid_block.attentions.0.to_out.0.weight, decoder.up_blocks.2.resnets.0.conv1.bias, encoder.down_blocks.2.resnets.0.conv_shortcut.weight, decoder.mid_block.resnets.0.conv1.weight, encoder.down_blocks.3.resnets.1.norm2.weight, decoder.up_blocks.1.upsamplers.0.conv.bias, encoder.mid_block.resnets.0.conv1.weight, decoder.up_blocks.1.resnets.0.norm2.weight, encoder.mid_block.resnets.0.norm2.weight, encoder.down_blocks.1.resnets.1.conv1.weight, decoder.up_blocks.1.resnets.0.norm2.bias, decoder.up_blocks.1.upsamplers.0.conv.weight, decoder.up_blocks.2.resnets.1.conv1.weight, 
decoder.up_blocks.3.resnets.0.conv_shortcut.weight, encoder.mid_block.attentions.0.to_v.bias, decoder.up_blocks.1.resnets.0.conv1.bias, encoder.down_blocks.1.downsamplers.0.conv.bias, decoder.up_blocks.3.resnets.0.conv1.bias, decoder.up_blocks.1.resnets.1.norm2.bias, decoder.up_blocks.0.resnets.0.norm2.weight, decoder.up_blocks.1.resnets.0.conv1.weight, encoder.mid_block.resnets.1.norm2.weight, encoder.down_blocks.3.resnets.1.conv1.weight, decoder.up_blocks.3.resnets.1.norm1.bias, encoder.mid_block.attentions.0.to_k.weight, decoder.up_blocks.3.resnets.2.norm2.weight, encoder.down_blocks.0.resnets.0.conv2.weight, encoder.down_blocks.0.resnets.0.norm2.bias, encoder.mid_block.resnets.1.conv2.weight, decoder.mid_block.attentions.0.to_v.bias, decoder.up_blocks.1.resnets.1.norm1.weight, decoder.up_blocks.1.resnets.2.norm2.bias, encoder.down_blocks.1.resnets.1.norm2.bias, encoder.down_blocks.2.resnets.1.conv1.bias, decoder.up_blocks.2.resnets.1.norm1.weight, decoder.up_blocks.3.resnets.0.norm2.bias, encoder.mid_block.attentions.0.group_norm.bias, encoder.down_blocks.2.resnets.0.norm2.weight, encoder.down_blocks.0.resnets.0.norm2.weight, encoder.down_blocks.3.resnets.0.conv2.weight, decoder.up_blocks.3.resnets.0.conv2.weight, encoder.down_blocks.1.resnets.1.norm2.weight, encoder.down_blocks.0.resnets.1.norm1.weight, encoder.mid_block.attentions.0.group_norm.weight, encoder.down_blocks.0.resnets.0.norm1.weight, decoder.up_blocks.0.resnets.0.conv2.weight, encoder.down_blocks.0.resnets.1.norm1.bias, decoder.up_blocks.0.resnets.0.norm1.weight, encoder.down_blocks.3.resnets.1.norm1.bias, decoder.up_blocks.1.resnets.1.conv1.bias, decoder.up_blocks.3.resnets.0.norm1.bias, decoder.up_blocks.0.upsamplers.0.conv.bias, encoder.down_blocks.1.resnets.0.conv2.weight, decoder.up_blocks.3.resnets.2.conv2.weight, decoder.mid_block.attentions.0.group_norm.weight, encoder.down_blocks.1.resnets.0.conv2.bias, decoder.up_blocks.0.resnets.1.conv2.weight, 
encoder.down_blocks.3.resnets.1.conv1.bias, encoder.down_blocks.3.resnets.0.conv1.weight, decoder.mid_block.resnets.1.norm2.bias, encoder.down_blocks.2.resnets.0.norm2.bias, encoder.down_blocks.2.resnets.1.conv2.weight, encoder.mid_block.attentions.0.to_k.bias, decoder.mid_block.resnets.1.conv1.weight, decoder.up_blocks.2.resnets.2.conv2.weight, encoder.down_blocks.3.resnets.1.conv2.weight, encoder.down_blocks.1.resnets.0.norm2.bias, encoder.down_blocks.1.resnets.1.conv2.bias, encoder.down_blocks.1.downsamplers.0.conv.weight, decoder.up_blocks.1.resnets.1.conv2.bias, encoder.down_blocks.1.resnets.0.conv1.weight, encoder.down_blocks.1.resnets.1.norm1.weight, decoder.up_blocks.2.resnets.0.conv_shortcut.bias, decoder.up_blocks.0.resnets.2.norm1.weight, decoder.up_blocks.3.resnets.0.norm1.weight, encoder.mid_block.resnets.1.norm1.weight, decoder.up_blocks.2.resnets.0.norm1.bias, decoder.up_blocks.2.resnets.2.conv2.bias, decoder.mid_block.resnets.1.norm2.weight, encoder.mid_block.resnets.0.norm1.bias, decoder.up_blocks.3.resnets.1.norm2.weight, encoder.down_blocks.0.resnets.1.conv1.weight, decoder.up_blocks.2.resnets.0.norm2.bias, encoder.down_blocks.0.resnets.1.norm2.weight, encoder.down_blocks.2.resnets.0.conv2.weight, decoder.up_blocks.3.resnets.1.norm1.weight, decoder.up_blocks.0.resnets.1.norm1.weight, encoder.down_blocks.2.resnets.0.conv2.bias, decoder.up_blocks.3.resnets.1.conv2.bias, decoder.up_blocks.1.resnets.1.conv1.weight, decoder.mid_block.resnets.0.norm1.weight, decoder.up_blocks.2.resnets.0.conv2.bias, decoder.up_blocks.3.resnets.2.conv1.bias, decoder.mid_block.resnets.1.norm1.weight, decoder.up_blocks.2.resnets.2.norm1.bias, decoder.up_blocks.3.resnets.0.conv1.weight, decoder.up_blocks.2.resnets.1.norm2.bias, decoder.up_blocks.1.resnets.2.norm1.bias, encoder.down_blocks.0.resnets.0.norm1.bias, decoder.up_blocks.1.resnets.2.norm2.weight, encoder.down_blocks.3.resnets.1.norm1.weight, decoder.up_blocks.2.resnets.0.conv1.weight, 
decoder.up_blocks.2.resnets.1.conv1.bias, encoder.down_blocks.2.downsamplers.0.conv.weight, encoder.down_blocks.0.downsamplers.0.conv.bias, encoder.down_blocks.2.resnets.1.norm2.weight, decoder.up_blocks.3.resnets.1.conv1.bias, encoder.down_blocks.0.resnets.0.conv1.weight, encoder.down_blocks.2.resnets.0.conv_shortcut.bias, encoder.down_blocks.3.resnets.0.norm2.bias, encoder.down_blocks.3.resnets.0.norm2.weight.
Please make sure to pass `low_cpu_mem_usage=False` and `device_map=None` if you want to randomly initialize those weights or else make sure your checkpoint file is correct.
[2024-06-03 22:33:48,109]::[InvokeAI]::ERROR --> Traceback (most recent call last):
  File "/home/user/invokeai/.venv/lib/python3.10/site-packages/invokeai/app/services/session_processor/session_processor_default.py", line 185, in _process
    outputs = self._invocation.invoke_internal(
  File "/home/user/invokeai/.venv/lib/python3.10/site-packages/invokeai/app/invocations/baseinvocation.py", line 289, in invoke_internal
    return self.invoke(context)
  File "/home/user/invokeai/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/invokeai/.venv/lib/python3.10/site-packages/invokeai/app/invocations/latent.py", line 1040, in invoke
    vae_info = context.models.load(self.vae.vae)
  File "/home/user/invokeai/.venv/lib/python3.10/site-packages/invokeai/app/services/shared/invocation_context.py", line 360, in load
    return self._services.model_manager.load.load_model(model, _submodel_type, self._data)
  File "/home/user/invokeai/.venv/lib/python3.10/site-packages/invokeai/app/services/model_load/model_load_default.py", line 80, in load_model
    ).load_model(model_config, submodel_type)
  File "/home/user/invokeai/.venv/lib/python3.10/site-packages/invokeai/backend/model_manager/load/load_default.py", line 62, in load_model
    locker = self._convert_and_load(model_config, model_path, submodel_type)
  File "/home/user/invokeai/.venv/lib/python3.10/site-packages/invokeai/backend/model_manager/load/load_default.py", line 92, in _convert_and_load
    loaded_model = self._load_model(config, submodel_type)
  File "/home/user/invokeai/.venv/lib/python3.10/site-packages/invokeai/backend/model_manager/load/model_loaders/generic_diffusers.py", line 42, in _load_model
    result: AnyModel = model_class.from_pretrained(model_path, torch_dtype=self._torch_dtype, variant=variant)
  File "/home/user/invokeai/.venv/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 119, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/user/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/modeling_utils.py", line 660, in from_pretrained
    raise ValueError(
ValueError: Cannot load <class 'diffusers.models.autoencoders.autoencoder_kl.AutoencoderKL'> from /home/user/Downloads/sd-models/ponyDiffusionV6XL-vae because the following keys are missing:
decoder.conv_norm_out.weight, encoder.down_blocks.1.resnets.0.norm2.weight, encoder.down_blocks.1.resnets.0.norm1.bias, decoder.up_blocks.2.resnets.0.norm2.weight, decoder.up_blocks.2.resnets.2.conv1.bias, decoder.up_blocks.1.resnets.1.conv2.weight, decoder.up_blocks.2.resnets.2.conv1.weight, decoder.mid_block.attentions.0.group_norm.bias, encoder.down_blocks.3.resnets.0.conv1.bias, decoder.up_blocks.2.resnets.1.norm2.weight, decoder.mid_block.resnets.0.norm1.bias, decoder.up_blocks.3.resnets.2.norm2.bias, decoder.mid_block.attentions.0.to_v.weight, decoder.up_blocks.0.resnets.0.conv1.weight, encoder.mid_block.attentions.0.to_out.0.bias, encoder.down_blocks.2.downsamplers.0.conv.bias, encoder.mid_block.resnets.1.norm2.bias, encoder.down_blocks.2.resnets.0.norm1.weight, encoder.mid_block.attentions.0.to_q.weight, decoder.mid_block.resnets.1.conv2.weight, encoder.down_blocks.0.resnets.0.conv1.bias, decoder.up_blocks.0.resnets.2.norm1.bias, encoder.mid_block.resnets.0.conv1.bias, decoder.up_blocks.2.resnets.1.conv2.weight, decoder.up_blocks.3.resnets.0.conv2.bias, decoder.mid_block.resnets.0.conv2.bias, decoder.mid_block.attentions.0.to_q.weight, decoder.up_blocks.1.resnets.2.conv2.bias, decoder.mid_block.attentions.0.to_k.weight, decoder.up_blocks.2.resnets.1.norm1.bias, encoder.mid_block.resnets.1.conv1.weight, decoder.up_blocks.3.resnets.2.norm1.bias, encoder.down_blocks.2.resnets.0.norm1.bias, decoder.up_blocks.0.resnets.0.norm2.bias, decoder.up_blocks.0.resnets.1.conv1.weight, decoder.up_blocks.0.resnets.2.norm2.weight, decoder.up_blocks.2.resnets.0.conv2.weight, encoder.down_blocks.0.resnets.0.conv2.bias, encoder.down_blocks.2.resnets.1.norm1.weight, decoder.up_blocks.0.resnets.2.conv2.bias, encoder.mid_block.resnets.0.norm1.weight, encoder.down_blocks.2.resnets.1.conv2.bias, encoder.down_blocks.1.resnets.1.conv1.bias, decoder.up_blocks.3.resnets.1.conv2.weight, encoder.mid_block.resnets.1.norm1.bias, encoder.down_blocks.3.resnets.0.norm1.bias, 
encoder.down_blocks.2.resnets.0.conv1.weight, encoder.down_blocks.3.resnets.0.norm1.weight, decoder.up_blocks.0.resnets.1.norm2.weight, decoder.up_blocks.2.upsamplers.0.conv.bias, encoder.down_blocks.0.downsamplers.0.conv.weight, encoder.conv_norm_out.bias, decoder.up_blocks.1.resnets.0.norm1.bias, decoder.up_blocks.1.resnets.0.norm1.weight, encoder.mid_block.resnets.0.norm2.bias, decoder.up_blocks.3.resnets.1.conv1.weight, decoder.up_blocks.1.resnets.2.conv2.weight, decoder.up_blocks.3.resnets.2.conv2.bias, encoder.down_blocks.1.resnets.1.conv2.weight, encoder.down_blocks.2.resnets.1.norm2.bias, encoder.down_blocks.2.resnets.1.norm1.bias, decoder.up_blocks.1.resnets.1.norm1.bias, decoder.up_blocks.2.resnets.0.conv_shortcut.weight, decoder.mid_block.resnets.1.conv1.bias, decoder.up_blocks.0.resnets.0.conv1.bias, decoder.up_blocks.2.resnets.1.conv2.bias, decoder.up_blocks.0.resnets.2.conv2.weight, decoder.up_blocks.1.resnets.2.norm1.weight, decoder.up_blocks.2.resnets.2.norm2.weight, decoder.up_blocks.1.resnets.1.norm2.weight, encoder.mid_block.attentions.0.to_v.weight, encoder.down_blocks.3.resnets.1.norm2.bias, decoder.up_blocks.3.resnets.0.conv_shortcut.bias, decoder.mid_block.resnets.0.conv2.weight, decoder.up_blocks.0.resnets.2.conv1.weight, decoder.up_blocks.2.resnets.0.norm1.weight, decoder.mid_block.resnets.1.conv2.bias, encoder.down_blocks.3.resnets.0.conv2.bias, decoder.conv_norm_out.bias, decoder.up_blocks.0.upsamplers.0.conv.weight, encoder.down_blocks.1.resnets.0.conv1.bias, encoder.down_blocks.1.resnets.0.conv_shortcut.weight, encoder.down_blocks.1.resnets.0.norm1.weight, decoder.up_blocks.0.resnets.1.norm2.bias, encoder.down_blocks.2.resnets.1.conv1.weight, encoder.mid_block.resnets.0.conv2.weight, encoder.mid_block.attentions.0.to_q.bias, decoder.up_blocks.1.resnets.0.conv2.bias, decoder.mid_block.resnets.0.conv1.bias, decoder.mid_block.attentions.0.to_out.0.bias, decoder.up_blocks.3.resnets.0.norm2.weight, encoder.down_blocks.0.resnets.1.conv2.bias, 
encoder.down_blocks.0.resnets.1.conv2.weight, encoder.down_blocks.2.resnets.0.conv1.bias, decoder.up_blocks.1.resnets.2.conv1.weight, decoder.up_blocks.2.upsamplers.0.conv.weight, encoder.mid_block.resnets.1.conv2.bias, decoder.mid_block.attentions.0.to_k.bias, encoder.conv_norm_out.weight, decoder.up_blocks.2.resnets.2.norm1.weight, decoder.up_blocks.3.resnets.2.norm1.weight, encoder.down_blocks.3.resnets.1.conv2.bias, decoder.up_blocks.0.resnets.1.norm1.bias, decoder.up_blocks.0.resnets.1.conv2.bias, decoder.up_blocks.0.resnets.2.norm2.bias, decoder.mid_block.attentions.0.to_out.0.weight, decoder.up_blocks.0.resnets.0.conv2.bias, decoder.mid_block.attentions.0.to_q.bias, encoder.down_blocks.0.resnets.1.conv1.bias, decoder.up_blocks.0.resnets.1.conv1.bias, decoder.up_blocks.3.resnets.2.conv1.weight, encoder.mid_block.resnets.0.conv2.bias, decoder.up_blocks.0.resnets.0.norm1.bias, encoder.down_blocks.1.resnets.0.conv_shortcut.bias, decoder.mid_block.resnets.0.norm2.bias, decoder.mid_block.resnets.1.norm1.bias, decoder.up_blocks.1.resnets.0.conv2.weight, decoder.up_blocks.2.resnets.2.norm2.bias, encoder.down_blocks.0.resnets.1.norm2.bias, decoder.mid_block.resnets.0.norm2.weight, encoder.down_blocks.1.resnets.1.norm1.bias, encoder.mid_block.resnets.1.conv1.bias, decoder.up_blocks.3.resnets.1.norm2.bias, decoder.up_blocks.0.resnets.2.conv1.bias, decoder.up_blocks.1.resnets.2.conv1.bias, encoder.mid_block.attentions.0.to_out.0.weight, decoder.up_blocks.2.resnets.0.conv1.bias, encoder.down_blocks.2.resnets.0.conv_shortcut.weight, decoder.mid_block.resnets.0.conv1.weight, encoder.down_blocks.3.resnets.1.norm2.weight, decoder.up_blocks.1.upsamplers.0.conv.bias, encoder.mid_block.resnets.0.conv1.weight, decoder.up_blocks.1.resnets.0.norm2.weight, encoder.mid_block.resnets.0.norm2.weight, encoder.down_blocks.1.resnets.1.conv1.weight, decoder.up_blocks.1.resnets.0.norm2.bias, decoder.up_blocks.1.upsamplers.0.conv.weight, decoder.up_blocks.2.resnets.1.conv1.weight, 
decoder.up_blocks.3.resnets.0.conv_shortcut.weight, encoder.mid_block.attentions.0.to_v.bias, decoder.up_blocks.1.resnets.0.conv1.bias, encoder.down_blocks.1.downsamplers.0.conv.bias, decoder.up_blocks.3.resnets.0.conv1.bias, decoder.up_blocks.1.resnets.1.norm2.bias, decoder.up_blocks.0.resnets.0.norm2.weight, decoder.up_blocks.1.resnets.0.conv1.weight, encoder.mid_block.resnets.1.norm2.weight, encoder.down_blocks.3.resnets.1.conv1.weight, decoder.up_blocks.3.resnets.1.norm1.bias, encoder.mid_block.attentions.0.to_k.weight, decoder.up_blocks.3.resnets.2.norm2.weight, encoder.down_blocks.0.resnets.0.conv2.weight, encoder.down_blocks.0.resnets.0.norm2.bias, encoder.mid_block.resnets.1.conv2.weight, decoder.mid_block.attentions.0.to_v.bias, decoder.up_blocks.1.resnets.1.norm1.weight, decoder.up_blocks.1.resnets.2.norm2.bias, encoder.down_blocks.1.resnets.1.norm2.bias, encoder.down_blocks.2.resnets.1.conv1.bias, decoder.up_blocks.2.resnets.1.norm1.weight, decoder.up_blocks.3.resnets.0.norm2.bias, encoder.mid_block.attentions.0.group_norm.bias, encoder.down_blocks.2.resnets.0.norm2.weight, encoder.down_blocks.0.resnets.0.norm2.weight, encoder.down_blocks.3.resnets.0.conv2.weight, decoder.up_blocks.3.resnets.0.conv2.weight, encoder.down_blocks.1.resnets.1.norm2.weight, encoder.down_blocks.0.resnets.1.norm1.weight, encoder.mid_block.attentions.0.group_norm.weight, encoder.down_blocks.0.resnets.0.norm1.weight, decoder.up_blocks.0.resnets.0.conv2.weight, encoder.down_blocks.0.resnets.1.norm1.bias, decoder.up_blocks.0.resnets.0.norm1.weight, encoder.down_blocks.3.resnets.1.norm1.bias, decoder.up_blocks.1.resnets.1.conv1.bias, decoder.up_blocks.3.resnets.0.norm1.bias, decoder.up_blocks.0.upsamplers.0.conv.bias, encoder.down_blocks.1.resnets.0.conv2.weight, decoder.up_blocks.3.resnets.2.conv2.weight, decoder.mid_block.attentions.0.group_norm.weight, encoder.down_blocks.1.resnets.0.conv2.bias, decoder.up_blocks.0.resnets.1.conv2.weight, 
encoder.down_blocks.3.resnets.1.conv1.bias, encoder.down_blocks.3.resnets.0.conv1.weight, decoder.mid_block.resnets.1.norm2.bias, encoder.down_blocks.2.resnets.0.norm2.bias, encoder.down_blocks.2.resnets.1.conv2.weight, encoder.mid_block.attentions.0.to_k.bias, decoder.mid_block.resnets.1.conv1.weight, decoder.up_blocks.2.resnets.2.conv2.weight, encoder.down_blocks.3.resnets.1.conv2.weight, encoder.down_blocks.1.resnets.0.norm2.bias, encoder.down_blocks.1.resnets.1.conv2.bias, encoder.down_blocks.1.downsamplers.0.conv.weight, decoder.up_blocks.1.resnets.1.conv2.bias, encoder.down_blocks.1.resnets.0.conv1.weight, encoder.down_blocks.1.resnets.1.norm1.weight, decoder.up_blocks.2.resnets.0.conv_shortcut.bias, decoder.up_blocks.0.resnets.2.norm1.weight, decoder.up_blocks.3.resnets.0.norm1.weight, encoder.mid_block.resnets.1.norm1.weight, decoder.up_blocks.2.resnets.0.norm1.bias, decoder.up_blocks.2.resnets.2.conv2.bias, decoder.mid_block.resnets.1.norm2.weight, encoder.mid_block.resnets.0.norm1.bias, decoder.up_blocks.3.resnets.1.norm2.weight, encoder.down_blocks.0.resnets.1.conv1.weight, decoder.up_blocks.2.resnets.0.norm2.bias, encoder.down_blocks.0.resnets.1.norm2.weight, encoder.down_blocks.2.resnets.0.conv2.weight, decoder.up_blocks.3.resnets.1.norm1.weight, decoder.up_blocks.0.resnets.1.norm1.weight, encoder.down_blocks.2.resnets.0.conv2.bias, decoder.up_blocks.3.resnets.1.conv2.bias, decoder.up_blocks.1.resnets.1.conv1.weight, decoder.mid_block.resnets.0.norm1.weight, decoder.up_blocks.2.resnets.0.conv2.bias, decoder.up_blocks.3.resnets.2.conv1.bias, decoder.mid_block.resnets.1.norm1.weight, decoder.up_blocks.2.resnets.2.norm1.bias, decoder.up_blocks.3.resnets.0.conv1.weight, decoder.up_blocks.2.resnets.1.norm2.bias, decoder.up_blocks.1.resnets.2.norm1.bias, encoder.down_blocks.0.resnets.0.norm1.bias, decoder.up_blocks.1.resnets.2.norm2.weight, encoder.down_blocks.3.resnets.1.norm1.weight, decoder.up_blocks.2.resnets.0.conv1.weight, 
decoder.up_blocks.2.resnets.1.conv1.bias, encoder.down_blocks.2.downsamplers.0.conv.weight, encoder.down_blocks.0.downsamplers.0.conv.bias, encoder.down_blocks.2.resnets.1.norm2.weight, decoder.up_blocks.3.resnets.1.conv1.bias, encoder.down_blocks.0.resnets.0.conv1.weight, encoder.down_blocks.2.resnets.0.conv_shortcut.bias, encoder.down_blocks.3.resnets.0.norm2.bias, encoder.down_blocks.3.resnets.0.norm2.weight.
Please make sure to pass `low_cpu_mem_usage=False` and `device_map=None` if you want to randomly initialize those weights or else make sure your checkpoint file is correct.
@contrebande-labs I'm not sure what exactly is going on with that error. Which Pony model is that?
You can review some recent testing of SDXL, CLIP Skip and the vanilla Pony XL model in this discord thread: https://discord.com/channels/1020123559063990373/1239802872611475528
See, in particular, my first series of images where I test pony xl on CLIP Skip 0 to 11 in Invoke. I did several such tests. Following that is more very relevant discussion and experimentation.
Note that Invoke uses diffusers internally, while comfy has its own implementation of SD pipelines. There are some situations (like CLIP Skip) where a setting for comfy doesn't translate exactly to diffusers due to internal differences.
Also, you can change CLIP Skip. Generate on the generation tab w/ SDXL then load the workflow. Add a CLIP Skip node between the model loader and compel nodes and have a play with it.
What we do not plan on doing is putting CLIP Skip in the Linear UI tabs, because it doesn't do anything useful. We have to provide it for SD1.5 because some models require it. We have zero evidence that pony XL requires it, and a good amount of evidence that you shouldn't use CLIP Skip with it.
It is the V6 version straight from the CivitAI website (with the VAE also downloadable on the same page). The error happens at the end, when the VAE decodes the final latents into an image (the l2i node), so I'm still able to see the preview latents. And even though the clip skip setting makes a big difference in ComfyUI, that did not seem to be the case with Invoke. I'm going to read the thread tomorrow and try to debug the error; I think it might have to do with the way I imported the VAE. Thanks for the help, I appreciate it. I'd much rather use Invoke than Comfy. I'll report back tomorrow.
Ok, gotcha. The VAE isn't working for me either, so it's not related to how you imported it. I created #6483 for the VAE issue.
Hi @psychedelicious ! I'll wait until #6483 is resolved to resume my testing. I will also have ingested all there is to know from the discord thread. Thanks!
Summary
CLIP skip allows the user to choose which layer of the CLIP model is used as the last layer during generation
InvokeAI supports use of CLIP skip with SD1.5 & SD2.1
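Mechanically, CLIP skip just selects an earlier hidden state from the text encoder's stack of transformer layers to feed into the diffusion model. A toy sketch, using a plain Python list as a stand-in for the encoder's per-layer outputs and one common counting convention (clip_skip=0 means the final layer; real implementations differ in details such as re-applying the final layer norm, which is part of why values don't translate exactly between UIs):

```python
def select_text_embedding(hidden_states, clip_skip=0):
    """Pick which text-encoder layer output conditions the diffusion model.

    `hidden_states` is the list of per-layer outputs in order. With this
    convention, clip_skip=0 uses the final layer, clip_skip=1 the
    penultimate one, and so on. Toy stand-in, not a real encoder.
    """
    if not 0 <= clip_skip < len(hidden_states):
        raise ValueError("clip_skip out of range")
    return hidden_states[len(hidden_states) - 1 - clip_skip]

layers = ["layer0", "layer1", "layer2", "layer3"]  # pretend per-layer outputs
print(select_text_embedding(layers, clip_skip=0))  # layer3 (final layer)
print(select_text_embedding(layers, clip_skip=1))  # layer2 (penultimate layer)
```

The disagreement in the thread above is essentially about which index this selection starts from in each UI, not about what the operation does.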
Intended Outcome