Closed Gyramuur closed 18 hours ago
It does look like the model is wrong: ofs_embeds only exists in the 1.5 I2V model.
I have converted the 1.5 models to single-file versions, which you may have better luck with (they also load faster). You can load them with this node:
And the VAE with this node:
The models are loaded from the normal ComfyUI diffusion_models and vae folders instead of the custom CogVideo folder.
Thanks :D I'll give it a shot. Just so I don't make any potential mistakes, where do I place these models?
ComfyUI/models/diffusion_models
and ComfyUI/models/vae
You'll know they are in the right place if they show up in the model loader.
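If the loader shows nothing, a quick check of the two folders can rule out a misplaced file. This is only a rough sketch: the helper name `list_model_files` and the `comfy_root` argument are made up for illustration and are not part of ComfyUI or the wrapper. It assumes the models are `.safetensors` files sitting directly in the two folders named above.

```python
from pathlib import Path

# Folder names are taken from the reply above; everything else is illustrative.
MODEL_DIRS = ("diffusion_models", "vae")


def list_model_files(comfy_root):
    """Return {folder: sorted .safetensors file names} under <comfy_root>/models."""
    models = Path(comfy_root) / "models"
    found = {}
    for name in MODEL_DIRS:
        folder = models / name
        if folder.is_dir():
            found[name] = sorted(p.name for p in folder.glob("*.safetensors"))
        else:
            # A missing folder is reported as an empty list rather than an error.
            found[name] = []
    return found
```

Calling `list_model_files("ComfyUI")` from the directory that contains the ComfyUI folder should list the converted transformer under `diffusion_models` and the VAE under `vae`. An empty list for either one means the file is somewhere else.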
Yeah it's working now, thanks a bunch ^^
This has been plaguing me for the past few days. I was hoping that 1.5 being pushed to the main branch would resolve it, but unfortunately it is still broken and I am completely stuck.
My process has been this:
This is the error I get hit with:
!!! Exception during processing !!! CogVideoXTransformer3DModel(
  (patch_embed): CogVideoXPatchEmbed(
    (proj): Conv2d(32, 3072, kernel_size=(2, 2), stride=(2, 2))
    (text_proj): Linear(in_features=4096, out_features=3072, bias=True)
  )
  (embedding_dropout): Dropout(p=0.0, inplace=False)
  (time_proj): Timesteps()
  (time_embedding): TimestepEmbedding(
    (linear_1): Linear(in_features=3072, out_features=512, bias=True)
    (act): SiLU()
    (linear_2): Linear(in_features=512, out_features=512, bias=True)
  )
  (transformer_blocks): ModuleList(
    (0-41): 42 x CogVideoXBlock(
      (norm1): CogVideoXLayerNormZero(
        (silu): SiLU()
        (linear): Linear(in_features=512, out_features=18432, bias=True)
        (norm): LayerNorm((3072,), eps=1e-05, elementwise_affine=True)
      )
      (attn1): Attention(
        (norm_q): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
        (norm_k): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
        (to_q): Linear(in_features=3072, out_features=3072, bias=True)
        (to_k): Linear(in_features=3072, out_features=3072, bias=True)
        (to_v): Linear(in_features=3072, out_features=3072, bias=True)
        (to_out): ModuleList(
          (0): Linear(in_features=3072, out_features=3072, bias=True)
          (1): Dropout(p=0.0, inplace=False)
        )
      )
      (norm2): CogVideoXLayerNormZero(
        (silu): SiLU()
        (linear): Linear(in_features=512, out_features=18432, bias=True)
        (norm): LayerNorm((3072,), eps=1e-05, elementwise_affine=True)
      )
      (ff): FeedForward(
        (net): ModuleList(
          (0): GELU(
            (proj): Linear(in_features=3072, out_features=12288, bias=True)
          )
          (1): Dropout(p=0.0, inplace=False)
          (2): Linear(in_features=12288, out_features=3072, bias=True)
          (3): Dropout(p=0.0, inplace=False)
        )
      )
    )
  )
  (norm_final): LayerNorm((3072,), eps=1e-05, elementwise_affine=True)
  (norm_out): AdaLayerNorm(
    (silu): SiLU()
    (linear): Linear(in_features=512, out_features=6144, bias=True)
    (norm): LayerNorm((3072,), eps=1e-05, elementwise_affine=True)
  )
  (proj_out): Linear(in_features=3072, out_features=64, bias=True)
) has no attribute ofs_embedding.
Traceback (most recent call last):
  File "Z:\webui\ComfyUI_windows_portable\ComfyUI\execution.py", line 323, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Z:\webui\ComfyUI_windows_portable\ComfyUI\execution.py", line 198, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Z:\webui\ComfyUI_windows_portable\ComfyUI\execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)
  File "Z:\webui\ComfyUI_windows_portable\ComfyUI\execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(*inputs))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Z:\webui\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-CogVideoXWrapper\model_loading.py", line 215, in loadmodel
    transformer = CogVideoXTransformer3DModel.from_pretrained(base_path, subfolder=subfolder)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Z:\webui\ComfyUI_windows_portable\python_embeded\Lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "Z:\webui\ComfyUI_windows_portable\python_embeded\Lib\site-packages\diffusers\models\modeling_utils.py", line 886, in from_pretrained
    accelerate.load_checkpoint_and_dispatch(
  File "Z:\webui\ComfyUI_windows_portable\python_embeded\Lib\site-packages\accelerate\big_modeling.py", line 613, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "Z:\webui\ComfyUI_windows_portable\python_embeded\Lib\site-packages\accelerate\utils\modeling.py", line 1821, in load_checkpoint_in_model
    set_module_tensor_to_device(
  File "Z:\webui\ComfyUI_windows_portable\python_embeded\Lib\site-packages\accelerate\utils\modeling.py", line 336, in set_module_tensor_to_device
    raise ValueError(f"{module} has no attribute {split}.")
ValueError: CogVideoXTransformer3DModel(
  (patch_embed): CogVideoXPatchEmbed(
    (proj): Conv2d(32, 3072, kernel_size=(2, 2), stride=(2, 2))
    (text_proj): Linear(in_features=4096, out_features=3072, bias=True)
  )
  (embedding_dropout): Dropout(p=0.0, inplace=False)
  (time_proj): Timesteps()
  (time_embedding): TimestepEmbedding(
    (linear_1): Linear(in_features=3072, out_features=512, bias=True)
    (act): SiLU()
    (linear_2): Linear(in_features=512, out_features=512, bias=True)
  )
  (transformer_blocks): ModuleList(
    (0-41): 42 x CogVideoXBlock(
      (norm1): CogVideoXLayerNormZero(
        (silu): SiLU()
        (linear): Linear(in_features=512, out_features=18432, bias=True)
        (norm): LayerNorm((3072,), eps=1e-05, elementwise_affine=True)
      )
      (attn1): Attention(
        (norm_q): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
        (norm_k): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
        (to_q): Linear(in_features=3072, out_features=3072, bias=True)
        (to_k): Linear(in_features=3072, out_features=3072, bias=True)
        (to_v): Linear(in_features=3072, out_features=3072, bias=True)
        (to_out): ModuleList(
          (0): Linear(in_features=3072, out_features=3072, bias=True)
          (1): Dropout(p=0.0, inplace=False)
        )
      )
      (norm2): CogVideoXLayerNormZero(
        (silu): SiLU()
        (linear): Linear(in_features=512, out_features=18432, bias=True)
        (norm): LayerNorm((3072,), eps=1e-05, elementwise_affine=True)
      )
      (ff): FeedForward(
        (net): ModuleList(
          (0): GELU(
            (proj): Linear(in_features=3072, out_features=12288, bias=True)
          )
          (1): Dropout(p=0.0, inplace=False)
          (2): Linear(in_features=12288, out_features=3072, bias=True)
          (3): Dropout(p=0.0, inplace=False)
        )
      )
    )
  )
  (norm_final): LayerNorm((3072,), eps=1e-05, elementwise_affine=True)
  (norm_out): AdaLayerNorm(
    (silu): SiLU()
    (linear): Linear(in_features=512, out_features=6144, bias=True)
    (norm): LayerNorm((3072,), eps=1e-05, elementwise_affine=True)
  )
  (proj_out): Linear(in_features=3072, out_features=64, bias=True)
) has no attribute ofs_embedding.
Now, I've manually downloaded the I2V model from HuggingFace here, so it's entirely possible I might have renamed or misplaced something. Here's the directory of ComfyUI\models\CogVideo\CogVideoX-5b-I2V:
Z:.
│   .gitattributes
│   configuration.json
│   LICENSE
│   model_index.json
│   README.md
│   README_zh.md
│
├───scheduler
│       scheduler_config.json
│
├───text_encoder
│       config.json
│       model-00001-of-00004.safetensors
│       model-00002-of-00004.safetensors
│       model-00003-of-00004.safetensors
│       model-00004-of-00004.safetensors
│       model.safetensors.index.json
│
├───tokenizer
│       added_tokens.json
│       special_tokens_map.json
│       spiece.model
│       tokenizer_config.json
│
├───transformer
│       config.json
│       diffusion_pytorch_model-00001-of-00003.safetensors
│       diffusion_pytorch_model-00002-of-00003.safetensors
│       diffusion_pytorch_model-00003-of-00003.safetensors
│       diffusion_pytorch_model.safetensors.index.json
│
└───vae
        config.json
        diffusion_pytorch_model.safetensors
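The ValueError comes from accelerate trying to place a checkpoint tensor on a module attribute that the instantiated model does not have. One rough way to see which checkpoint entries are affected is to compare the key names in `transformer\diffusion_pytorch_model.safetensors.index.json` against the top-level modules in the printout above. The sketch below assumes the index uses the usual `weight_map` layout of sharded safetensors checkpoints. The helper name `unexpected_prefixes` is hypothetical, not something the wrapper or diffusers provides.

```python
import json
from pathlib import Path

# Top-level submodules copied from the model printout in the error message.
MODEL_MODULES = {
    "patch_embed", "embedding_dropout", "time_proj", "time_embedding",
    "transformer_blocks", "norm_final", "norm_out", "proj_out",
}


def unexpected_prefixes(index_path, known=MODEL_MODULES):
    """Return top-level checkpoint key prefixes that the model does not define."""
    index = json.loads(Path(index_path).read_text(encoding="utf-8"))
    # The index maps each tensor name to the shard file that stores it.
    prefixes = {key.split(".", 1)[0] for key in index["weight_map"]}
    return sorted(prefixes - set(known))
```

If this reports `ofs_embedding` for the downloaded files, the weights on disk contain a module that the model class being built does not, which would match the error above.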