kijai / ComfyUI-CogVideoXWrapper


Error using Tora example workflow: RuntimeError: Input type (struct c10::BFloat16) and bias type (struct c10::Half) should be the same #267

Status: Closed (closed by EnragedAntelope 1 day ago)

EnragedAntelope commented 1 day ago

Hi, thanks for all of your work on this! I am fully updated as of 30 minutes ago, and the other workflows I've tried are working well. However, when I try the Tora example workflow from your repository, I get the error below. In case it matters, I am on Python 3.11.9, Torch 2.5.1, and CUDA 12.4. Thank you!

Downloading model to: D:\ComfyUI\models\CogVideo\CogVideoX-Fun-V1.1-5b-InP
Fetching 5 files: 100%|███████████████████████████████████████████████████████████████████████████████| 5/5 [04:25<00:00, 53.12s/it]
The config attributes {'add_noise_in_inpaint_model': True} were passed to CogVideoXTransformer3DModel, but are not expected and will be ignored. Please verify your config.json configuration file.
end_vram - start_vram: 13008260046 - 1867276110 = 11140983936
#80 [DownloadAndLoadCogVideoModel]: 274.52s - vram 11140983936b
Downloading Fuser model to: D:\ComfyUI\models\CogVideo\CogVideoX-5b-Tora\fuser\fuser.safetensors
Fetching 1 files: 100%|███████████████████████████████████████████████████████████████████████████████| 1/1 [00:32<00:00, 32.42s/it]
Downloading trajectory extractor model to: D:\ComfyUI\models\CogVideo\CogVideoX-5b-Tora\traj_extractor\traj_extractor.safetensors
Fetching 1 files: 100%|███████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.51s/it]
end_vram - start_vram: 14446958702 - 13008260046 = 1438698656
#75 [DownloadAndLoadToraModel]: 35.23s - vram 1438698656b
end_vram - start_vram: 14446958702 - 14446958702 = 0
#72 [LoadImage]: 0.02s - vram 0b
end_vram - start_vram: 14446958702 - 14446958702 = 0
#73 [ImageResizeKJ]: 0.00s - vram 0b
end_vram - start_vram: 14446958702 - 14446958702 = 0
#60 [SplineEditor]: 0.11s - vram 0b
end_vram - start_vram: 14446958702 - 14446958702 = 0
#67 [GetMaskSizeAndCount]: 0.00s - vram 0b
end_vram - start_vram: 14446958702 - 14446958702 = 0
#85 [SplineEditor]: 0.07s - vram 0b
end_vram - start_vram: 14446958702 - 14446958702 = 0
#82 [SplineEditor]: 0.07s - vram 0b
end_vram - start_vram: 14446958702 - 14446958702 = 0
#83 [AppendStringsToList]: 0.00s - vram 0b
end_vram - start_vram: 14446958702 - 14446958702 = 0
#86 [AppendStringsToList]: 0.00s - vram 0b
received 3 trajectorie(s)
video_flow shape after encoding: torch.Size([1, 16, 13, 60, 90])
end_vram - start_vram: 16117826884 - 14446958702 = 1670868182
#78 [ToraEncodeTrajectory]: 4.74s - vram 1670868182b
end_vram - start_vram: 14724537294 - 14724537294 = 0
#66 [VHS_VideoCombine]: 0.32s - vram 0b
end_vram - start_vram: 14724537294 - 14724537294 = 0
#65 [CreateShapeImageOnPath]: 0.21s - vram 0b
Encoded latents shape: torch.Size([1, 13, 16, 60, 90])
end_vram - start_vram: 16395405476 - 14724537294 = 1670868182
#93 [CogVideoImageEncodeFunInP]: 3.57s - vram 1670868182b
end_vram - start_vram: 14726924094 - 14726924094 = 0
#20 [CLIPLoader]: 2.17s - vram 0b
Requested to load SD3ClipModel_
Loading 1 new model
loaded partially 64.0 32.38671875 0
end_vram - start_vram: 24471274066 - 14726924094 = 9744349972
#30 [CogVideoTextEncode]: 12.29s - vram 9744349972b
Unloading models for lowram load.
1 models unloaded.
Loading 1 new model
loaded partially 6805.310731887817 6779.38671875 0
end_vram - start_vram: 24251545410 - 24251545410 = 0
#31 [CogVideoTextEncode]: 5.73s - vram 0b
Received 13 image conditioning frames
Context schedule disabled
Tora trajectory length: 13
Sampling 49 frames in 13 latent frames at 720x480 with 40 inference steps
  0%|                                                                                                        | 0/40 [00:00<?, ?it/s]
!!! Exception during processing !!! Input type (struct c10::BFloat16) and bias type (struct c10::Half) should be the same
Traceback (most recent call last):
  File "D:\ComfyUI\execution.py", line 323, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\execution.py", line 198, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)
  File "D:\ComfyUI\execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\custom_nodes\ComfyUI-CogVideoXWrapper\nodes.py", line 696, in process
    latents = model["pipe"](
              ^^^^^^^^^^^^^^
  File "D:\ComfyUI\venv\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\custom_nodes\ComfyUI-CogVideoXWrapper\pipeline_cogvideox.py", line 757, in __call__
    noise_pred = self.transformer(
                 ^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\custom_nodes\ComfyUI-CogVideoXWrapper\custom_cogvideox_transformer_3d.py", line 688, in forward
    hidden_states, encoder_hidden_states = block(
                                           ^^^^^^
  File "D:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\custom_nodes\ComfyUI-CogVideoXWrapper\custom_cogvideox_transformer_3d.py", line 266, in forward
    h = fuser(h, video_flow_feature.to(h), T=T)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\custom_nodes\ComfyUI-CogVideoXWrapper\tora\traj_module.py", line 287, in forward
    gamma_flow = self.flow_gamma_spatial(flow)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\conv.py", line 554, in forward
    return self._conv_forward(input, self.weight, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\conv.py", line 549, in _conv_forward
    return F.conv2d(
           ^^^^^^^^^
RuntimeError: Input type (struct c10::BFloat16) and bias type (struct c10::Half) should be the same

end_vram - start_vram: 16169275134 - 14726924094 = 1442351040
#79 [CogVideoSampler]: 0.66s - vram 1442351040b

kijai commented 1 day ago

Thanks for the report. I did indeed break it with the last update; it should be fixed now.
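
For anyone who hits the same class of error elsewhere: the traceback ends in `F.conv2d`, where a BFloat16 input meets a layer whose bias is stored as Half (float16). The thread does not show the actual change made in the repository, so the following is only a minimal sketch of the general problem and one common workaround, not the fix that was applied. The helper name `run_with_matching_dtype`, the layer sizes, and the tensor shape are all made up for illustration.

```python
import torch
import torch.nn as nn


def run_with_matching_dtype(module: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: cast the module's parameters to the input's
    device and dtype, then call it. Note that nn.Module.to() modifies the
    module in place."""
    module.to(device=x.device, dtype=x.dtype)
    return module(x)


# A conv layer stored in float16 (Half), standing in for a layer whose
# weights were loaded in a different precision than the main model.
conv = nn.Conv2d(4, 8, kernel_size=3, padding=1).half()

# An activation in bfloat16, standing in for the transformer's hidden state.
x = torch.randn(1, 4, 16, 16, dtype=torch.bfloat16)

# Calling the layer directly fails because input and parameter dtypes differ.
try:
    conv(x)
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print(f"dtype mismatch: {err}")

# After casting the layer to the input's dtype, the call succeeds.
out = run_with_matching_dtype(conv, x)
print(out.dtype, tuple(out.shape))
```

Casting the layer rather than the input keeps the rest of the pipeline in its original precision, but converting float16 weights to bfloat16 does lose some mantissa precision. Loading every sub-model in one consistent dtype from the start avoids the cast entirely.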

EnragedAntelope commented 1 day ago

Confirmed fixed, thank you!