kijai / ComfyUI-CogVideoXWrapper


ERROR: Allocation issue #26

Closed: Duemellon closed this issue 1 week ago

Duemellon commented 2 weeks ago

Probably not much to be done about this beyond getting more memory, but I'm posting it in case someone has an idea.

Error occurred when executing CogVideoDecode:

Allocation on device

File "E:\ComfyUI_windows_portable\ComfyUI\execution.py", line 317, in execute output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb) File "E:\ComfyUI_windows_portable\ComfyUI\execution.py", line 192, in get_output_data return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb) File "E:\ComfyUI_windows_portable\ComfyUI\execution.py", line 169, in _map_node_over_list process_inputs(input_dict, i) File "E:\ComfyUI_windows_portable\ComfyUI\execution.py", line 158, in process_inputs results.append(getattr(obj, func)(inputs)) File "E:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-CogVideoXWrapper\nodes.py", line 336, in decode frames = vae.decode(latents).sample File "D:\Python\Python310\lib\site-packages\diffusers\utils\accelerate_utils.py", line 46, in wrapper return method(self, *args, *kwargs) File "D:\Python\Python310\lib\site-packages\diffusers\models\autoencoders\autoencoder_kl_cogvideox.py", line 1153, in decode decoded = self._decode(z).sample File "D:\Python\Python310\lib\site-packages\diffusers\models\autoencoders\autoencoder_kl_cogvideox.py", line 1123, in _decode z_intermediate = self.decoder(z_intermediate) File "D:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl return self._call_impl(args, kwargs) File "D:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl return forward_call(*args, kwargs) File "D:\Python\Python310\lib\site-packages\diffusers\models\autoencoders\autoencoder_kl_cogvideox.py", line 877, in forward hidden_states = up_block(hidden_states, temb, sample) File "D:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl return self._call_impl(*args, *kwargs) File "D:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl return forward_call(args, kwargs) File "D:\Python\Python310\lib\site-packages\diffusers\models\autoencoders\autoencoder_kl_cogvideox.py", line 602, in forward hidden_states = resnet(hidden_states, temb, zq) File "D:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl return self._call_impl(*args, kwargs) File "D:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl return forward_call(*args, *kwargs) File "D:\Python\Python310\lib\site-packages\diffusers\models\autoencoders\autoencoder_kl_cogvideox.py", line 286, in forward hidden_states = self.norm1(hidden_states, zq) File "D:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl return self._call_impl(args, kwargs) File "D:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl return forward_call(*args, *kwargs) File "D:\Python\Python310\lib\site-packages\diffusers\models\autoencoders\autoencoder_kl_cogvideox.py", line 187, in forward new_f = norm_f self.conv_y(zq) + self.conv_b(zq)

al3dv2 commented 2 weeks ago

I have the same error, and when I set "enable vae tiling" to true the error disappears.

kijai commented 2 weeks ago

Decoding is the most memory-intensive part. Using the tiled option in the decode node reduces memory use considerably, but it can also introduce some seams in the result.
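For context, this maps onto the VAE tiling that diffusers exposes on the CogVideoX autoencoder itself. A minimal sketch at the diffusers level, not the wrapper's exact node code (the model ID, dtype, and `latents` input are assumptions for illustration):

```python
import torch
from diffusers import AutoencoderKLCogVideoX

# Assumption: loading the CogVideoX-2b VAE straight from the Hub;
# the wrapper loads and manages it through its own nodes instead.
vae = AutoencoderKLCogVideoX.from_pretrained(
    "THUDM/CogVideoX-2b", subfolder="vae", torch_dtype=torch.bfloat16
).to("cuda")

# Tiling decodes each frame in overlapping spatial tiles and blends the
# overlaps, so peak memory scales with the tile size rather than the full
# frame -- which is also where the possible seams come from.
vae.enable_tiling()

with torch.no_grad():
    # `latents` would come from the sampler; this is the same call that
    # fails in the traceback above when tiling is off.
    frames = vae.decode(latents).sample
```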

Duemellon commented 2 weeks ago

Give me the bare-minimum settings that use the least memory. Even the smallest image dimensions would help. I have a 12 GB RTX 3060.

kijai commented 2 weeks ago

> Give me the bare-minimum settings that use the least memory. Even the smallest image dimensions would help. I have a 12 GB RTX 3060.

It could fit with the fp8 transformer enabled and VAE tiling enabled on the VAE decode node. The absolute last resort is the newly added sequential_cpu_offload; it will be slow, though. And of course, generating fewer frames uses less memory.
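At the plain-diffusers level, the same knobs (minus the fp8 transformer, which is a wrapper-specific option) look roughly like this. A sketch assuming the CogVideoX-2b pipeline; the wrapper exposes these as node toggles rather than Python calls:

```python
import torch
from diffusers import CogVideoXPipeline

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-2b", torch_dtype=torch.bfloat16
)

# Last resort: keep weights in system RAM and stream each submodule to the
# GPU only while it runs. Very slow, but peak VRAM drops dramatically.
# (Do not call pipe.to("cuda") when using this.)
pipe.enable_sequential_cpu_offload()

# Tiled VAE decode, as discussed above.
pipe.vae.enable_tiling()

video = pipe(
    prompt="a panda playing guitar in a bamboo forest",  # example prompt
    num_frames=25,  # fewer than the default 49 = smaller latent to decode
    num_inference_steps=50,
).frames[0]
```

With offloading for the transformer and tiling for the decode, the decode step stops being the peak-memory point, which is why this combination can fit on a 12 GB card.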