kijai / ComfyUI-LivePortraitKJ

ComfyUI nodes for LivePortrait

Regression - Develop Branch goes out of memory on Macbook for large frame numbers #82

Open Grant-CP opened 2 months ago

Grant-CP commented 2 months ago

There is some part of the new code that allocates more memory on my device. I used to be able to animate a still image for 600 frames of video and now I cannot.

Not suggesting any change, especially if this is MPS-specific, but I do wonder if too many tensors are kept in GPU memory and whether this might affect CUDA users as well. It could also be that my try/except-style code introduces a memory leak and that it's not actually a good way to handle torch compatibility issues; I'll test that some time later. All memory is correctly freed when rerunning nodes or killing ComfyUI, so I doubt there's a leak.
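For context, the fallback pattern I mean looks roughly like this (a minimal sketch with made-up names; `op` stands in for whatever call the MPS backend rejects):

import torch

def mps_safe(op, x: torch.Tensor) -> torch.Tensor:
    try:
        return op(x)
    except (RuntimeError, NotImplementedError):
        # Do the fallback work *outside* the except block: while the handler
        # runs, the exception's traceback keeps the failed frame's locals
        # (possibly large tensors) alive.
        pass
    return op(x.cpu()).to(x.device)  # retry on CPU, then move back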

When using the initial version of the main branch, I'm pretty sure RAM usage didn't scale up much with more video frames. On the develop branch, memory usage scales up linearly with frame count during both the cropper node and the process node.

I suspect some very large allocation is taking place in the "LivePortraitCropper" node. My MacBook has 32 GB of RAM, so I have no idea why allocating 36 GB is allowed; swap was being used somewhat.

Processing source images...:  77%|████████████████████████████████████████████████████████▋                 | 460/600 [00:55<00:16,  8.35it/s]
!!! Exception during processing!!! MPS backend out of memory (MPS allocated: 36.23 GB, other allocations: 12.23 MB, max allowed: 36.27 GB). Tried to allocate 32.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
Traceback (most recent call last):
  File "/Users/grant/Documents/Repos/ComfyUI/execution.py", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/grant/Documents/Repos/ComfyUI/execution.py", line 81, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/grant/Documents/Repos/ComfyUI/execution.py", line 74, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/grant/Documents/Repos/ComfyUI/custom_nodes/ComfyUI-LivePortraitKJ/nodes.py", line 462, in process
    f_s = pipeline.live_portrait_wrapper.extract_feature_3d(I_s)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/grant/Documents/Repos/ComfyUI/custom_nodes/ComfyUI-LivePortraitKJ/liveportrait/live_portrait_wrapper.py", line 61, in extract_feature_3d
    feature_3d = self.appearance_feature_extractor(x)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/comfyui/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/comfyui/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1536, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/grant/Documents/Repos/ComfyUI/custom_nodes/ComfyUI-LivePortraitKJ/liveportrait/modules/appearance_feature_extractor.py", line 42, in forward
    out = self.down_blocks[i](out)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/comfyui/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/comfyui/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1536, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/grant/Documents/Repos/ComfyUI/custom_nodes/ComfyUI-LivePortraitKJ/liveportrait/modules/util.py", line 136, in forward
    out = F.relu(out)
          ^^^^^^^^^^^
  File "/opt/anaconda3/envs/comfyui/lib/python3.11/site-packages/torch/nn/functional.py", line 1500, in relu
    result = torch.relu(input)
             ^^^^^^^^^^^^^^^^^
RuntimeError: MPS backend out of memory (MPS allocated: 36.23 GB, other allocations: 12.23 MB, max allowed: 36.27 GB). Tried to allocate 32.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).

Prompt executed in 656.02 seconds
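For anyone hitting the same cap: the error message itself names the knob. It has to be set before torch is imported, e.g. (a sketch; note PyTorch's own warning that lifting the limit may cause system failure):

import os
os.environ["PYTORCH_MPS_HIGH_WATERMARK_RATIO"] = "0.0"  # disable MPS upper limit, per the error message

import torch  # import only after the env var is set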
kijai commented 2 months ago

I noticed this as well last night: lots of stuff was kept in VRAM for no reason. A fix should be in now; it uses RAM instead, and I can do around 2k frames before OOM with 64 GB RAM, with VRAM use staying under 10 GB throughout.
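Roughly, the idea is to keep per-frame results in system RAM instead of on the device, along these lines (a simplified sketch, not the literal diff; names are illustrative):

import torch

def run_per_frame(model, frames, device):
    out = []
    with torch.no_grad():
        for frame in frames:          # frames: CPU tensors
            x = frame.to(device)
            y = model(x)
            out.append(y.cpu())       # keep results in RAM, not VRAM
            del x, y                  # free device memory before the next frame
    return out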

Grant-CP commented 2 months ago

@kijai

Main branch seems to be working well for me.

With the driving input being just a single image (using InsightFace), I can do 600 frames no problem, at 1.1 s/it like before.

I think the issue was (and still is, a little) that feeding many frames into the cropper takes a lot of memory.

On the current main branch, doing vid2vid with the MediaPipe pipeline for cropping (by accident), 354 frames, I'm still using some swap, but it's not going crazy. Again, my Mac has unified memory, so I'm not sure whether any RAM/VRAM swapping matters for me. I do notice large RAM usage even after the workflow is done, but it gets flushed just fine if I start a different workflow, so I assume most of the memory usage is just to actually save the output of the cropping operation.
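(If the lingering usage is PyTorch's allocator cache rather than live tensors, it can usually be flushed by hand; a sketch, assuming a PyTorch recent enough to have torch.mps.empty_cache():)

import gc
import torch

gc.collect()                   # drop Python-side references first
if torch.backends.mps.is_available():
    torch.mps.empty_cache()    # return cached MPS blocks to the system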

I also had to update my mediapipe and protobuf packages to get it to work, and I got a warning. This probably isn't your problem but MediaPipe's? Not sure:

/opt/anaconda3/envs/comfyui/lib/python3.11/site-packages/google/protobuf/symbol_database.py:55:
UserWarning: SymbolDatabase.GetPrototype() is deprecated. Please use message_factory.GetMessageClass() instead. SymbolDatabase.GetPrototype() will be removed soon.
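For reference, the deprecation the warning points at looks like this in protobuf terms (illustrative only; the offending call is inside mediapipe, and the message name here is hypothetical):

from google.protobuf import symbol_database, message_factory

db = symbol_database.Default()
desc = db.pool.FindMessageTypeByName("mediapipe.NormalizedLandmarkList")  # hypothetical example

old_cls = db.GetPrototype(desc)                  # deprecated
new_cls = message_factory.GetMessageClass(desc)  # suggested replacement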