torch.OutOfMemoryError: Allocation on device

funwithforks commented 2 weeks ago

This extension is much more efficient and simple to use than kohya. I like it a lot!

However, I frequently hit an issue where memory fills up right before training starts. After the error message below, if I queue again, training completes. This behavior also breaks my queue.

FLUX: Gradient checkpointing enabled. CPU offload: False
prepare optimizer, data loader etc.
INFO use CAME optimizer | {} train_util.py:4553
2024-08-27 10:15:06 ERROR !!! Exception during processing !!! Allocation on device execution.py:386
ERROR Traceback (most recent call last): execution.py:387
  File "/run/media/computer/dell1/ComfyUI/execution.py", line 317, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/run/media/computer/dell1/ComfyUI/execution.py", line 192, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/run/media/computer/dell1/ComfyUI/execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)
  File "/run/media/computer/dell1/ComfyUI/execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/run/media/computer/dell1/ComfyUI/custom_nodes/ComfyUI-FluxTrainer/nodes.py", line 355, in init_training
    training_loop = network_trainer.init_train(args)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/run/media/computer/dell1/ComfyUI/custom_nodes/ComfyUI-FluxTrainer/train_network.py", line 575, in init_train
    unet = accelerator.prepare(unet)
    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/run/media/computer/dell1/ComfyUI/.venv/lib/python3.11/site-packages/accelerate/accelerator.py", line 1311, in prepare
    result = tuple(
    ^^^^^^
  File "/run/media/computer/dell1/ComfyUI/.venv/lib/python3.11/site-packages/accelerate/accelerator.py", line 1312, in
    self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/run/media/computer/dell1/ComfyUI/.venv/lib/python3.11/site-packages/accelerate/accelerator.py", line 1188, in _prepare_one
    return self.prepare_model(obj, device_placement=device_placement)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/run/media/computer/dell1/ComfyUI/.venv/lib/python3.11/site-packages/accelerate/accelerator.py", line 1435, in prepare_model
    model = model.to(self.device)
    ^^^^^^^^^^^^^^^^^^^^^
  File "/run/media/computer/dell1/ComfyUI/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1174, in to
    return self._apply(convert)
    ^^^^^^^^^^^^^^^^^^^^
  File "/run/media/computer/dell1/ComfyUI/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 780, in _apply
    module._apply(fn)
  File "/run/media/computer/dell1/ComfyUI/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 780, in _apply
    module._apply(fn)
  File "/run/media/computer/dell1/ComfyUI/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 780, in _apply
    module._apply(fn)
  File "/run/media/computer/dell1/ComfyUI/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 805, in _apply
    param_applied = fn(param)
    ^^^^^^^^^
  File "/run/media/computer/dell1/ComfyUI/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1160, in convert
    return t.to(
    ^^^^^
torch.OutOfMemoryError: Allocation on device

ERROR Got an OOM, unloading all loaded models. execution.py:397
INFO Prompt executed in 32.85 seconds
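For anyone who wants to script around the "fails once, works when queued again" behavior described above, here is a minimal sketch of a retry-once pattern. It is not how ComfyUI or ComfyUI-FluxTrainer handle the error; `run_with_one_retry` and `free_cached_memory` are hypothetical helper names, and the assumption is that freeing cached memory between attempts is what lets the second attempt succeed.

```python
import gc


def run_with_one_retry(step, cleanup, oom_types):
    """Run step(); if it raises one of oom_types, call cleanup() and try once more."""
    try:
        return step()
    except oom_types:
        # Leave the except block before cleaning up, so the traceback and the
        # frames it references (which may hold large tensors) can be released.
        pass
    cleanup()
    return step()


def free_cached_memory():
    """Collect unreachable Python objects, then release PyTorch's cached CUDA blocks."""
    gc.collect()
    try:
        import torch
    except ImportError:
        return  # nothing GPU-related to free without torch
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
```

With PyTorch installed, `oom_types` could be `torch.cuda.OutOfMemoryError`, and `step` would be whatever callable starts the training setup (a placeholder here, not an actual FluxTrainer entry point).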

DarkAlchy commented 1 week ago

Yeah, my 4090 has the same issue, so back to the horrible GUI for me.

kijai commented 1 week ago

Make sure you are on torch 2.4.0 or later; previous versions use a lot more memory with kohya for some reason. With multires training it really should never go past ~18GB, and with 512p training it should stay a bit below 16GB.
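A quick way to confirm the installed version meets the 2.4.0 minimum mentioned above is sketched below. The helper names `parse_version` and `meets_minimum` are made up for illustration; the script only assumes that `torch.__version__` starts with a dotted numeric release, as in `2.4.0+cu121`. Run it with the same Python environment that ComfyUI uses.

```python
import re


def parse_version(version):
    """Return the leading numeric release as a tuple, e.g. '2.4.0+cu121' -> (2, 4, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component (e.g. '2.4') is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def meets_minimum(version, minimum=(2, 4, 0)):
    """True if the version's numeric release is at least `minimum`."""
    return parse_version(version) >= minimum


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        status = "OK" if meets_minimum(torch.__version__) else "older than 2.4.0"
        print(torch.__version__, status)
```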

muxin159 commented 1 week ago

Make sure you are on torch 2.4.0 or later; previous versions use a lot more memory with kohya for some reason. With multires training it really should never go past ~18GB, and with 512p training it should stay a bit below 16GB.

My torch version is 2.4.0+cu121, but I also encountered this issue. My graphics card has 22GB of VRAM.
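Since a card with more VRAM than the quoted ~18GB peak can still hit this error, it may help to check how much memory is actually free right before training, because other allocations (for example, models already loaded in the same process) can occupy part of it. The sketch below uses `torch.cuda.mem_get_info()`, which returns free and total bytes for the current device. The helper names, the 1 GiB safety margin, and the reading of the thread's "GB" figures as GiB are all assumptions.

```python
GIB = 1024 ** 3


def fits(free_bytes, expected_peak_gib, margin_gib=1.0):
    """True if free memory covers the expected peak plus an assumed safety margin."""
    return free_bytes >= (expected_peak_gib + margin_gib) * GIB


def free_and_total_vram():
    """Return (free_bytes, total_bytes) for the current CUDA device, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.mem_get_info()


if __name__ == "__main__":
    info = free_and_total_vram()
    if info is None:
        print("no CUDA device visible to torch")
    else:
        free, total = info
        print(f"free {free / GIB:.1f} GiB of {total / GIB:.1f} GiB")
        # 18 is the approximate multires peak quoted in this thread.
        print("enough for multires:", fits(free, 18))
```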

kijai / ComfyUI-FluxTrainer

torch.OutOfMemoryError: Allocation on device #20