hako-mikan / sd-webui-lora-block-weight


Huge VRAM usage with start/stop #154

Open slashedstar opened 8 months ago

slashedstar commented 8 months ago

[image] Is this expected when using start/stop? I was getting OOM errors and had to change a setting in the NVIDIA Control Panel to allow fallback to system RAM, which means that when I use this the it/s drops a lot: I go from 6 it/s to 1.5 it/s after the LoRA is stopped/started by the extension. (I'm on Forge, b9705c58f66c6fd2c4a0168b26c5cf1fa6c0dde3)
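For context: the start/stop feature is driven from the LoRA tag in the prompt. A hedged usage sketch follows; the LoRA name is hypothetical and the exact option placement is an assumption based on the start=10 used later in this thread, so check the extension's README for the current syntax.

    <lora:myLoraName:1:start=10>

Presumably this keeps the LoRA inactive until sampling step 10, at which point the extension re-patches the model, which is where the VRAM spike appears.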

hako-mikan commented 8 months ago

Does this issue also occur with the latest version of Forge? I tested it and did not encounter any problems.

slashedstar commented 7 months ago

[image] Brand-new installation: I just git cloned, started it, and installed the extension. The OOM happens with SDXL but not with 1.5, though it's still able to complete the image.

With start=10:

Moving model(s) has taken 1.59 seconds
 55%|████▍     | 11/20 [00:01<00:01,  6.03it/s]
ERROR diffusion_model.output_blocks.0.1.transformer_blocks.2.ff.net.0.proj.weight CUDA out of memory. Tried to allocate 50.00 MiB. GPU 0 has a total capacty of 8.00 GiB of which 0 bytes is free. Of the allocated memory 7.06 GiB is allocated by PyTorch, and 218.22 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
ERROR diffusion_model.output_blocks.0.1.transformer_blocks.3.ff.net.0.proj.weight CUDA out of memory. Tried to allocate 50.00 MiB. GPU 0 has a total capacty of 8.00 GiB of which 0 bytes is free. Of the allocated memory 7.13 GiB is allocated by PyTorch, and 177.43 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
*** Error executing callback cfg_denoiser_callback for E:\blankforge\stable-diffusion-webui-forge\extensions\sd-webui-lora-block-weight\scripts\lora_block_weight.py
    Traceback (most recent call last):
      File "E:\blankforge\stable-diffusion-webui-forge\modules\script_callbacks.py", line 233, in cfg_denoiser_callback
        c.callback(params)
      File "E:\blankforge\stable-diffusion-webui-forge\extensions\sd-webui-lora-block-weight\scripts\lora_block_weight.py", line 455, in denoiser_callback
        shared.sd_model.forge_objects.unet.patch_model()
      File "E:\blankforge\stable-diffusion-webui-forge\ldm_patched\modules\model_patcher.py", line 216, in patch_model
        out_weight = self.calculate_weight(self.patches[key], temp_weight, key).to(weight.dtype)
    torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB. GPU 0 has a total capacty of 8.00 GiB of which 0 bytes is free. Of the allocated memory 7.13 GiB is allocated by PyTorch, and 177.61 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

---
100%|██████████| 20/20 [00:04<00:00,  4.48it/s]
To load target model AutoencoderKL
Begin to load 1 model
[Memory Management] Current Free GPU Memory (MB) =  1908.81689453125
[Memory Management] Model Memory (MB) =  159.55708122253418
[Memory Management] Minimal Inference Memory (MB) =  1024.0
[Memory Management] Estimated Remaining GPU Memory (MB) =  725.2598133087158
Moving model(s) has taken 0.11 seconds
Total progress: 100%|██████████| 20/20 [00:04<00:00,  4.11it/s]
Total progress: 100%|██████████| 20/20 [00:04<00:00,  5.41it/s]

(This was to generate a single 512x512 image.)
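The traceback points at the mechanism. The extension registers a cfg_denoiser_callback, and once the configured start step is reached it re-patches the UNet through Forge's ModelPatcher; patch_model() rebuilds each patched weight on the GPU via calculate_weight(...).to(weight.dtype), so the temporaries are allocated while the full model is still resident, which is what tips an 8 GiB card into OOM. A minimal sketch of that callback pattern, assuming only the call chain shown in the traceback; START_STEP and _patched are hypothetical names:

    from modules import script_callbacks, shared

    START_STEP = 10   # hypothetical: mirrors the start=10 used above
    _patched = False  # hypothetical guard so the model is re-patched only once

    def denoiser_callback(params):
        global _patched
        # params.sampling_step is provided by CFGDenoiserParams in webui/Forge.
        if params.sampling_step >= START_STEP and not _patched:
            unet = shared.sd_model.forge_objects.unet  # Forge's ModelPatcher, per the traceback
            # patch_model() walks self.patches and computes, for each key,
            # out_weight = calculate_weight(patches[key], temp_weight, key).to(weight.dtype)
            # entirely in VRAM; those temporary tensors are the extra
            # allocations failing with "Tried to allocate 50.00 MiB".
            unet.patch_model()
            _patched = True

    script_callbacks.on_cfg_denoiser(denoiser_callback)

This is only a sketch of where the allocation happens; the actual code at lora_block_weight.py line 455 is more involved.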

PANyZHAL commented 7 months ago

Same problem on version f0.0.17v1.8.0rc-latest-276-g29be1da7.
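For anyone else hitting this: the OOM message itself suggests tuning the CUDA caching allocator via PYTORCH_CUDA_ALLOC_CONF. A hedged sketch of applying that suggestion (the 512 value is an arbitrary example, not a recommendation from this thread):

    import os
    # Must be set before torch initializes CUDA, so in practice this belongs
    # in webui-user.bat or the launch environment rather than in a script.
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"  # example value

Note this only reduces fragmentation; it does not remove the extra allocation from the mid-sampling re-patch, so it may not be enough on an 8 GiB card.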