kohya-ss / sd-scripts

Apache License 2.0

Block swapping documentation? #1547

Closed peteallen closed 1 week ago

peteallen commented 2 weeks ago

Is there anywhere I can read about the concept behind the --double_blocks_to_swap and --single_blocks_to_swap options? I've searched here and on Google and haven't found much info, and I don't understand what these options do.

kohya-ss commented 2 weeks ago

Please see this section of the README: https://github.com/kohya-ss/sd-scripts/tree/sd3?tab=readme-ov-file#flux1-fine-tuning

setothegreat commented 2 weeks ago

The readme doesn't really explain what the "swap" part of block swapping actually is, or why it would be helpful or not. Does it swap the weights between a given pair of blocks before training, does it unload and reload a random set of blocks over the course of training to reduce memory usage, or does it do something else entirely?

peteallen commented 1 week ago

Looking through the code, it appears to be swapping blocks between the GPU and CPU to conserve VRAM. My testing has been a little inconsistent but in general, higher numbers seem to lead to lower VRAM usage.
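To make that reading concrete, here is a toy sketch of the general idea only. It is not the sd-scripts implementation: `simulate_forward`, `block_mb`, and the "keep the first blocks resident, stream the rest in one at a time" policy are assumptions made for illustration, and devices are plain strings rather than real tensors. It shows why a larger swap count would lower peak VRAM in a scheme like this.

```python
def simulate_forward(num_blocks, blocks_to_swap, block_mb=100):
    """Return the peak 'GPU' memory (MB) used by block weights in one forward pass.

    Hypothetical policy: the last `blocks_to_swap` blocks live on the CPU and
    are moved to the GPU only while they run, then moved back.
    """
    if not 0 <= blocks_to_swap <= num_blocks:
        raise ValueError("blocks_to_swap must be between 0 and num_blocks")

    resident = num_blocks - blocks_to_swap
    device = ["gpu"] * resident + ["cpu"] * blocks_to_swap

    def gpu_mb():
        return block_mb * device.count("gpu")

    peak = gpu_mb()
    for i in range(num_blocks):
        was_offloaded = device[i] == "cpu"
        if was_offloaded:
            device[i] = "gpu"  # swap in: copy weights CPU -> GPU
        peak = max(peak, gpu_mb())
        # ... the block's forward computation would run here ...
        if was_offloaded:
            device[i] = "cpu"  # swap out: free the GPU copy again
    return peak


if __name__ == "__main__":
    for n in (0, 2, 4, 6):
        print(f"blocks_to_swap={n}: peak {simulate_forward(8, n)} MB")
```

In this toy model the peak drops as the swap count rises, which matches the trend I saw. The cost is the extra CPU/GPU transfer on every step, so a real implementation would presumably trade some speed for the memory saved.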

I’m not sure what the difference between double and single blocks is, though.

kohya-ss commented 1 week ago

I've updated the README, hope this helps: https://github.com/kohya-ss/sd-scripts/tree/sd3?tab=readme-ov-file#key-features-for-flux1-fine-tuning