Closed peteallen closed 1 week ago
Please see this section of README: https://github.com/kohya-ss/sd-scripts/tree/sd3?tab=readme-ov-file#flux1-fine-tuning
The readme doesn't really explain what the "swap" part of block swapping actually is, or why it would be helpful or not. Does it swap the weights between a given pair of blocks before training, does it unload and reload a random set of blocks over the course of training to reduce memory usage, or does it do something else entirely?
Looking through the code, it appears to swap blocks between the GPU and CPU to conserve VRAM. My testing has been a little inconsistent, but in general higher numbers seem to lead to lower VRAM usage.
I'm not sure what the difference between double and single blocks is, though.
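To make the GPU↔CPU idea concrete, here is a minimal, hypothetical sketch of how such swapping could work. This is not kohya-ss's actual implementation; the `Block`, `run_with_swapping`, and string-based "device" placement are illustrative stand-ins, and the block count of 19 is just the number of double blocks in FLUX.1.

```python
# Hypothetical illustration of block swapping (NOT the sd-scripts code).
# Device placement is simulated with strings instead of real tensors.

class Block:
    def __init__(self, idx):
        self.idx = idx
        self.device = "cpu"  # all blocks start offloaded to CPU RAM

    def forward(self, x):
        assert self.device == "gpu", "a block must be on the GPU to run"
        return x + 1  # stand-in for the real transformer computation

def run_with_swapping(blocks, num_to_swap, x):
    """Keep the first len(blocks) - num_to_swap blocks resident on the GPU;
    stream the remaining num_to_swap blocks on and off one at a time."""
    resident = len(blocks) - num_to_swap
    for b in blocks[:resident]:
        b.device = "gpu"          # resident blocks stay on the GPU
    for i, b in enumerate(blocks):
        if i >= resident:
            b.device = "gpu"      # load a swapped block just in time
        x = b.forward(x)
        if i >= resident:
            b.device = "cpu"      # evict it so peak VRAM stays low
    return x

blocks = [Block(i) for i in range(19)]  # FLUX.1 has 19 double blocks
out = run_with_swapping(blocks, num_to_swap=4, x=0)
```

Under this model, a higher `num_to_swap` means fewer blocks are resident on the GPU at any moment (lower VRAM), at the cost of more CPU↔GPU transfers per step (slower training), which would match the behavior you observed.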
On Sun, Sep 1, 2024, at 4:36 PM, setothegreat wrote:

> The readme doesn't really explain what the "swap" part of block swapping actually is, or why it would be helpful or not. Does it swap the weights between a given pair of blocks before training, does it unload and reload a random set of blocks over the course of training to reduce memory usage, or does it do something else entirely?
I've updated the README, hope this helps: https://github.com/kohya-ss/sd-scripts/tree/sd3?tab=readme-ov-file#key-features-for-flux1-fine-tuning
Is there anywhere I can read about the concept behind the --double_blocks_to_swap and --single_blocks_to_swap options? I've searched here and on Google and haven't found much info, and I don't understand what these options do.