kohya-ss / sd-scripts

Apache License 2.0
4.93k stars 825 forks

Subject: Proposal for Implementing a 1-bit Optimizer #1238

Open katyukuki opened 5 months ago

katyukuki commented 5 months ago

Subject: Proposal for Implementing a 1-bit Optimizer

Dear [sd-scripts] Team,

I hope this message finds you well. I am reaching out to propose the addition of a 1-bit optimizer to your project, similar to the implementation discussed on this webpage: https://qiita.com/carrotflakes/items/778f33faca40f32b1aaf. I believe incorporating such a feature could offer significant benefits in terms of reducing both training time and VRAM usage.

During my experiments, I did not use BitNet, which the article mentions, so that I could assess the feasibility of the concept independently. My test trained with a network dimension of 16 and an alpha of 2 (dim 16, alpha 2). I am pleased to report that training ran to completion without NaN losses. However, learning appears to have been almost ineffective: the resulting model showed minimal improvement.

The motivation behind my proposal is to explore whether we can enhance learning efficiency or achieve comparable results with significantly lower computational costs by employing a 1-bit optimization technique. My preliminary experiments suggest that while it is possible to complete the training process, optimizing the approach to achieve effective learning outcomes is necessary.

I am keen to discuss this idea further with your team and explore potential collaborations to refine and implement this concept within your project framework. I believe that with some adjustments and further experimentation, we could unlock substantial benefits for the project, especially in terms of efficiency and resource utilization.

Thank you for considering my proposal. I am looking forward to your feedback and the possibility of contributing to your project.

Best regards, [kacchan]

katyukuki commented 5 months ago

A prototype 1-bit optimizer for sd-scripts, compatible with SD1.5 only. It still needs work for higher speed.

The archive contains additional tools for the optimizer (TernaryOpt, SD1.5). Copy the two included folders into the sd-scripts folder and overwrite; that completes the update.

To use it, pass:

--optimizer_type="TernaryOpt" ^
--optimizer_args "decouple=True" "weight_decay=0.01" "d0=1e-8" "d_coef=0.99" "use_bias_correction=True" "safeguard_warmup=True" ^

The implementation is Prodigy with 1-bit (ternary) conversion added. Created with reference to TernaryOpt by @carrotflakes: https://qiita.com/carrotflakes/items/778f33faca40f32b1aaf
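Since the linked archive cannot be inspected here, the following is only a minimal sketch of the general idea: run a full-precision optimizer step, then snap the updated weights onto a ternary grid with one shared scale per tensor. `ternary_step` is a hypothetical name, and plain SGD stands in for the actual Prodigy-based inner update; the real TernaryOpt code may differ.

```python
def ternary_step(weights, grads, lr=0.1):
    # Hypothetical sketch, not the code from the linked archive:
    # take an ordinary full-precision update (plain SGD standing in
    # for Prodigy), then snap each weight to {-1, 0, 1} times one
    # shared per-tensor scale.
    updated = [w - lr * g for w, g in zip(weights, grads)]
    scale = sum(abs(w) for w in updated) / len(updated) + 1e-12
    snap = lambda w: max(-1, min(1, round(w / scale)))
    return [snap(w) * scale for w in updated]

w = ternary_step([0.5, -0.2, 0.0, 0.9], [0.1, -0.1, 0.2, -0.1])
# every entry of w is now -scale, 0, or +scale
```

Because the weights collapse back to three levels after every step, most of the optimizer's fine-grained adjustments are discarded, which would be consistent with the near-ineffective learning reported above.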

File password: yami. Download: https://29.gigafile.nu/0409-b3a8a8b82a88a315a30a17f748686521a

kohya-ss commented 5 months ago

Thank you for this! This is very interesting. I think the only advantage at the moment is the file size of the saved parameters. However, further investigation may reveal other improvements.
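On the file-size point: a ternary weight carries log2(3) ≈ 1.58 bits of information, so five of them fit in one byte (3^5 = 243 ≤ 256), roughly a 20x reduction versus float32 storage. A small packing sketch (`pack_ternary`/`unpack_ternary` are illustrative names, not functions from sd-scripts or the archive):

```python
def pack_ternary(trits):
    # Pack five ternary digits {-1, 0, 1} into each byte (3**5 = 243 <= 256).
    out = bytearray()
    for i in range(0, len(trits), 5):
        val = 0
        for t in reversed(trits[i:i + 5]):
            val = val * 3 + (t + 1)   # map {-1, 0, 1} -> {0, 1, 2}
        out.append(val)
    return bytes(out)

def unpack_ternary(data, n):
    # Inverse of pack_ternary: recover the first n ternary digits.
    trits = []
    for b in data:
        for _ in range(5):
            trits.append(b % 3 - 1)
            b //= 3
    return trits[:n]

trits = [1, -1, 0, 0, 1] * 4   # 20 ternary weights
packed = pack_ternary(trits)   # 4 bytes, versus 80 bytes as float32
```

The shared per-tensor scale would still need to be stored alongside the packed digits, but that is a single float per tensor and does not change the overall ratio.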

katyukuki commented 5 months ago

Thank you for your reply. I will try again when I come up with more ideas. I am currently experimenting with BitNet to see whether reusing its processing as an optimizer changes anything.

katyukuki commented 5 months ago

I have a proposal aimed at optimizing the learning process for image data, particularly in scenarios where memory and computational resources are constrained. My understanding of current optimization techniques in image learning is not exhaustive, but I propose a method that efficiently teaches the model about specific objects, such as glasses, across both individual and multiple images.

The core of the idea is to treat images as two-dimensional spaces and have the model navigate these spaces as a pathfinding problem. By quantizing the model's weights to 1-bit, representing them in ternary form (-1, 0, 1), we can significantly reduce the required storage and computational power. Concentrating the learning process on the binary aspects (1 and 0) allows the model to effectively grasp the features associated with glasses.

Moreover, this approach is not limited to a single image and can be applied efficiently to multiple images. By skipping images that contain only a single value (-1, for instance) and shifting the learning focus to other, more informative images or illustrations, we can optimize and accelerate the entire process.

This proposal aims to offer a new approach to improving the efficiency and effectiveness of image learning, especially in resource-limited environments. Adopting it could keep computational costs low while maintaining or even improving the quality of learning outcomes.
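The ternary quantization and the "skip single-valued images" filter described above could be sketched as follows. The absmean-style scaling (divide by the mean magnitude before rounding) is an assumption borrowed from the BitNet line of work, not something confirmed in this thread, and `is_informative` is a hypothetical name:

```python
def ternarize(values):
    # Absmean-style ternary quantization (an assumed scheme, not
    # confirmed by the thread): scale by the mean magnitude, then
    # round each value into {-1, 0, 1}.
    scale = sum(abs(v) for v in values) / len(values) + 1e-12
    return [max(-1, min(1, round(v / scale))) for v in values]

def is_informative(ternary_img):
    # The proposal's filter idea: a ternary representation that holds
    # only a single value (e.g. all -1) carries no structure to learn
    # from, so training could skip that image entirely.
    return len(set(ternary_img)) > 1

q = ternarize([0.8, -0.05, 0.3, -0.9])   # values in {-1, 0, 1}
```

A training loop could then call `is_informative` on each quantized image and skip the uninformative ones, which is the acceleration the proposal describes.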