hollowstrawberry / kohya-colab

Accessible Google Colab notebooks for Stable Diffusion Lora training, based on the work of kohya-ss and Linaqruf
GNU General Public License v3.0
561 stars 80 forks

Lora Trainer cannot be used, not supported between instances of 'TorchVersion' and 'Version' #95

Closed wzgrx closed 3 months ago

wzgrx commented 3 months ago

Requesting a fix. The new 221 is 10 times slower than the previous installation and cannot be used. Could we have a stable version?

⭐ Starting trainer...

Traceback (most recent call last):

/content/kohya-trainer/train_network.py:15 in <module>
     12 from tqdm import tqdm
     13 import torch
     14 from accelerate.utils import set_seed
  ❱  15 from diffusers import DDPMScheduler
     16
     17 import library.train_util as train_util
     18 from library.train_util import (

/usr/local/lib/python3.10/dist-packages/diffusers/__init__.py:3 in <module>
      1 __version__ = "0.10.2"
      2
  ❱   3 from .configuration_utils import ConfigMixin
      4 from .onnx_utils import OnnxRuntimeModel
      5 from .utils import (
      6     OptionalDependencyNotAvailable,

/usr/local/lib/python3.10/dist-packages/diffusers/configuration_utils.py:34 in <module>
     31 from requests import HTTPError
     32
     33 from . import __version__
  ❱  34 from .utils import DIFFUSERS_CACHE, HUGGINGFACE_CO_RESOLVE_ENDPOINT, DummyObject, deprec
     35
     36
     37 logger = logging.get_logger(__name__)

/usr/local/lib/python3.10/dist-packages/diffusers/utils/__init__.py:22 in <module>
     19
     20 from .. import __version__
     21 from .deprecation_utils import deprecate
  ❱  22 from .import_utils import (
     23     ENV_VARS_TRUE_AND_AUTO_VALUES,
     24     ENV_VARS_TRUE_VALUES,
     25     USE_JAX,

/usr/local/lib/python3.10/dist-packages/diffusers/utils/import_utils.py:207 in <module>
    204     if _torch_available:
    205         import torch
    206
  ❱ 207         if torch.__version__ < version.Version("1.12"):
    208             raise ValueError("PyTorch should be >= 1.12")
    209     logger.debug(f"Successfully imported xformers version {_xformers_version}")
    210 except importlib_metadata.PackageNotFoundError:

TypeError: '<' not supported between instances of 'TorchVersion' and 'Version'

Traceback (most recent call last):

/usr/local/bin/accelerate:8 in <module>
      5 from accelerate.commands.accelerate_cli import main
      6 if __name__ == '__main__':
      7     sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
  ❱   8     sys.exit(main())
      9

/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py:45 in main
     42         exit(1)
     43
     44     # Run
  ❱  45     args.func(args)
     46
     47
     48 if __name__ == "__main__":

/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py:1104 in launch_command
   1101     elif defaults is not None and defaults.compute_environment == ComputeEnvironment.AMA
   1102         sagemaker_launcher(defaults, args)
   1103     else:
  ❱ 1104         simple_launcher(args)
   1105
   1106
   1107 def main():

/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py:567 in simple_launcher
    564     process = subprocess.Popen(cmd, env=current_env)
    565     process.wait()
    566     if process.returncode != 0:
  ❱ 567         raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
    568
    569
    570 def multi_gpu_launcher(args):

CalledProcessError: Command '['/usr/bin/python3', 'train_network.py', '--dataset_config=/content/drive/MyDrive/Loras/w/dataset_config.toml', '--config_file=/content/drive/MyDrive/Loras/w/training_config.toml']' returned non-zero exit status 1.
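For readers hitting the same TypeError: the message says that neither operand of `<` knew how to compare itself with the other's type. The sketch below reproduces that failure mode with two toy classes and shows one possible way around it, which is to normalize both sides to plain integer tuples before comparing. `FakeVersion`, `FakeTorchVersion`, `release_tuple`, `is_older_than`, and the version string `2.1.0+cu121` are all hypothetical illustration names and values; they are not the real torch or packaging classes, and this is not necessarily how the notebook was fixed.

```python
import re


class FakeVersion:
    """Toy version type (hypothetical) that only compares with its own kind."""

    def __init__(self, text):
        self.parts = tuple(int(p) for p in text.split("."))

    def __lt__(self, other):
        if not isinstance(other, FakeVersion):
            return NotImplemented  # refuse foreign types
        return self.parts < other.parts

    def __gt__(self, other):
        if not isinstance(other, FakeVersion):
            return NotImplemented  # refuse foreign types
        return self.parts > other.parts


class FakeTorchVersion(str):
    """Toy string-like version (hypothetical) that refuses non-string operands."""

    def __lt__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        return str.__lt__(self, other)


def naive_check_fails(installed, minimum):
    """Return True if `installed < minimum` raises TypeError."""
    try:
        installed < minimum
    except TypeError:
        # Both sides returned NotImplemented, so Python gave up.
        return True
    return False


def release_tuple(version_text):
    """Extract the leading numeric release, e.g. '2.1.0+cu121' -> (2, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", str(version_text))
    if match is None:
        raise ValueError(f"no numeric release in {version_text!r}")
    return tuple(int(p) for p in match.group(0).split("."))


def is_older_than(installed, minimum):
    """Compare two version strings numerically, ignoring local suffixes."""
    return release_tuple(installed) < release_tuple(minimum)


installed = FakeTorchVersion("2.1.0+cu121")  # example value, an assumption
print(naive_check_fails(installed, FakeVersion("1.12")))
print(is_older_than(installed, "1.12"))
```

Comparing integer tuples also avoids the plain-string pitfall where "1.9" sorts after "1.12". The sketch deliberately ignores pre-release and development tags, so it is only suitable for a rough minimum-version check.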

Ope325 commented 3 months ago

same

hollowstrawberry commented 3 months ago

Sadly there can never be a stable version because Google keeps updating the colab image and breaking the dependencies. I can only fix it when it happens.

Could you elaborate about 221 being "10 times slower"? Are you referring to the XL trainer?
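Since the breakage comes from the Colab image changing underneath the notebook, one way to notice it early is to compare the installed package versions against the versions the trainer was last known to work with. The following is a minimal sketch of that idea using only the standard library; `find_version_drift` and the `expected` mapping are hypothetical names, and the only version taken from this thread is diffusers 0.10.2, which appears in the traceback.

```python
from importlib import metadata


def find_version_drift(expected, lookup=metadata.version):
    """Return {package: (expected, installed_or_None)} for every mismatch.

    `expected` maps package names to the exact version strings that were
    last known to work. `lookup` is injectable so the check can be tested
    without touching the real environment.
    """
    drift = {}
    for package, wanted in expected.items():
        try:
            installed = lookup(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            drift[package] = (wanted, installed)
    return drift


# Hypothetical pin list; only the diffusers version comes from the traceback.
expected = {"diffusers": "0.10.2"}
for package, (wanted, installed) in find_version_drift(expected).items():
    print(f"{package}: expected {wanted}, found {installed}")
```

Running a check like this before starting the trainer would not prevent the breakage, but it would turn an obscure import-time error into a clear message about which dependency moved.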

wzgrx commented 3 months ago

> Sadly there can never be a stable version because Google keeps updating the colab image and breaking the dependencies. I can only fix it when it happens.
>
> Could you elaborate about 221 being "10 times slower"? Are you referring to the XL trainer?

Installing the runtime environment is much slower than with the previous version. How long will the fix take? I can help you test it anytime.

wzgrx commented 3 months ago

> Sadly there can never be a stable version because Google keeps updating the colab image and breaking the dependencies. I can only fix it when it happens.
>
> Could you elaborate about 221 being "10 times slower"? Are you referring to the XL trainer?

The Lora trainer.

3djedi commented 3 months ago

Confirmed here, same issue.

> request repair,The new 221 is 10 times slower than the previous installation and cannot be used. Can we create a stable version? ⭐ Starting trainer...

✅ Installation finished in 347 seconds (on Colab Pro).

⭐ Starting trainer...

Traceback (most recent call last):

/content/kohya-trainer/train_network.py:15 in <module>
     12 from tqdm import tqdm
     13 import torch
     14 from accelerate.utils import set_seed
  ❱  15 from diffusers import DDPMScheduler
     16
     17 import library.train_util as train_util
     18 from library.train_util import (

/usr/local/lib/python3.10/dist-packages/diffusers/__init__.py:3 in <module>
      1 __version__ = "0.10.2"
      2
  ❱   3 from .configuration_utils import ConfigMixin
      4 from .onnx_utils import OnnxRuntimeModel
      5 from .utils import (
      6     OptionalDependencyNotAvailable,

/usr/local/lib/python3.10/dist-packages/diffusers/configuration_utils.py:34 in <module>
     31 from requests import HTTPError
     32
     33 from . import __version__
  ❱  34 from .utils import DIFFUSERS_CACHE, HUGGINGFACE_CO_RESOLVE_ENDPOINT, DummyObject, deprec
     35
     36
     37 logger = logging.get_logger(__name__)

/usr/local/lib/python3.10/dist-packages/diffusers/utils/__init__.py:22 in <module>
     19
     20 from .. import __version__
     21 from .deprecation_utils import deprecate
  ❱  22 from .import_utils import (
     23     ENV_VARS_TRUE_AND_AUTO_VALUES,
     24     ENV_VARS_TRUE_VALUES,
     25     USE_JAX,

/usr/local/lib/python3.10/dist-packages/diffusers/utils/import_utils.py:207 in <module>
    204     if _torch_available:
    205         import torch
    206
  ❱ 207         if torch.__version__ < version.Version("1.12"):
    208             raise ValueError("PyTorch should be >= 1.12")
    209     logger.debug(f"Successfully imported xformers version {_xformers_version}")
    210 except importlib_metadata.PackageNotFoundError:

TypeError: '<' not supported between instances of 'TorchVersion' and 'Version'

Traceback (most recent call last):

/usr/local/bin/accelerate:8 in <module>
      5 from accelerate.commands.accelerate_cli import main
      6 if __name__ == '__main__':
      7     sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
  ❱   8     sys.exit(main())
      9

/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py:45 in main
     42         exit(1)
     43
     44     # Run
  ❱  45     args.func(args)
     46
     47
     48 if __name__ == "__main__":

/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py:1104 in launch_command
   1101     elif defaults is not None and defaults.compute_environment == ComputeEnvironment.AMA
   1102         sagemaker_launcher(defaults, args)
   1103     else:
  ❱ 1104         simple_launcher(args)
   1105
   1106
   1107 def main():

/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py:567 in simple_launcher
    564     process = subprocess.Popen(cmd, env=current_env)
    565     process.wait()
    566     if process.returncode != 0:
  ❱ 567         raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
    568
    569
    570 def multi_gpu_launcher(args):

CalledProcessError: Command '['/usr/bin/python3', 'train_network.py', '--dataset_config=/content/drive/MyDrive/Loras/aa/dataset_config.toml', '--config_file=/content/drive/MyDrive/Loras/aa/training_config.toml']' returned non-zero exit status 1.

3djedi commented 3 months ago

New "fixed" file not available?

202403150833

wzgrx commented 3 months ago

> New "fixed" file not available?
>
> 202403150833

Not yet.

baicai99 commented 3 months ago

same

hollowstrawberry commented 3 months ago

It should be fixed now. The notebook now uses a dev version of xformers.