Open daniellumertz opened 1 year ago
Got the same error. Turning off 8bit_adam worked too.
How do I turn 8bit_adam off?
Under training parameters and then advanced is the checkbox you are looking for.
@florinbarbisch YOU ARE MY HERO
Under training parameters and then advanced is the checkbox you are looking for.
I tried to look there, but the checkbox for 8bit isn't there for me in the GUI. What's weird is that I know where it's supposed to be, as I have been watching tutorials, but it seems that I am missing the option that others see. Could that be because of an update? Is there another way to turn it off?
Same... I too couldn't find it
Traceback (most recent call last):
File "F:\StableDiffusion\kohya_ss\train_network.py", line 783, in
Same issue here. Changing AdamW8bit to AdamW gives the same result.
I'm having this error too. All the optimizers seem to fail with the error: ValueError: Using torch.compile requires PyTorch 2.0 or higher.
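For anyone hitting the torch.compile message, it may help to confirm which PyTorch version the training environment actually loads. The sketch below is only an illustration: the helper names `supports_torch_compile` and `installed_torch_supports_compile` are made up for this example, and the check assumes the version string starts with "major.minor" (for example "2.0.1+cu118").

```python
import re


def supports_torch_compile(version: str) -> bool:
    """Return True if a PyTorch version string is 2.0 or higher.

    Hypothetical helper for illustration; it only looks at the major version.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)) >= 2


def installed_torch_supports_compile() -> bool:
    """Check the PyTorch that is importable from the current environment."""
    import torch  # imported lazily so the parser above works without torch

    return supports_torch_compile(torch.__version__)
```

Calling `installed_torch_supports_compile()` with the Python interpreter from the kohya_ss venv shows whether that environment, rather than a system-wide install, provides PyTorch 2.x.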
It seems I have a similar issue.
I've freshly set up Stable Diffusion and LoRA and I'm stuck with this error:
=============================================================
Modules installed outside the virtual environment were found.
This can cause issues. Please review the installed modules.
You can uninstall all local modules with:
deactivate
pip freeze > uninstall.txt
pip uninstall -y -r uninstall.txt
=============================================================
11:26:23-202822 INFO nVidia toolkit detected
11:26:25-387345 INFO Torch 2.0.1+cu118
11:26:25-437329 INFO Torch backend: nVidia CUDA 11.8 cuDNN 8700
11:26:25-440027 INFO Torch detected GPU: NVIDIA GeForce RTX 4070 VRAM 12281 Arch (8, 9) Cores 46
11:26:25-443028 INFO Verifying requirements
11:26:28-508309 INFO headless: False
11:26:28-513320 INFO Load CSS...
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
11:26:38-685997 INFO Loading config...
11:28:06-006947 INFO Start training LoRA Standard ...
11:28:06-008938 INFO Folder 100_Device: 39 images found
11:28:06-009938 INFO Folder 100_Device: 3900 steps
11:28:06-011939 INFO Total steps: 3900
11:28:06-012941 INFO Train batch size: 1
11:28:06-014949 INFO Gradient accumulation steps: 1
11:28:06-015949 INFO Epoch: 1
11:28:06-016949 INFO Regulatization factor: 1
11:28:06-017940 INFO max_train_steps (3900 / 1 / 1 * 1 * 1) = 3900
11:28:06-020940 INFO stop_text_encoder_training = 0
11:28:06-021941 INFO lr_warmup_steps = 0
11:28:06-023951 INFO accelerate launch --num_cpu_threads_per_process=1 "train_network.py" --enable_bucket
--pretrained_model_name_or_path="D:/PROJEKTY/SD/stable-diffusion-webui/models/Stable-diffusion/
realisticVisionV20_v13.safetensors"
--train_data_dir="D:/PROJEKTY/SD/Lora_training_data/Hard_surface/Device/image"
--resolution=512,512 --output_dir="D:/PROJEKTY/SD/Lora_training_data/Hard_surface/Device/model"
--logging_dir="D:/PROJEKTY/SD/Lora_training_data/Hard_surface/Device/log" --network_alpha="128"
--save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-05
--unet_lr=0.0001 --network_dim=128 --output_name="HardSurface_device"
--lr_scheduler_num_cycles="1" --learning_rate="0.0001" --lr_scheduler="constant"
--train_batch_size="1" --max_train_steps="3900" --save_every_n_epochs="1"
--mixed_precision="fp16" --save_precision="fp16" --caption_extension=".txt" --cache_latents
--optimizer_type="Lion" --max_data_loader_n_workers="0" --clip_skip=2 --bucket_reso_steps=64
--xformers --bucket_no_upscale
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
prepare tokenizer
Using DreamBooth method.
prepare images.
found directory D:\PROJEKTY\SD\Lora_training_data\Hard_surface\Device\image\100_Device contains 39 image files
3900 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
batch_size: 1
resolution: (512, 512)
enable_bucket: True
min_bucket_reso: 256
max_bucket_reso: 1024
bucket_reso_steps: 64
bucket_no_upscale: True
[Subset 0 of Dataset 0]
image_dir: "D:\PROJEKTY\SD\Lora_training_data\Hard_surface\Device\image\100_Device"
image_count: 39
num_repeats: 100
shuffle_caption: False
keep_tokens: 0
caption_dropout_rate: 0.0
caption_dropout_every_n_epoches: 0
caption_tag_dropout_rate: 0.0
color_aug: False
flip_aug: False
face_crop_aug_range: None
random_crop: False
token_warmup_min: 1,
token_warmup_step: 0,
is_reg: False
class_tokens: Device
caption_extension: .txt
[Dataset 0]
loading image sizes.
100%|████████████████████████████████████████████████████████████████████████████████| 39/39 [00:00<00:00, 2051.34it/s]
make buckets
min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (512, 512), count: 3900
mean ar error (without repeats): 0.0
preparing accelerator
D:\PROJEKTY\SD\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py:258: FutureWarning: `logging_dir` is deprecated and will be removed in version 0.18.0 of 🤗 Accelerate. Use `project_dir` instead.
warnings.warn(
Using accelerator 0.15.0 or above.
loading model for process 0/1
load StableDiffusion checkpoint: D:/PROJEKTY/SD/stable-diffusion-webui/models/Stable-diffusion/realisticVisionV20_v13.safetensors
D:\PROJEKTY\SD\kohya_ss\venv\lib\site-packages\safetensors\torch.py:98: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
with safe_open(filename, framework="pt", device=device) as f:
loading u-net: <All keys matched successfully>
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ D:\PROJEKTY\SD\kohya_ss\train_network.py:864 in <module> │
│ │
│ 861 │ args = parser.parse_args() │
│ 862 │ args = train_util.read_config_from_file(args, parser) │
│ 863 │ │
│ ❱ 864 │ train(args) │
│ 865 │
│ │
│ D:\PROJEKTY\SD\kohya_ss\train_network.py:160 in train │
│ │
│ 157 │ weight_dtype, save_dtype = train_util.prepare_dtype(args) │
│ 158 │ │
│ 159 │ # モデルを読み込む │
│ ❱ 160 │ text_encoder, vae, unet, _ = train_util.load_target_model(args, weight_dtype, accele │
│ 161 │ │
│ 162 │ # モデルに xformers とか memory efficient attention を組み込む │
│ 163 │ train_util.replace_unet_modules(unet, args.mem_eff_attn, args.xformers) │
│ │
│ D:\PROJEKTY\SD\kohya_ss\library\train_util.py:3061 in load_target_model │
│ │
│ 3058 │ │ if pi == accelerator.state.local_process_index: │
│ 3059 │ │ │ print(f"loading model for process {accelerator.state.local_process_index}/{a │
│ 3060 │ │ │ │
│ ❱ 3061 │ │ │ text_encoder, vae, unet, load_stable_diffusion_format = _load_target_model( │
│ 3062 │ │ │ │ args, weight_dtype, accelerator.device if args.lowram else "cpu" │
│ 3063 │ │ │ ) │
│ 3064 │
│ │
│ D:\PROJEKTY\SD\kohya_ss\library\train_util.py:3027 in _load_target_model │
│ │
│ 3024 │ load_stable_diffusion_format = os.path.isfile(name_or_path) # determine SD or Diffu │
│ 3025 │ if load_stable_diffusion_format: │
│ 3026 │ │ print(f"load StableDiffusion checkpoint: {name_or_path}") │
│ ❱ 3027 │ │ text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoin │
│ 3028 │ else: │
│ 3029 │ │ # Diffusers model is loaded to CPU │
│ 3030 │ │ print(f"load Diffusers pretrained models: {name_or_path}") │
│ │
│ D:\PROJEKTY\SD\kohya_ss\library\model_util.py:868 in │
│ load_models_from_stable_diffusion_checkpoint │
│ │
│ 865 │ │
│ 866 │ # Convert the VAE model. │
│ 867 │ vae_config = create_vae_diffusers_config() │
│ ❱ 868 │ converted_vae_checkpoint = convert_ldm_vae_checkpoint(state_dict, vae_config) │
│ 869 │ │
│ 870 │ vae = AutoencoderKL(**vae_config).to(device) │
│ 871 │ info = vae.load_state_dict(converted_vae_checkpoint) │
│ │
│ D:\PROJEKTY\SD\kohya_ss\library\model_util.py:384 in convert_ldm_vae_checkpoint │
│ │
│ 381 │ │
│ 382 │ new_checkpoint = {} │
│ 383 │ │
│ ❱ 384 │ new_checkpoint["encoder.conv_in.weight"] = vae_state_dict["encoder.conv_in.weight"] │
│ 385 │ new_checkpoint["encoder.conv_in.bias"] = vae_state_dict["encoder.conv_in.bias"] │
│ 386 │ new_checkpoint["encoder.conv_out.weight"] = vae_state_dict["encoder.conv_out.weight" │
│ 387 │ new_checkpoint["encoder.conv_out.bias"] = vae_state_dict["encoder.conv_out.bias"] │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
KeyError: 'encoder.conv_in.weight'
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ C:\Users\Wilq\AppData\Local\Programs\Python\Python310\lib\runpy.py:196 in _run_module_as_main │
│ │
│ 193 │ main_globals = sys.modules["__main__"].__dict__ │
│ 194 │ if alter_argv: │
│ 195 │ │ sys.argv[0] = mod_spec.origin │
│ ❱ 196 │ return _run_code(code, main_globals, None, │
│ 197 │ │ │ │ │ "__main__", mod_spec) │
│ 198 │
│ 199 def run_module(mod_name, init_globals=None, │
│ │
│ C:\Users\Wilq\AppData\Local\Programs\Python\Python310\lib\runpy.py:86 in _run_code │
│ │
│ 83 │ │ │ │ │ __loader__ = loader, │
│ 84 │ │ │ │ │ __package__ = pkg_name, │
│ 85 │ │ │ │ │ __spec__ = mod_spec) │
│ ❱ 86 │ exec(code, run_globals) │
│ 87 │ return run_globals │
│ 88 │
│ 89 def _run_module_code(code, init_globals=None, │
│ │
│ in <module>:7 │
│ │
│ 4 from accelerate.commands.accelerate_cli import main │
│ 5 if __name__ == '__main__': │
│ 6 │ sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0]) │
│ ❱ 7 │ sys.exit(main()) │
│ 8 │
│ │
│ D:\PROJEKTY\SD\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py:45 in main │
│ │
│ 42 │ │ exit(1) │
│ 43 │ │
│ 44 │ # Run │
│ ❱ 45 │ args.func(args) │
│ 46 │
│ 47 │
│ 48 if __name__ == "__main__": │
│ │
│ D:\PROJEKTY\SD\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py:918 in │
│ launch_command │
│ │
│ 915 │ elif defaults is not None and defaults.compute_environment == ComputeEnvironment.AMA │
│ 916 │ │ sagemaker_launcher(defaults, args) │
│ 917 │ else: │
│ ❱ 918 │ │ simple_launcher(args) │
│ 919 │
│ 920 │
│ 921 def main(): │
│ │
│ D:\PROJEKTY\SD\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py:580 in │
│ simple_launcher │
│ │
│ 577 │ process.wait() │
│ 578 │ if process.returncode != 0: │
│ 579 │ │ if not args.quiet: │
│ ❱ 580 │ │ │ raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) │
│ 581 │ │ else: │
│ 582 │ │ │ sys.exit(1) │
│ 583 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
CalledProcessError: Command '['D:\\PROJEKTY\\SD\\kohya_ss\\venv\\Scripts\\python.exe', 'train_network.py',
'--enable_bucket',
'--pretrained_model_name_or_path=D:/PROJEKTY/SD/stable-diffusion-webui/models/Stable-diffusion/realisticVisionV20_v13.safetensors',
'--train_data_dir=D:/PROJEKTY/SD/Lora_training_data/Hard_surface/Device/image', '--resolution=512,512',
'--output_dir=D:/PROJEKTY/SD/Lora_training_data/Hard_surface/Device/model',
'--logging_dir=D:/PROJEKTY/SD/Lora_training_data/Hard_surface/Device/log', '--network_alpha=128',
'--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-05', '--unet_lr=0.0001',
'--network_dim=128', '--output_name=HardSurface_device', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001',
'--lr_scheduler=constant', '--train_batch_size=1', '--max_train_steps=3900', '--save_every_n_epochs=1',
'--mixed_precision=fp16', '--save_precision=fp16', '--caption_extension=.txt', '--cache_latents',
'--optimizer_type=Lion', '--max_data_loader_n_workers=0', '--clip_skip=2', '--bucket_reso_steps=64', '--xformers',
'--bucket_no_upscale']' returned non-zero exit status 1.
It appears that the model you specified does not include VAE. Could you please try another model and see if you get the same error?
If the error does not appear with the other model, merging a VAE into the original model with a model-merging tool may fix it.
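One way to test this before training is to list the tensor names in the checkpoint and look for VAE weights. The sketch below is a rough illustration, not part of kohya_ss: the helper names are hypothetical, and it assumes the usual Stable Diffusion v1 checkpoint layout in which VAE tensors are stored under the "first_stage_model." prefix. Reading the file requires the safetensors package, which the log above shows is already installed in the venv.

```python
# Assumption: in LDM-format Stable Diffusion checkpoints the VAE weights
# are stored under this prefix.
VAE_PREFIX = "first_stage_model."


def has_embedded_vae(keys) -> bool:
    """Return True if any tensor name looks like a VAE weight."""
    return any(key.startswith(VAE_PREFIX) for key in keys)


def checkpoint_keys(path: str) -> list:
    """List tensor names in a .safetensors file without loading the tensors."""
    from safetensors import safe_open  # lazy import: only needed for real files

    with safe_open(path, framework="pt", device="cpu") as f:
        return list(f.keys())
```

For example, `has_embedded_vae(checkpoint_keys("model.safetensors"))` returning False would be consistent with the KeyError: 'encoder.conv_in.weight' shown above. The file name here is a placeholder.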
That was the case for me. I didn't know that the model has to include a VAE, thanks!
I specified the VAE using "--vae", but the same error occurs.
Traceback (most recent call last):
  File "/root/autodl-tmp/lora-scripts-minimal/./sd-scripts/train_network.py", line 873, in <module>
    train(args)
  File "/root/autodl-tmp/lora-scripts-minimal/./sd-scripts/train_network.py", line 168, in train
    text_encoder, vae, unet, _ = train_util.load_target_model(args, weight_dtype, accelerator)
  File "/root/autodl-tmp/lora-scripts-minimal/sd-scripts/library/train_util.py", line 3149, in load_target_model
    text_encoder, vae, unet, load_stable_diffusion_format = _load_target_model(
  File "/root/autodl-tmp/lora-scripts-minimal/sd-scripts/library/train_util.py", line 3115, in _load_target_model
    text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint(args.v2, name_or_path, device)
  File "/root/autodl-tmp/lora-scripts-minimal/sd-scripts/library/model_util.py", line 871, in load_models_from_stable_diffusion_checkpoint
    info = vae.load_state_dict(converted_vae_checkpoint)
  File "/root/autodl-tmp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for AutoencoderKL:
    Missing key(s) in state_dict: "encoder.mid_block.attentions.0.to_q.weight", "encoder.mid_block.attentions.0.to_q.bias", "encoder.mid_block.attentions.0.to_k.weight", "encoder.mid_block.attentions.0.to_k.bias", "encoder.mid_block.attentions.0.to_v.weight", "encoder.mid_block.attentions.0.to_v.bias", "encoder.mid_block.attentions.0.to_out.0.weight", "encoder.mid_block.attentions.0.to_out.0.bias", "decoder.mid_block.attentions.0.to_q.weight", "decoder.mid_block.attentions.0.to_q.bias", "decoder.mid_block.attentions.0.to_k.weight", "decoder.mid_block.attentions.0.to_k.bias", "decoder.mid_block.attentions.0.to_v.weight", "decoder.mid_block.attentions.0.to_v.bias", "decoder.mid_block.attentions.0.to_out.0.weight", "decoder.mid_block.attentions.0.to_out.0.bias".
    Unexpected key(s) in state_dict: "encoder.mid_block.attentions.0.query.weight", "encoder.mid_block.attentions.0.query.bias", "encoder.mid_block.attentions.0.key.weight", "encoder.mid_block.attentions.0.key.bias", "encoder.mid_block.attentions.0.value.weight", "encoder.mid_block.attentions.0.value.bias", "encoder.mid_block.attentions.0.proj_attn.weight", "encoder.mid_block.attentions.0.proj_attn.bias", "decoder.mid_block.attentions.0.query.weight", "decoder.mid_block.attentions.0.query.bias", "decoder.mid_block.attentions.0.key.weight", "decoder.mid_block.attentions.0.key.bias", "decoder.mid_block.attentions.0.value.weight", "decoder.mid_block.attentions.0.value.bias", "decoder.mid_block.attentions.0.proj_attn.weight", "decoder.mid_block.attentions.0.proj_attn.bias".
Traceback (most recent call last):
  File "/root/autodl-tmp/lora-scripts-minimal/venv/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/root/autodl-tmp/stable-diffusion-webui/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/root/autodl-tmp/stable-diffusion-webui/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 923, in launch_command
    simple_launcher(args)
  File "/root/autodl-tmp/stable-diffusion-webui/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 579, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/autodl-tmp/stable-diffusion-webui/venv/bin/python', './sd-scripts/train_network.py', '--vae=/root/autodl-tmp/stable-diffusion-webui/models/VAE/anime.vae.pt
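Comparing the two key lists in this error, the missing and unexpected names differ only in how the attention layers are named (query/key/value/proj_attn versus to_q/to_k/to_v/to_out.0). That suggests a naming mismatch between the converted VAE and the AutoencoderKL class in the installed diffusers, rather than a missing VAE. The sketch below only illustrates that mapping on a plain dict; `rename_attention_keys` is a hypothetical helper, not something sd-scripts provides, and whether renaming alone is enough for a given diffusers version is an assumption to verify.

```python
# Old-style attention names (left) and the names the error says are expected (right),
# taken directly from the "Unexpected key(s)" and "Missing key(s)" lists above.
OLD_TO_NEW = {
    ".query.": ".to_q.",
    ".key.": ".to_k.",
    ".value.": ".to_v.",
    ".proj_attn.": ".to_out.0.",
}


def rename_attention_keys(state_dict: dict) -> dict:
    """Return a copy of state_dict with attention-layer keys renamed.

    Only keys containing ".attentions." are touched; values are left as-is.
    """
    renamed = {}
    for name, tensor in state_dict.items():
        if ".attentions." in name:
            for old, new in OLD_TO_NEW.items():
                name = name.replace(old, new)
        renamed[name] = tensor
    return renamed
```

In practice, aligning the diffusers version with the one the training scripts were written for is probably the cleaner fix; the mapping is mainly useful for understanding what the error is reporting.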
Hello everyone, I am having trouble training a LoRA, with an error like the one in issue 192 (https://github.com/bmaltais/kohya_ss/issues/192). The solution suggested there, "Alternatively, just replace library/train_util.py with kohya's new version https://github.com/kohya-ss/sd-scripts/blob/main/library/train_util.py",
did nothing for me. I had just run git pull anyway, so replacing the file would not have changed anything.
The error message is:
What solved it for me was turning off 8bit_adam.