kohya-ss / sd-scripts


so many errors I don't know where to start #518

Open topdeckg opened 1 year ago

topdeckg commented 1 year ago

Stable Diffusion webui works just fine; I've got automatic1111 and other forks all working on this machine.

Traceback (most recent call last):
  File "E:\kohya_ss\train_network.py", line 783, in <module>
    train(args)
  File "E:\kohya_ss\train_network.py", line 157, in train
    train_dataset_group.cache_latents(vae, args.vae_batch_size, args.cache_laten...
  File "E:\kohya_ss\library\train_util.py", line 1399, in cache_latents
    dataset.cache_latents(vae, vae_batch_size, cache_to_disk, is_main_process)
  File "E:\kohya_ss\library\train_util.py", line 812, in cache_latents
    latents = vae.encode(img_tensors).latent_dist.sample().to("cpu")
  File "C:\Python310\lib\site-packages\diffusers\models\vae.py", line 566, in encode
    h = self.encoder(x)
  File "C:\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Python310\lib\site-packages\diffusers\models\vae.py", line 130, in forward
    sample = self.conv_in(sample)
  File "C:\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Python310\lib\site-packages\torch\nn\modules\conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "C:\Python310\lib\site-packages\torch\nn\modules\conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride, self.padding, self.dilation, self.groups)
RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'

Traceback (most recent call last):
  File "C:\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None, "__main__", mod_spec)
  File "C:\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Python310\Scripts\accelerate.exe\__main__.py", line 7, in <module>
    [Errno 2] No such file or directory: 'C:\Python310\Scripts\accelerate.exe\__main__.py'
  File "C:\Python310\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "C:\Python310\lib\site-packages\accelerate\commands\launch.py", line 923, in launch_command
    simple_launcher(args)
  File "C:\Python310\lib\site-packages\accelerate\commands\launch.py", line 579, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
CalledProcessError: Command '['C:\Python310\python.exe', 'train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=E:/a1111/stable-diffusion-webui/models/Stable-diffusion/000/v1-5-pruned-emaonly.ckpt', '--train_data_dir=E:/kohya_ss/train/PatriciaDoyle/images', '--resolution=768,768', '--output_dir=E:/kohya_ss/train/PatriciaDoyle/model', '--logging_dir=E:/kohya_ss/train/PatriciaDoyle/log', '--network_alpha=1', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-05', '--unet_lr=0.0001', '--network_dim=8', '--output_name=PatriciaDoyle', '--lr_scheduler_num_cycles=10', '--learning_rate=0.0001', '--lr_scheduler=cosine', '--lr_warmup_steps=760', '--train_batch_size=1', '--max_train_steps=7600', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--cache_latents', '--optimizer_type=AdamW8bit', '--max_data_loader_n_workers=0', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale']' returned non-zero exit status 1.
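For context: the RuntimeError above is what PyTorch raises when a float16 ("Half") convolution is executed on the CPU, which is why it points at the model sitting on the CPU even though the run asks for --mixed_precision=fp16. A minimal sketch, assuming a PyTorch 2.0-era build like the one in the traceback (this snippet is illustrative only and not part of sd-scripts):

```python
import torch

# Illustrative reproduction (assumption, not from the original report): an fp16
# conv2d on the CPU triggers the same kernel error on PyTorch builds of this era.
x = torch.randn(1, 3, 64, 64, dtype=torch.float16)   # fp16 input on the CPU
conv = torch.nn.Conv2d(3, 8, kernel_size=3).half()   # fp16 weights on the CPU

try:
    conv(x)
except RuntimeError as e:
    print(e)  # e.g. "slow_conv2d_cpu" not implemented for 'Half'

# Moving both the module and the input to a CUDA device avoids the CPU kernel:
if torch.cuda.is_available():
    print(conv.cuda()(x.cuda()).shape)
```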

kohya-ss commented 1 year ago

It seems that the GPU is not detected correctly. Please make sure that a GPU (CUDA) build of PyTorch is installed in your venv.
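A quick way to verify this from the same venv (a generic PyTorch check, not specific to sd-scripts):

```python
import torch

# Run this inside the venv used by sd-scripts. A "+cpu" suffix on the version
# string means a CPU-only PyTorch build is installed and CUDA will never be used.
print(torch.__version__)              # e.g. "2.0.1+cu118" (CUDA build) vs "2.0.1+cpu"
print(torch.version.cuda)             # None on CPU-only builds
print(torch.cuda.is_available())      # must be True for GPU training
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```

If is_available() returns False, reinstalling PyTorch with a CUDA build that matches your driver inside that venv is the usual fix.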

JLuke73 commented 1 year ago

Also make sure you've selected the correct GPU by ID, which means either 0 or 1 when prompted.

To fix:

Run setup.bat, then:

  1. (Optional) Manually configure accelerate

    No distributed training
    Do you want to run your training on CPU only (even if a GPU / Apple Silicon device is available)? [yes/NO]: no
    Do you wish to optimize your script with torch dynamo? [yes/NO]: no
    Do you want to use DeepSpeed? [yes/NO]: no
    What GPU(s) (by id) should be used for training on this machine as a comma-separated list? [all]: 0

Whatever else you do, make sure that last answer is "0" (or the ID of your primary GPU) so the card actually gets detected.
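Once accelerate is configured, you can confirm which device it will actually hand to the training script. A small sketch using the public accelerate API (independent of sd-scripts and of the config steps above):

```python
from accelerate import Accelerator

# Resolves the device from the saved accelerate config; it should print a CUDA
# device (e.g. "cuda") rather than "cpu" before launching train_network.py.
accelerator = Accelerator()
print(accelerator.device)
print(accelerator.mixed_precision)  # "fp16" only makes sense together with a CUDA device
```

Running accelerate env from the same venv also prints the active config file and its contents, which is handy for double-checking the GPU id you entered.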

sunnysharma12 commented 4 months ago


Thanks. The steps above worked for me and solved my issue.