Loading settings from /content/fine_tune/config/config_file.toml...
/content/fine_tune/config/config_file
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Training with captions.
loading existing metadata: /content/fine_tune/meta_lat.json
using bucket info in metadata / メタデータ内のbucket情報を使います
[Dataset 0]
batch_size: 4
resolution: (1024, 1024)
enable_bucket: True
network_multiplier: 1.0
min_bucket_reso: None
max_bucket_reso: None
bucket_reso_steps: None
bucket_no_upscale: None
[Subset 0 of Dataset 0]
image_dir: "/content/fine_tune/train_data"
image_count: 30
num_repeats: 20
shuffle_caption: False
keep_tokens: 0
keep_tokens_separator:
caption_separator: ,
secondary_separator: None
enable_wildcard: False
caption_dropout_rate: 0.0
caption_dropout_every_n_epoches: 0
caption_tag_dropout_rate: 0.0
caption_prefix: None
caption_suffix: None
color_aug: False
flip_aug: False
face_crop_aug_range: None
random_crop: False
token_warmup_min: 1
token_warmup_step: 0
alpha_mask: False
metadata_file: /content/fine_tune/meta_lat.json
[Dataset 0]
loading image sizes.
100% 30/30 [00:00<00:00, 691368.79it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (512, 1024), count: 60
bucket 1: resolution (576, 1024), count: 280
bucket 2: resolution (704, 1024), count: 40
bucket 3: resolution (768, 1024), count: 100
bucket 4: resolution (832, 1024), count: 40
bucket 5: resolution (1024, 704), count: 20
bucket 6: resolution (1024, 768), count: 20
bucket 7: resolution (1024, 1024), count: 40
mean ar error (without repeats): 0.0
prepare accelerator
accelerator device: cuda
Loading SD3 models from /content/pretrained_model/sd3_medium.safetensors
loading model for process 0/1
Building VAE
Loading state dict...
Loaded VAE:
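As a sanity check on the bucketing output above, the per-bucket counts (which already include repeats) should sum to image_count × num_repeats. A minimal sketch using the numbers from this log:

```python
# Per-bucket image counts from the log (resolution -> count, repeats included).
bucket_counts = {
    (512, 1024): 60,
    (576, 1024): 280,
    (704, 1024): 40,
    (768, 1024): 100,
    (832, 1024): 40,
    (1024, 704): 20,
    (1024, 768): 20,
    (1024, 1024): 40,
}

# 30 images * 20 num_repeats = 600 training examples expected.
total = sum(bucket_counts.values())
print(total)  # 600
```

The total matching 600 confirms every image landed in exactly one aspect-ratio bucket.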
[Dataset 0]
caching latents.
checking cache validity...
100% 30/30 [00:00<00:00, 554313.30it/s]
caching latents...
0it [00:00, ?it/s]
loading model for process 0/1
Loading clip_l from /content/pretrained_model/clip_l.safetensors...
Building ClipL
Loading state dict...
Loaded ClipL:
loading model for process 0/1
Loading clip_g from /content/pretrained_model/clip_g.safetensors...
Building ClipG
Loading state dict...
Loaded ClipG:
loading model for process 0/1
Loading t5xxl from /content/pretrained_model/t5xxl_fp16.safetensors...
Building T5XXL
Loading state dict...
Loaded T5XXL:
[Dataset 0]
caching text encoder outputs.
checking cache existence...
100% 30/30 [00:00<00:00, 134146.18it/s]
caching text encoder outputs...
0it [00:00, ?it/s]
loading model for process 0/1
Building MMDit
Loading state dict...
Loaded MMDiT:
train mmdit: True
number of models: 1
number of trainable parameters: 2028328000
prepare optimizer, data loader etc.
use Adafactor optimizer | {'scale_parameter': False, 'relative_step': False, 'warmup_init': False}
constant_with_warmup will be good / スケジューラはconstant_with_warmupが良いかもしれません
running training / 学習開始
num examples / サンプル数: 600
num batches per epoch / 1epochのバッチ数: 150
num epochs / epoch数: 53
batch size per device / バッチサイズ: 4
gradient accumulation steps / 勾配を合計するステップ数 = 4
total optimization steps / 学習ステップ数: 2014
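The step counts printed above fit together arithmetically; a small sketch reproducing them (assuming the trainer rounds partial gradient-accumulation groups up, which matches the 2014 figure):

```python
import math

num_examples = 30 * 20   # 30 images * 20 num_repeats = 600
batch_size = 4
grad_accum_steps = 4
num_epochs = 53

batches_per_epoch = num_examples // batch_size                 # 150
steps_per_epoch = math.ceil(batches_per_epoch / grad_accum_steps)  # 38
total_steps = steps_per_epoch * num_epochs

print(batches_per_epoch, total_steps)  # 150 2014
```

So each epoch contributes 38 optimizer updates, and 38 × 53 = 2014 total optimization steps, exactly as logged.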
steps: 0% 0/2014 [00:00<?, ?it/s]
epoch 1/53
epoch is incremented. current_epoch: 0, epoch: 1
epoch is incremented. current_epoch: 0, epoch: 1
epoch is incremented. current_epoch: 0, epoch: 1
epoch is incremented. current_epoch: 0, epoch: 1
epoch is incremented. current_epoch: 0, epoch: 1
epoch is incremented. current_epoch: 0, epoch: 1
epoch is incremented. current_epoch: 0, epoch: 1
epoch is incremented. current_epoch: 0, epoch: 1
Traceback (most recent call last):
  File "/content/kohya-trainer/sd3_train.py", line 974, in <module>
    train(args)
  File "/content/kohya-trainer/sd3_train.py", line 750, in train
    model_pred = mmdit(noisy_model_input, timesteps, context=context, y=pool)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/operations.py", line 680, in forward
    return model_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/operations.py", line 668, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/usr/local/lib/python3.10/dist-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/content/kohya-trainer/library/sd3_models.py", line 998, in forward
    x = self.x_embedder(x) + self.cropped_pos_embed(H, W, device=x.device).to(dtype=x.dtype)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/kohya-trainer/library/sd3_models.py", line 298, in forward
    x = self.proj(x)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [1536, 16, 2, 2], expected input[4, 4, 128, 96] to have 16 channels, but got 4 channels instead
steps: 0% 0/2014 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'sd3_train.py', '--sample_prompts=/content/fine_tune/config/sample_prompt.toml', '--config_file=/content/fine_tune/config/config_file.toml', '--clip_l=/content/pretrained_model/clip_l.safetensors', '--clip_g=/content/pretrained_model/clip_g.safetensors', '--t5xxl=/content/pretrained_model/t5xxl_fp16.safetensors', '--t5xxl_dtype=fp16']' returned non-zero exit status 1.
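The first traceback's RuntimeError is the root cause: the MMDiT patch embedder is a Conv2d with weight shape [1536, 16, 2, 2], i.e. it expects 16-channel latents (the SD3 VAE produces 16 latent channels), but the cached latents being fed in have shape [4, 4, 128, 96] — only 4 channels, which is what SD1.x/SDXL VAEs produce. The likely fix is that meta_lat.json holds latents cached with a non-SD3 VAE and needs to be regenerated. A minimal, dependency-free sketch of the shape check that F.conv2d performs here (`check_patch_embed_input` is a hypothetical helper for illustration, not part of the trainer):

```python
def check_patch_embed_input(weight_shape, input_shape):
    """Mimic F.conv2d's channel check for a patch embedder (stride == kernel).

    weight_shape: (out_channels, in_channels, kh, kw)
    input_shape:  (batch, channels, height, width)
    Returns the output shape, or raises RuntimeError on a channel mismatch.
    """
    out_ch, in_ch, kh, kw = weight_shape
    n, c, h, w = input_shape
    if c != in_ch:
        # Same message shape as the PyTorch error in the log above.
        raise RuntimeError(
            f"Given groups=1, weight of size {list(weight_shape)}, "
            f"expected input{list(input_shape)} to have {in_ch} channels, "
            f"but got {c} channels instead"
        )
    return (n, out_ch, h // kh, w // kw)

# 16-channel SD3 latents pass and yield the patchified shape:
print(check_patch_embed_input((1536, 16, 2, 2), (4, 16, 128, 96)))  # (4, 1536, 64, 48)

# 4-channel (SD1.x/SDXL-style) cached latents reproduce the failure:
try:
    check_patch_embed_input((1536, 16, 2, 2), (4, 4, 128, 96))
except RuntimeError as e:
    print(e)
```

Under this reading, re-running the latent caching step against sd3_medium.safetensors (so meta_lat.json contains 16-channel latents) should let training proceed past step 0.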