kohya-ss / sd-scripts

MultipleInvalid: extra keys not allowed @ data['datasets'][0]['subsets'][1]['is_reg'] #1428

Open voplica-git opened 4 months ago

voplica-git commented 4 months ago

For some reason I cannot use the is_reg parameter for DreamBooth-type training. I'm using the latest commit from the dev branch. My dataset config is as follows:

[general]
shuffle_caption = false
caption_extension = ".txt"
keep_tokens = 1

# This is a DreamBooth-style dataset
[[datasets]]
resolution = [1024, 1280]
batch_size = 1
enable_bucket = true
bucket_no_upscale = true

  [[datasets.subsets]]
  image_dir = "/path/to/images/"
  conditioning_data_dir = "/path/to/masks/"
  num_repeats = 63

  [[datasets.subsets]]
  is_reg = true
  image_dir = "/path/to/reg_images/"
  conditioning_data_dir = "/path/to/reg_masks/"
  cache_info = true
  num_repeats = 1

When I hit "Start training" I get the following error:

                    WARNING  clip_skip will be unexpected /                    sdxl_train_util.py:352
2024-07-16 23:41:45 INFO     prepare tokenizers                                sdxl_train_util.py:138
2024-07-16 23:41:46 INFO     update token length: 75                           sdxl_train_util.py:163
                    INFO     Load dataset config from                               sdxl_train.py:133
                    WARNING  ignore following options because config file is found: sdxl_train.py:137
                             train_data_dir, in_json
                    ERROR    Invalid user config /                                 config_util.py:373
Traceback (most recent call last):
  File "/srv/shared/AI/LoraTraining/kohya_ss/sd-scripts/sdxl_train.py", line 948, in <module>
  File "/srv/shared/AI/LoraTraining/kohya_ss/sd-scripts/sdxl_train.py", line 169, in train
    blueprint = blueprint_generator.generate(user_config, args, tokenizer=[tokenizer1, tokenizer2])
  File "/srv/shared/AI/LoraTraining/kohya_ss/sd-scripts/library/config_util.py", line 407, in generate
    sanitized_user_config = self.sanitizer.sanitize_user_config(user_config)
  File "/srv/shared/AI/LoraTraining/kohya_ss/sd-scripts/library/config_util.py", line 370, in sanitize_user_config
    return self.user_config_validator(user_config)
  File "/srv/shared/AI/LoraTraining/kohya_ss/venv/lib/python3.10/site-packages/voluptuous/schema_builder.py", line 272, in __call__
    return self._compiled([], data)
  File "/srv/shared/AI/LoraTraining/kohya_ss/venv/lib/python3.10/site-packages/voluptuous/schema_builder.py", line 595, in validate_dict
    return base_validate(path, iteritems(data), out)
  File "/srv/shared/AI/LoraTraining/kohya_ss/venv/lib/python3.10/site-packages/voluptuous/schema_builder.py", line 433, in validate_mapping
    raise er.MultipleInvalid(errors)
voluptuous.error.MultipleInvalid: extra keys not allowed @ data['datasets'][0]['subsets'][1]['is_reg']
E0716 23:41:50.868000 130445689476160 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: 1) local_rank: 0 (pid: 200042) of binary: /srv/shared/AI/LoraTraining/kohya_ss/venv/bin/python
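
The path at the end of the voluptuous error reads as a chain of lookups into the parsed config. A minimal sketch, assuming the config above parses into one [[datasets]] table with two [[datasets.subsets]] entries; the dict below is written by hand for illustration and is not produced by sd-scripts:

```python
# Hand-written stand-in for the parsed dataset config (an assumption about
# its shape, based on the path shown in the error message).
config = {
    "general": {"shuffle_caption": False, "caption_extension": ".txt", "keep_tokens": 1},
    "datasets": [
        {
            "resolution": [1024, 1280],
            "batch_size": 1,
            "enable_bucket": True,
            "bucket_no_upscale": True,
            "subsets": [
                {
                    "image_dir": "/path/to/images/",
                    "conditioning_data_dir": "/path/to/masks/",
                    "num_repeats": 63,
                },
                {
                    "is_reg": True,
                    "image_dir": "/path/to/reg_images/",
                    "conditioning_data_dir": "/path/to/reg_masks/",
                    "cache_info": True,
                    "num_repeats": 1,
                },
            ],
        }
    ],
}

# data['datasets'][0]['subsets'][1]['is_reg'] points at the second subset
# (index 1) of the first dataset (index 0).
offending = config["datasets"][0]["subsets"][1]
print(offending["image_dir"], offending["is_reg"])
```

In other words, the rejected key is the is_reg entry of the regularization subset, not anything in the first subset.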

However, if I remove the is_reg option and hit "Start training", I get a different error:

                    INFO     11395 train images with repeating.                    train_util.py:1678
                    INFO     0 reg images.                                         train_util.py:1681
                    WARNING  no regularization images /                            train_util.py:1686
Traceback (most recent call last):
  File "/srv/shared/AI/LoraTraining/kohya_ss/sd-scripts/sdxl_train.py", line 948, in <module>
  File "/srv/shared/AI/LoraTraining/kohya_ss/sd-scripts/sdxl_train.py", line 170, in train
    train_dataset_group = config_util.generate_dataset_group_by_blueprint(blueprint.dataset_group)
  File "/srv/shared/AI/LoraTraining/kohya_ss/sd-scripts/library/config_util.py", line 487, in generate_dataset_group_by_blueprint
    dataset = dataset_klass(subsets=subsets, **asdict(dataset_blueprint.params))
  File "/srv/shared/AI/LoraTraining/kohya_ss/sd-scripts/library/train_util.py", line 2038, in __init__
    len(missing_imgs) == 0
AssertionError: missing conditioning data for 5662 images / 制御用画像が見つかりませんでした: ['s2_000000001', 's2_000000002', 's2_000000003', 's2_000000004', 's2_000000005', 's2_000000006', 's2_000000007', 's2_000000008', 's2_000000009', 's2_000000010', 's2_000000011', 's2_000000012', 's2_000000013', 's2_000000014', 's2_000000015', 's2_000000016', 's2_000000017', 's2_000000018', 

I can't figure out why the is_reg parameter is not supported. Any help is really appreciated!

voplica-git commented 4 months ago

Sorry. Wrong repository. Opened it in the GUI issues: https://github.com/bmaltais/kohya_ss/issues/2647

voplica-git commented 4 months ago

Actually, I realized that the issue comes from sdxl_train.py, which is part of sd-scripts. So I believe the issue was opened in the right place, and I'm reopening it.

gesen2egee commented 4 months ago

conditioning_data_dir cannot be used with is_reg (due to architectural issues).

voplica-git commented 4 months ago

> conditioning_data_dir cannot be used with is_reg (due to architectural issues).

Thank you for the information. However, if I remove conditioning_data_dir from the second subset (DreamBooth), it fails with an exception saying that conditioning_data_dir is not expected in the first subset. This is a bit strange, because the documentation says that conditioning_data_dir should work with the DreamBooth approach, but maybe I'm missing something: https://github.com/kohya-ss/sd-scripts/blob/main/docs/train_lllite_README.md#preparing-the-dataset
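
The two failures reported so far are consistent with the validator choosing a single subset schema for the whole dataset, depending on whether every subset sets conditioning_data_dir. The following is a minimal, stdlib-only sketch of that idea, not the actual code in library/config_util.py: the key sets and the function names allowed_subset_keys and extra_keys are hypothetical, and the selection rule is an assumption inferred from the errors in this thread.

```python
# Hypothetical, heavily reduced key sets; the real schemas are larger.
COMMON_SUBSET_KEYS = {"image_dir", "num_repeats", "cache_info"}
DREAMBOOTH_ONLY_KEYS = {"is_reg", "class_tokens"}
CONTROLNET_ONLY_KEYS = {"conditioning_data_dir"}


def allowed_subset_keys(subsets):
    """Pick one key set for the whole dataset, based on all of its subsets."""
    # Assumed rule: only if every subset has conditioning_data_dir is the
    # dataset treated as a conditioning (ControlNet-style) dataset.
    if subsets and all("conditioning_data_dir" in s for s in subsets):
        return COMMON_SUBSET_KEYS | CONTROLNET_ONLY_KEYS
    return COMMON_SUBSET_KEYS | DREAMBOOTH_ONLY_KEYS


def extra_keys(dataset, dataset_index=0):
    """Return voluptuous-style paths for every key the chosen schema rejects."""
    allowed = allowed_subset_keys(dataset["subsets"])
    return [
        f"data['datasets'][{dataset_index}]['subsets'][{i}]['{key}']"
        for i, subset in enumerate(dataset["subsets"])
        for key in subset
        if key not in allowed
    ]


train = {
    "image_dir": "/path/to/images/",
    "conditioning_data_dir": "/path/to/masks/",
    "num_repeats": 63,
}
reg_with_masks = {
    "is_reg": True,
    "image_dir": "/path/to/reg_images/",
    "conditioning_data_dir": "/path/to/reg_masks/",
    "cache_info": True,
    "num_repeats": 1,
}
reg_without_masks = {
    k: v for k, v in reg_with_masks.items() if k != "conditioning_data_dir"
}

# Both subsets have masks: the conditioning schema is chosen, is_reg is extra.
print(extra_keys({"subsets": [train, reg_with_masks]}))
# Only the first subset has masks: the DreamBooth schema is chosen,
# so conditioning_data_dir in the first subset becomes the extra key.
print(extra_keys({"subsets": [train, reg_without_masks]}))
```

Under this assumed rule, there is no combination in which one dataset accepts both is_reg and conditioning_data_dir, which matches both error messages.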

kohya-ss commented 4 months ago

Thank you for reporting the issue. I think you are using masked loss with a dataset that has conditioning_data_dir. Unfortunately, conditioning_data_dir is not supported together with the is_reg option. I will update the documentation.

As a workaround, please set the number of repeats (num_repeats) for each dataset to balance the number of images between the datasets.
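
The balancing arithmetic behind this workaround can be sketched as follows. The helper name balancing_repeats is hypothetical, and the image counts are an assumption inferred from the logs earlier in the thread (11395 images with repeating, 5662 of them reported without conditioning data, and num_repeats = 63 on the first subset, which leaves 5733 = 91 × 63):

```python
def balancing_repeats(n_small, n_large):
    """Smallest whole num_repeats such that n_small * repeats >= n_large."""
    if n_small <= 0 or n_large < 0:
        raise ValueError("n_small must be positive and n_large non-negative")
    # Integer ceiling division, clamped so the result is at least 1.
    return max(1, -(-n_large // n_small))


# Assumed counts: 91 training images, 5662 images in the other subset.
repeats = balancing_repeats(91, 5662)
print(repeats, 91 * repeats)
```

With these assumed counts the helper yields the same value as the num_repeats already used in the config, so both subsets contribute a similar number of images per epoch without relying on is_reg.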

voplica-git commented 4 months ago

> Thank you for reporting the issue. I think you are using masked loss with a dataset that has conditioning_data_dir. Unfortunately, conditioning_data_dir is not supported together with the is_reg option. I will update the documentation.
>
> As a workaround, please set the number of repeats (num_repeats) for each dataset to balance the number of images between the datasets.

Do you mean that conditioning_data_dir doesn't work in the subset where is_reg is used ([[datasets.subsets]]), or that conditioning_data_dir doesn't work in any subset if at least one subset contains the is_reg option? In other words, is this a valid configuration?

[general]
shuffle_caption = false
caption_extension = ".txt"
keep_tokens = 1

[[datasets]]
resolution = [1024, 1280]
batch_size = 1
enable_bucket = true
bucket_no_upscale = true

  [[datasets.subsets]]
  image_dir = "/path/to/images/"
  conditioning_data_dir = "/path/to/masks/"
  num_repeats = 63

  [[datasets.subsets]]
  is_reg = true
  image_dir = "/path/to/reg_images/"
  cache_info = true
  num_repeats = 1

os144046 commented 1 month ago

> conditioning_data_dir doesn't work in any subset if at least one subset contains the is_reg option

This appears to be the case. I guess it makes sense that you wouldn't want to use masked training images with unmasked regularization images?