Open RorutopThe2nd opened 3 months ago
Could you elaborate on what you mean?
Currently you can create subfolders in the tagger and tag them one at a time. You can also choose multiple folders to train on in the Extras section of the trainer, where you must specify the number of repeats for each folder.
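For illustration, here is a minimal sketch of how a per-folder repeat count could be written out as a custom dataset. The key names (`datasets`, `subsets`, `image_dir`, `num_repeats`) are the ones read by the `validate_dataset` snippet in this thread. The helper name `build_custom_dataset` and the example paths are hypothetical, and the trainer's real template may expect more fields than this.

```python
import json


def build_custom_dataset(folder_repeats):
    """Render a {folder: repeats} mapping as a TOML custom dataset string.

    Hypothetical helper: only image_dir and num_repeats are emitted.
    """
    lines = ["[[datasets]]", ""]
    for image_dir, num_repeats in folder_repeats.items():
        lines += [
            "[[datasets.subsets]]",
            # json.dumps is used only as a convenient way to quote the path;
            # for ordinary paths the result is also a valid TOML basic string.
            f"image_dir = {json.dumps(image_dir, ensure_ascii=False)}",
            f"num_repeats = {int(num_repeats)}",
            "",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Example paths are made up for this sketch.
    print(build_custom_dataset({
        "/content/drive/example/dataset/good": 10,
        "/content/drive/example/dataset/normal": 3,
    }))
```

Each folder gets its own `[[datasets.subsets]]` entry, which is why the repeat count has to be given per folder.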
That is a good idea, but would require more effort than I'm willing to spend right now. It's a bit harder than it seems.
Got bored, so try this. I think it should work, but it might be kind of messy because custom_dataset was None at the start.
import os
import toml

# Relies on notebook globals defined elsewhere: project_name, custom_dataset,
# images_folder and num_repeats.
def validate_dataset():
    global lr_warmup_steps, lr_warmup_ratio, caption_extension, keep_tokens
    supported_types = (".png", ".jpg", ".jpeg", ".webp", ".bmp")

    print("\n💿 Checking dataset...")
    if not project_name.strip() or any(c in project_name for c in " .()\"'\\/"):
        print("💥 Error: Please choose a valid project name.")
        return

    datasets = []
    print(custom_dataset)
    if custom_dataset:  # also covers custom_dataset being None
        try:
            datconf = toml.loads(custom_dataset)
            datasets.extend(datconf["datasets"][0]["subsets"])
        except Exception:
            print("💥 Error: Your custom dataset is invalid or contains an error! Please check the original template.")
            return

    # Every subfolder of images_folder, excluding images_folder itself
    leftover_folders = [root for root, dirs, files in os.walk(images_folder) if os.path.relpath(root, images_folder) != "."]
    print(leftover_folders)

    # Subfolders already listed in the custom dataset keep their own settings;
    # compare paths with paths (the first draft compared paths with dicts)
    configured = {d.get("image_dir") for d in datasets}
    for f in leftover_folders:
        if f not in configured:
            datasets.append({
                'image_dir': f,
                'num_repeats': num_repeats
            })
    print(datasets)

    reg = [d.get("image_dir") for d in datasets if d.get("is_reg", False)]
    datasets_dict = {d["image_dir"]: d["num_repeats"] for d in datasets}
    folders = datasets_dict.keys()

    # Check that every folder exists before listing its contents
    for folder in folders:
        if not os.path.exists(folder):
            print(f"💥 Error: The folder {folder.replace('/content/drive/', '')} doesn't exist.")
            return

    # Only regular files are validated, so nested subfolders are not reported as invalid files
    files = [f for folder in folders for f in os.listdir(folder) if os.path.isfile(os.path.join(folder, f))]
    images_repeats = {folder: (len([f for f in os.listdir(folder) if f.lower().endswith(supported_types)]), datasets_dict[folder]) for folder in folders}
    print(images_repeats)

    for folder, (img, rep) in images_repeats.items():
        if not img:
            print(f"💥 Error: Your {folder.replace('/content/drive/', '')} folder is empty.")
            return
    for f in files:
        if not f.lower().endswith((".txt", ".npz")) and not f.lower().endswith(supported_types):
            print(f"💥 Error: Invalid file in dataset: \"{f}\". Aborting.")
            return
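To make the folder-merging step easier to test outside the notebook, it can be pulled out into a small standalone function. This is only a sketch under assumptions: `collect_subsets` is a hypothetical name, and it takes the folder, the already-parsed subsets and the default repeat count as arguments instead of reading the notebook globals.

```python
import os
import tempfile


def collect_subsets(images_folder, custom_subsets, default_repeats):
    """Merge explicitly configured subsets with every other subfolder on disk.

    Folders listed in custom_subsets keep their own num_repeats; any other
    subfolder of images_folder is added with default_repeats.
    """
    subsets = [dict(s) for s in custom_subsets]
    configured = {os.path.normpath(s["image_dir"]) for s in subsets}
    top = os.path.normpath(images_folder)
    for root, _dirs, _files in os.walk(images_folder):
        root = os.path.normpath(root)
        if root == top:
            continue  # like the snippet in this thread, skip the top-level folder
        if root not in configured:
            subsets.append({"image_dir": root, "num_repeats": default_repeats})
    return subsets


if __name__ == "__main__":
    # Build a throwaway folder layout to show the merge.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a", "b"):
            os.makedirs(os.path.join(tmp, name))
        custom = [{"image_dir": os.path.join(tmp, "a"), "num_repeats": 10}]
        for s in collect_subsets(tmp, custom, default_repeats=2):
            print(os.path.basename(s["image_dir"]), s["num_repeats"])
```

Comparing normalized path strings avoids the mismatch that happens when a list of paths is checked against a list of subset dicts.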
Trainer notebooks like Linaqruf's XL Trainer and Kohya SS do that, so I would love to have this without having to move all of the files into one folder. This should apply to the taggers too.