Generation stops at same position every time

Kallamamran commented 3 weeks ago

When I run training, it stops with the same error every time, always "codec can't decode byte 0xeb in position 604". The setup is the same as a run that worked. The only differences are that these images are not cropped and rescaled, so bucketing is now active, and there are 338 images instead of 169.

codec can't decode byte 0xeb in position 604: invalid continuation byte

========================================
Result:
 - 0 completed jobs
 - 1 failure
========================================
Traceback (most recent call last):
  File "D:\ai-toolkit\run.py", line 90, in <module>
    main()
  File "D:\ai-toolkit\run.py", line 86, in main
    raise e
  File "D:\ai-toolkit\run.py", line 78, in main
    job.run()
  File "D:\ai-toolkit\jobs\ExtensionJob.py", line 22, in run
    process.run()
  File "D:\ai-toolkit\jobs\process\BaseSDTrainProcess.py", line 1667, in run
    batch = next(dataloader_iterator)
  File "D:\ai-toolkit\venv\lib\site-packages\torch\utils\data\dataloader.py", line 630, in __next__
    data = self._next_data()
  File "D:\ai-toolkit\venv\lib\site-packages\torch\utils\data\dataloader.py", line 673, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "D:\ai-toolkit\venv\lib\site-packages\torch\utils\data\_utils\fetch.py", line 54, in fetch
    data = self.dataset[possibly_batched_index]
  File "D:\ai-toolkit\venv\lib\site-packages\torch\utils\data\dataset.py", line 350, in __getitem__
    return self.datasets[dataset_idx][sample_idx]
  File "D:\ai-toolkit\toolkit\data_loader.py", line 539, in __getitem__
    return [self._get_single_item(idx) for idx in idx_list]
  File "D:\ai-toolkit\toolkit\data_loader.py", line 539, in <listcomp>
    return [self._get_single_item(idx) for idx in idx_list]
  File "D:\ai-toolkit\toolkit\data_loader.py", line 527, in _get_single_item
    file_item.load_caption(self.caption_dict)
  File "D:\ai-toolkit\toolkit\dataloader_mixins.py", line 305, in load_caption
    prompt = f.read()
  File "C:\Program Files\Python310\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xeb in position 604: invalid continuation byte
amaz1ng:  14%|████▉                             | 576/4000 [12:27<2:10:56,  2.29s/it, lr: 1.0e-04 loss: 4.666e-01]

D-Ogi commented 3 weeks ago

I recommend looping through all the images and saving them as PNGs. This should help ensure that no corrupted files are causing the error. Below is a Python script you can use to do this. It includes error handling to skip any problematic files.

import os
from PIL import Image

def convert_images_to_png(image_folder, output_folder):
    # Create the output folder if it does not exist yet
    os.makedirs(output_folder, exist_ok=True)

    for root, _, files in os.walk(image_folder):
        for file in files:
            try:
                # Construct full file path
                file_path = os.path.join(root, file)

                # Open the image file
                with Image.open(file_path) as img:
                    # Convert the image to RGB mode if not already
                    img = img.convert("RGB")

                    # Construct the output file path
                    output_file_path = os.path.join(output_folder, os.path.splitext(file)[0] + ".png")

                    # Save the image as a PNG
                    img.save(output_file_path, "PNG")

                    print(f"Successfully converted {file} to PNG.")
            except Exception as e:
                print(f"Error converting {file}: {e}")
                continue

image_folder = "path/to/your/images"
output_folder = "path/to/save/pngs"

convert_images_to_png(image_folder, output_folder)

How to Use:

Replace "path/to/your/images" with the directory where your images are stored.
Replace "path/to/save/pngs" with the directory where you want to save the converted PNG files.
Run the script. It will convert all images to PNG format, skipping any files that cause errors.

This approach should help in identifying if the issue is with a particular image or set of images.

derpina-ai commented 3 weeks ago

Check that every caption .txt file is encoded in UTF-8. The problem lies there. You can use Notepad++ to convert or re-save them.
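
For a large dataset, checking captions by hand is tedious, so here is a rough sketch of how the offending files could be found with a script instead. The traceback shows the failure happens in load_caption while reading a caption as UTF-8, so the script simply tries to decode every .txt file the same way and reports the ones that fail. The function names find_non_utf8_captions and resave_as_utf8 are made up for this example and are not part of ai-toolkit. The cp1252 source encoding is only a guess (byte 0xeb is "ë" in that code page), so back up your captions before re-saving anything.

```python
from pathlib import Path


def find_non_utf8_captions(folder, pattern="*.txt"):
    """Return (path, error) pairs for caption files that fail UTF-8 decoding."""
    bad = []
    for path in sorted(Path(folder).rglob(pattern)):
        try:
            # Decode the raw bytes strictly, as a UTF-8 text read would
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError as exc:
            bad.append((path, exc))
    return bad


def resave_as_utf8(path, source_encoding="cp1252"):
    """Rewrite one file as UTF-8.

    Assumption: the file was originally saved as source_encoding.
    If that guess is wrong, the text will come out garbled.
    """
    path = Path(path)
    text = path.read_bytes().decode(source_encoding)
    path.write_bytes(text.encode("utf-8"))
```

Calling find_non_utf8_captions("path/to/your/images") returns the files that would trigger the UnicodeDecodeError, and each error's start attribute gives the byte offset (604 in the log above). Once the list looks right, resave_as_utf8 can be called on each path, or the files can be fixed in Notepad++ as suggested.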

ostris / ai-toolkit

Generation stops at same position every time #118
