kohya-ss / sd-scripts

[Bug] Sample generation is not deterministic when using `--seed` #1435

Open feffy380 opened 3 months ago

feffy380 commented 3 months ago

When training with a fixed seed, samples are different every time due to this block in train_util.sample_image_inference(), specifically the else branch: https://github.com/kohya-ss/sd-scripts/blob/25f961bc779bc79aef440813e3e8e92244ac5739/library/train_util.py#L5264-L5270

I understand that this was added so each process generates different images in a multi-GPU setting, but I think a better way to handle this would be to draw a random seed from the already-seeded RNG and then add `accelerator.process_index` to it. That way each process still gets a unique seed while the samples remain reproducible with a fixed training seed.
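Roughly something like this (an untested sketch, not the actual code in `sample_image_inference()`; the helper name and arguments are just for illustration):

    import torch

    def pick_sample_seed(prompt_dict: dict, process_index: int) -> int:
        """Sketch: derive a reproducible sampling seed instead of calling torch.seed()."""
        if prompt_dict.get("seed") is not None:
            # an explicit per-prompt seed still takes priority
            return prompt_dict["seed"]
        # the global RNG was already seeded from --seed via set_seed(), so this
        # draw is identical on every run; the process_index offset keeps each
        # GPU's sample different in a multi-GPU run
        return int(torch.randint(0, 2**31 - 1, (1,)).item()) + process_index

    # usage sketch:
    # seed = pick_sample_seed(prompt_dict, accelerator.process_index)
    # torch.manual_seed(seed)
    # torch.cuda.manual_seed(seed)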

LeroyDaydreamer commented 3 months ago

Isn't the seed argument only relevant for training, mainly for the noise generation during training? As far as I understand it, the code quoted here only affects sampling during training and has nothing to do with the seed used for training itself (set with `set_seed()` from `accelerate.utils`). Since the sampling seed can be set per prompt, the code looks OK to me.
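For example, if I remember the sample prompt syntax right, each line in the prompt file can carry its own fixed seed via `--d`:

    masterpiece, 1girl, solo --w 512 --h 768 --s 28 --l 7.0 --d 42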

feffy380 commented 3 months ago

If I set a seed for training, I expect everything to be repeatable, including the randomly generated samples. This is useful for comparing the effect of different hyperparameters while still generating a variety of samples so you can be sure the improvement isn't just a lucky prompt seed.

LeroyDaydreamer commented 3 months ago

The original seed argument is not even referenced at this point. The documentation of the torch.seed function explicitly states that it is non-deterministic:

> `torch.seed()` — Sets the seed for generating random numbers to a non-deterministic random number on all devices. Returns a 64 bit number used to seed the RNG.

So adding any deterministic number to a non-deterministic number still results in a non-deterministic number. Also, what's the point of getting a different seed per process? To be comparable per step or epoch, images would have to be guaranteed to always be sampled by the same process, but if that is the case, the processes don't need different seeds anyway.

I get that you want a seed that changes, but in a deterministic way; in that case the seed should depend only on deterministic values, for example the current step. It could even be something like training seed + hash(step), as in the sketch below. But since you are already given the option to set the seed per sample (prompt), I still don't see why the training seed needs to be considered here at all. The training itself will be deterministic either way, independent of your samples. So what you want looks like a feature request for a changing sampling seed, not a bug.
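Just to sketch the idea (a hypothetical helper, not existing code; `training_seed` and `step` stand for whatever is actually in scope there):

    import hashlib

    def per_step_sample_seed(training_seed: int, step: int) -> int:
        # deterministic but step-dependent; hashlib is used instead of Python's
        # built-in hash(), which is salted per interpreter run and therefore
        # not reproducible across runs
        digest = hashlib.sha256(f"{training_seed}:{step}".encode()).digest()
        return int.from_bytes(digest[:8], "little")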

feffy380 commented 3 months ago

The training seed is used to seed all random number generators at the start. The call to torch.seed was added so multi-GPU setups don't generate the same image several times, but at the cost of making the samples always random. This isn't a feature request; I'm pointing out that this is how it used to work, and the multi-GPU bugfix broke it by taking the nuclear option of reseeding everything. That wasn't necessary, because the pipeline used for sample generation literally accepts a Generator object as an argument.
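A rough sketch of what I mean (argument names are illustrative, not the exact pipeline signature; the point is only that the sampling randomness can live in a local Generator instead of the global RNG state):

    import torch

    def make_sample_generator(seed: int, device: torch.device) -> torch.Generator:
        # pipeline-local RNG; the global torch RNG state is left untouched,
        # so everything else about the training run stays reproducible
        return torch.Generator(device=device).manual_seed(seed)

    # usage sketch:
    # generator = make_sample_generator(seed, accelerator.device)
    # image = pipeline(prompt, num_inference_steps=sample_steps, generator=generator).images[0]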

LeroyDaydreamer commented 3 months ago

Why would an image be generated multiple times by different GPUs?

The calling function distributes the available prompts to all available GPUs:

    if distributed_state.num_processes <= 1:
        # If only one device is available, just use the original prompt list. We don't need to care about the distribution of prompts.
        with torch.no_grad():
            for prompt_dict in prompts:
                sample_image_inference(
                    accelerator, args, pipeline, save_dir, prompt_dict, epoch, steps, prompt_replacement, controlnet=controlnet
                )
    else:
        # Creating list with N elements, where each element is a list of prompt_dicts, and N is the number of processes available (number of devices available)
        # prompt_dicts are assigned to lists based on order of processes, to attempt to time the image creation time to match enum order. Probably only works when steps and sampler are identical.
        per_process_prompts = []  # list of lists
        for i in range(distributed_state.num_processes):
            per_process_prompts.append(prompts[i :: distributed_state.num_processes])

        with torch.no_grad():
            with distributed_state.split_between_processes(per_process_prompts) as prompt_dict_lists:
                for prompt_dict in prompt_dict_lists[0]:
                    sample_image_inference(
                        accelerator, args, pipeline, save_dir, prompt_dict, epoch, steps, prompt_replacement, controlnet=controlnet
                    )

No prompt will be sampled multiple times by different GPUs at any time. Also, if a seed is specified for a prompt, it is used, so that sample isn't random. You brought up the argument that a "lucky" seed might have been chosen, which is why you want a changing one. I made a suggestion for that as well, though I wonder whether the point is even valid, since you can simply add more prompts with different seeds. Sorry, I still can't see a bug here. Maybe you can show a code snippet that illustrates how what you think is wrong could be fixed.