ljleb / sd-webui-neutral-prompt

Collision-free AND keywords for a1111 webui!
MIT License

don't use delta-space when possible #62

Closed ljleb closed 7 months ago

ljleb commented 7 months ago

This makes it so that normal prompts (ones that don't use any of the extension's keywords) will have identical results whether the extension is enabled or not.
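
A minimal sketch of the idea (illustrative only, not the extension's actual code; the keyword list is an assumption, using only the AND_PERP keyword seen in this thread):

# Illustrative sketch: only take the extension's code path when the prompt
# actually uses one of its keywords; plain prompts go through webui untouched.
EXTENSION_KEYWORDS = ("AND_PERP",)  # assumed minimal keyword list for the sketch

def prompt_needs_extension(prompt: str) -> bool:
    return any(keyword in prompt.split() for keyword in EXTENSION_KEYWORDS)

print(prompt_needs_extension("1girl"))                   # False -> vanilla path
print(prompt_needs_extension("1girl AND_PERP 1boy :0"))  # True  -> extension path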

ljleb commented 7 months ago

1girl (2 latents, 1 cond and 1 uncond) and 1girl AND_PERP 1boy:0 (3 latents, 2 conds and 1 uncond) give different results on my system even though they should be identical.

The reason is that, apparently, the first Conv2d layer encountered in the forward function of the model (the first processed h) gives different results depending on whether it is given a batch of 3 tensors or 2. The error propagates to the rest of the unet layers, and then the rest of the denoising steps. It gives a drastically different image as a result.

This isn't something I can fix. It has implications for any processing that changes the batch size of the unet. This extension heavily relies on a dynamic unet batch size.
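
For context on why the two prompts should be mathematically identical: at weight 0, the perpendicular contribution of the second prompt vanishes entirely. A rough sketch of the idea (an assumed formulation for illustration, not necessarily the extension's exact math):

import torch

def and_perp(delta_a: torch.Tensor, delta_b: torch.Tensor, weight: float) -> torch.Tensor:
    # Keep only the component of delta_b perpendicular to delta_a,
    # scaled by `weight` (assumed formulation for illustration).
    proj = (delta_b * delta_a).sum() / (delta_a * delta_a).sum() * delta_a
    return delta_a + weight * (delta_b - proj)  # weight == 0 -> exactly delta_a

a, b = torch.randn(8), torch.randn(8)
print(torch.equal(and_perp(a, b, 0.0), a))  # True: the extra cond is a no-op on paper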

wbclark commented 7 months ago

The reason is that, apparently, the first Conv2d layer encountered in the forward function of the model (the first processed h) gives different results depending on whether it is given a batch of 3 tensors or 2. The error propagates to the rest of the unet layers, and then the rest of the denoising steps. It gives a drastically different image as a result.

If I may ask for learning purposes, how did you determine this? (Tool you used, testing method, external source you consulted, etc.?)

Does it depend on the sampler, or matter if we disable the A1111 setting that batches cond and uncond?

ljleb commented 7 months ago

If I may ask for learning purposes, how did you determine this?

I debugged it in the code. The exact same input noise batched 2x gives slight precision differences from the exact same input batched 3x.

I tested in an isolated environment to make sure (I asked ChatGPT to write the script because I was lazy):

import torch
import torch.nn as nn

# Creating a batch of random tensors (e.g., a batch of 5 images, each with 3 channels, 64x64 pixels)
batch_size = 5
channels = 3
height = width = 64
tensors = torch.randn(batch_size, channels, height, width)

# Defining a 2D convolutional layer
conv_layer = nn.Conv2d(in_channels=channels, out_channels=6, kernel_size=3, stride=1, padding=1)

# Applying the convolutional layer to the entire batch
batch_result = conv_layer(tensors)

# Applying the convolutional layer to each tensor individually
individual_results = torch.stack([conv_layer(t.unsqueeze(0)) for t in tensors])

# Finding the indices where the batched and per-sample results differ
result_comparison = torch.nonzero(batch_result.squeeze() != individual_results.squeeze())

print((result_comparison, result_comparison.shape))

Which prints

(tensor([[ 0,  0,  0,  1],
         [ 0,  0,  0,  2],
         [ 0,  0,  0,  3],
         ...,
         [ 4,  5, 63, 60],
         [ 4,  5, 63, 61],
         [ 4,  5, 63, 62]]),
 torch.Size([93901, 4]))
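
The mismatches are real but tiny; appending one more check to the same script should show a maximum difference on the order of float32 rounding error:

# Magnitude of the mismatches: expected to be around 1e-7 for float32,
# i.e. rounding error rather than a content-level change.
print((batch_result.squeeze() - individual_results.squeeze()).abs().max())
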
ljleb commented 7 months ago

Does it depend on the sampler, or matter if we disable the A1111 setting that batches cond and uncond?

The sampler shouldn't matter, unless a sampler changes the batch size. Disabling batch cond uncond should change the result, but when I tested it, the setting became enabled again as soon as I generated an image, even after unchecking it and saving settings. Maybe I have an extension that messes with this setting; I haven't investigated the auto-enable of batch_cond_uncond.

wbclark commented 7 months ago

The sampler shouldn't matter, unless a sampler changes the batch size. Disabling batch cond uncond should change the result, but when I tested it, the setting became enabled again as soon as I generated an image, even after unchecking it and saving settings. Maybe I have an extension that messes with this setting; I haven't investigated the auto-enable of batch_cond_uncond.

1girl (2 latents, 1 cond and 1 uncond) and 1girl AND_PERP 1boy:0 (3 latents, 2 conds and 1 uncond) give different results on my system even though they should be identical.

The reason is that, apparently, the first Conv2d layer encountered in the forward function of the model (the first processed h) gives different results depending on whether it is given a batch of 3 tensors or 2. The error propagates to the rest of the unet layers, and then the rest of the denoising steps. It gives a drastically different image as a result.

I think batch_cond_uncond is the issue, but I can't reproduce a drastic difference like you described. Even when I crank CFG up to 20, I get only a very slight difference between a and a AND_PERP b :0.

When I disable batch_cond_uncond, the images are identical.

I had all other extensions disabled while testing, and I tried both empty and non-empty negative prompts.
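
That result is consistent with what the setting controls. A conceptual sketch of batch_cond_uncond (simplified, not the actual webui code):

import torch

def cfg_denoise(unet, x, cond, uncond, batch_cond_uncond: bool):
    if batch_cond_uncond:
        # One unet call on a combined batch; the batch size depends on how
        # many conds the prompt produces, which is where fp drift creeps in.
        out = unet(torch.cat([x, x]), torch.cat([uncond, cond]))
        out_uncond, out_cond = out.chunk(2)
    else:
        # Separate unet calls; each call's batch size is independent of the
        # number of conds, so plain and AND_PERP prompts stay bit-identical.
        out_uncond = unet(x, uncond)
        out_cond = unet(x, cond)
    return out_uncond, out_cond

# Stand-in model just to make the sketch runnable:
unet = lambda latent, embedding: latent + embedding
x = torch.randn(1, 4, 8, 8)
cond, uncond = torch.randn(1, 4, 8, 8), torch.zeros(1, 4, 8, 8)
print([t.shape for t in cfg_denoise(unet, x, cond, uncond, True)])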

ljleb commented 7 months ago

Even when I crank CFG up to 20, I get only a very slight difference between a and a AND_PERP b :0

To be clear, by "different images", I don't mean to say that they have different content, but that practically all pixel values seem to change at least a little bit. I think you reproduced the issue.
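
For anyone wanting to quantify that, a quick pixel-level comparison of two saved outputs works (file names are placeholders for two same-seed generations, with and without the extension):

import numpy as np
from PIL import Image

# Placeholder file names; substitute two generations of the same seed,
# one with the extension enabled and one without.
a = np.asarray(Image.open("with_extension.png")).astype(np.int16)
b = np.asarray(Image.open("without_extension.png")).astype(np.int16)

diff = np.abs(a - b)
print("fraction of pixels that changed:", diff.any(axis=-1).mean())
print("max per-channel difference:", diff.max())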

For batch cond uncond: if you disable it under Settings > Optimizations, then generate, then go back to settings and apply them without explicitly checking or unchecking batch cond uncond, do you see the setting come back enabled? This is how I determined that it was automatically enabled when clicking on generate.
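
One way to double-check outside the UI is to read the saved value directly. Assuming the usual A1111 layout, settings live in config.json in the webui root under a key matching the option name (path and key name are assumptions and may differ between versions):

import json

# Assumed path (webui root) and key name; both may differ between versions.
with open("config.json") as f:
    settings = json.load(f)
print(settings.get("batch_cond_uncond"))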

wbclark commented 7 months ago

For batch cond uncond: if you disable it under Settings > Optimizations, then generate, then go back to settings and apply them without explicitly checking or unchecking batch cond uncond, do you see the setting come back enabled? This is how I determined that it was automatically enabled when clicking on generate.

Nope, it stays disabled for me, and only when I manually re-enable it do the differences return.

If it still happens even with all other extensions disabled, maybe you have the old CLI version of that setting in your launch script and that's re-enabling it? (Just a dumb guess; I haven't actually looked at how they implemented the change to it being a WebUI setting... my gut feeling is that another extension is more likely responsible.)