kohya-ss / sd-scripts

Apache License 2.0

about flux image normalization #1642

Closed: Jimzhou82sub closed this issue 1 week ago

Jimzhou82sub commented 1 week ago

In the Flux source code (img2img), the image is converted from [0, 255] to [0, 1] (https://github.com/black-forest-labs/flux/blob/main/demo_gr.py#L73), but in your code the image is converted to [-1, 1] (https://github.com/kohya-ss/sd-scripts/blob/sd3/library/train_util.py#L133).
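For reference, the two conventions differ only by an affine map (signed = 2 * unit - 1). A minimal NumPy sketch, assuming an 8-bit image array; the helper names are hypothetical and not taken from either repository:

```python
import numpy as np

def to_unit_range(img_u8: np.ndarray) -> np.ndarray:
    """Map uint8 pixels [0, 255] to float32 [0, 1] (the img2img convention cited above)."""
    return img_u8.astype(np.float32) / 255.0

def to_signed_range(img_u8: np.ndarray) -> np.ndarray:
    """Map uint8 pixels [0, 255] to float32 [-1, 1] (the train_util.py convention cited above)."""
    return img_u8.astype(np.float32) / 127.5 - 1.0

def signed_to_u8(x: np.ndarray) -> np.ndarray:
    """Inverse of to_signed_range: clamp to [-1, 1], then map back to [0, 255]."""
    x = np.clip(x, -1.0, 1.0)
    return np.round((x + 1.0) * 127.5).astype(np.uint8)

pixels = np.array([0, 128, 255], dtype=np.uint8)
unit = to_unit_range(pixels)
signed = to_signed_range(pixels)
print("unit:  ", unit)
print("signed:", signed)
# The two ranges are related by signed = 2 * unit - 1 (up to float32 rounding).
assert np.allclose(signed, 2 * unit - 1, atol=1e-6)
```

Whichever range is chosen, the point raised in this issue is that the encoder input and the decoder output must be interpreted in the same range.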

kohya-ss commented 1 week ago

Hmm, that's very weird... In the same code, the decoded result appears to be in the range -1 to +1.

https://github.com/black-forest-labs/flux/blob/87f6fff727a377ea1c378af692afb41ae84cbe04/demo_gr.py#L140

It is unlikely that the value range differs between encode and decode. I modified flux_minimal_inference.py to check the range of values with the code below:

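            # Inside flux_minimal_inference.py after sampling; ae, device and ae_dtype come from the surrounding script
            with torch.autocast(device_type=device.type, dtype=ae_dtype):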
                x = ae.decode(x)

                # test encode-decode
                # x is generated image, -1 to 1
                print(f"Max: {torch.max(x)}, Min: {torch.min(x)}, Mean: {torch.mean(x)}")
                x2 = x * 0.5 + 0.5  # 0 to 1
                x2 = ae.encode(x2)  # encode
                x2 = ae.decode(x2)  # decode
                print(f"Max: {torch.max(x2)}, Min: {torch.min(x2)}, Mean: {torch.mean(x2)}")

                # save x2 image
                x2 = x2.clamp(-1, 1)
                x2 = x2.permute(0, 2, 3, 1)
                img2 = Image.fromarray((127.5 * (x2 + 1.0)).float().cpu().numpy().astype(np.uint8)[0])
                img2.save("x2_01.png")

The result is:

Max: 0.99609375, Min: -0.91796875, Mean: 0.01385498046875
Max: 0.98828125, Min: 0.020751953125, Mean: 0.490234375

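The first line is the generated image x, which lies in [-1, 1]. The second line is x2, obtained by rescaling x to [0, 1] before encoding: its decoded output stays roughly in [0, 1] (Min 0.02, Mean 0.49), so when it is mapped back to [0, 255] as if it were in [-1, 1], every pixel lands in the upper half of the range. As a result, the saved image x2_01.png became whitish, so I think the img2img code might be incorrect.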
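The whitening can be reproduced without the autoencoder. The sketch below assumes an idealized autoencoder that returns its input unchanged (the real one only approximately does, as the Max/Min/Mean figures above show). A [-1, 1] image is rescaled to [0, 1] before "encoding", and the result is then saved with the same clamp plus 127.5 * (x + 1) mapping used in the test code. `identity_autoencoder` is a hypothetical stand-in for `ae.decode(ae.encode(x))`.

```python
import numpy as np

def identity_autoencoder(x: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for ae.decode(ae.encode(x)): a perfect reconstruction."""
    return x.copy()

def to_u8_from_signed(x: np.ndarray) -> np.ndarray:
    """Same mapping as the test code: clamp to [-1, 1], then 127.5 * (x + 1), truncated to uint8."""
    x = np.clip(x, -1.0, 1.0)
    return (127.5 * (x + 1.0)).astype(np.uint8)

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(64, 64, 3)).astype(np.float32)  # image in [-1, 1]

# Consistent: feed [-1, 1] in and interpret the output as [-1, 1].
ok = to_u8_from_signed(identity_autoencoder(x))

# Mismatched: feed [0, 1] in (as the img2img path appears to), but still interpret the output as [-1, 1].
x01 = x * 0.5 + 0.5
bad = to_u8_from_signed(identity_autoencoder(x01))

print("consistent min/mean:", ok.min(), ok.mean())
print("mismatched min/mean:", bad.min(), bad.mean())  # no pixel falls below 127, so the image looks washed out
```

Under this idealized assumption, the mismatched pipeline can never produce a pixel darker than 127, which matches the whitish x2_01.png.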

Jimzhou82sub commented 1 week ago

Great, your professionalism is admirable.