Closed Jimzhou82sub closed 1 week ago
Hmm, that's very weird... In the same code, the decode output appears to be in the range -1 to +1.
It is unlikely that encode and decode use different value ranges. I modified flux_minimal_inference.py to check the value ranges with the code below.
with torch.autocast(device_type=device.type, dtype=ae_dtype):
    x = ae.decode(x)
# test encode-decode
# x is generated image, -1 to 1
print(f"Max: {torch.max(x)}, Min: {torch.min(x)}, Mean: {torch.mean(x)}")
x2 = x * 0.5 + 0.5 # 0 to 1
x2 = ae.encode(x2) # encode
x2 = ae.decode(x2) # decode
print(f"Max: {torch.max(x2)}, Min: {torch.min(x2)}, Mean: {torch.mean(x2)}")
# save x2 image
x2 = x2.clamp(-1, 1)
x2 = x2.permute(0, 2, 3, 1)
img2 = Image.fromarray((127.5 * (x2 + 1.0)).float().cpu().numpy().astype(np.uint8)[0])
img2.save("x2_01.png")
The result is:
Max: 0.99609375, Min: -0.91796875, Mean: 0.01385498046875
Max: 0.98828125, Min: 0.020751953125, Mean: 0.490234375
And the saved image x2_01.png came out whitish. So I think the img2img code might be incorrect.
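To illustrate why a [0, 1] tensor looks whitish when it is saved with the [-1, 1] conversion, here is a minimal NumPy sketch. The helper name `to_uint8_from_signed` and the five-value gradient are hypothetical, chosen only for illustration; the formula mirrors the `127.5 * (x2 + 1.0)` save step above.

```python
import numpy as np

def to_uint8_from_signed(x):
    """Map values assumed to be in [-1, 1] to pixel values in [0, 255]."""
    return (127.5 * (np.clip(x, -1.0, 1.0) + 1.0)).astype(np.uint8)

# Hypothetical gradient covering the full [-1, 1] range.
signed = np.linspace(-1.0, 1.0, 5)
# The same gradient rescaled to [0, 1], like x2 after the round trip.
unit = signed * 0.5 + 0.5

pixels_signed = to_uint8_from_signed(signed)  # spans black to white
pixels_unit = to_uint8_from_signed(unit)      # only the upper half of the range

print(pixels_signed.tolist())
print(pixels_unit.tolist())
```

A tensor that stays in [0, 1] can only reach pixel values from 127 upward under this conversion, so the darkest pixel is mid-gray and the whole image looks washed out.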
Great, your professionalism is admirable.
In the flux source code (img2img), the image is converted from [0, 255] to [0, 1]: https://github.com/black-forest-labs/flux/blob/main/demo_gr.py#L73, but in your code, the image is converted to [-1, 1]: https://github.com/kohya-ss/sd-scripts/blob/sd3/library/train_util.py#L133
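For reference, the two input conversions being compared can be sketched as below. This is not the code from either repository; `to_unit_range` and `to_signed_range` are hypothetical helper names, and the sample pixel values are made up.

```python
import numpy as np

def to_unit_range(img_u8):
    """Scale uint8 pixels from [0, 255] to [0, 1]."""
    return img_u8.astype(np.float32) / 255.0

def to_signed_range(img_u8):
    """Scale uint8 pixels from [0, 255] to [-1, 1]."""
    return img_u8.astype(np.float32) / 127.5 - 1.0

# Hypothetical sample pixels: black, mid-gray, white.
pixels = np.array([0, 128, 255], dtype=np.uint8)
unit = to_unit_range(pixels)
signed = to_signed_range(pixels)

print(unit)
print(signed)
```

The two results are related by `signed = unit * 2 - 1`, so whichever range the autoencoder expects at encode time, feeding it the other one shifts and compresses the image rather than just changing brightness slightly.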