This is just a suggestion/idea, but I wonder if, instead of tiling, the decode could be "interleaved" to prevent banding at the tile boundaries, by splitting the image into 2x2, 3x3, 4x4, 5x5 etc. interleaved components and VAE decoding each component separately, like this (for the 3x3 example):
```
Input:
00 01 02 03 04 05 06 ...
10 11 12 13 14 15 16 ...
20 21 22 23 24 25 26 ...
30 31 32 33 34 35 36 ...

First decode pass:
00 03 06 ...
30 33 36 ...

2nd decode pass:
01 04 07 ...
31 34 37 ...

3rd:
02 05 08 ...
32 35 38 ...

4th:
10 13 16 ...
40 43 46 ...

etc. ...
```
Then, after all decoding passes, each resulting image is scaled up x3, shifted by the offset of its pass, and combined with the others into the final image.
Or, maybe more realistically, this method could be combined with tiling, using the interleaved result to determine the intensity of each pixel as a blending source for the tiled output instead of relying on overlap.
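For what it's worth, here is a rough NumPy sketch of the 3x3 split from the example above, just to make the indexing concrete. It is not a real VAE decode: the "decode" step is an identity stand-in, and `split_interleaved` / `merge_interleaved` are hypothetical helper names, not part of any library. The recombination shown (writing each pass back into its strided positions) is only one possible reading of "scaled, shifted and combined".

```python
import numpy as np

def split_interleaved(grid, n):
    """Split a 2D array into n*n strided sub-grids, one per decode pass.

    Passes are ordered by row offset, then column offset, as in the example.
    """
    return [((dy, dx), grid[dy::n, dx::n]) for dy in range(n) for dx in range(n)]

def merge_interleaved(passes, n, shape):
    """Write each pass back into the strided positions it was taken from."""
    out = np.empty(shape, dtype=passes[0][1].dtype)
    for (dy, dx), part in passes:
        out[dy::n, dx::n] = part
    return out

# Label each cell as in the example: value = 10 * row + column.
grid = np.add.outer(10 * np.arange(6), np.arange(9))

passes = split_interleaved(grid, 3)

# Assumption: identity stand-in for the per-pass VAE decode.
decoded = [(offset, part.copy()) for offset, part in passes]

restored = merge_interleaved(decoded, 3, grid.shape)
```

With the identity stand-in, `restored` equals `grid`, which only checks the bookkeeping. A real decoder changes resolution and sees each pass without its neighbouring pixels, so whether the combined output looks acceptable is an open question.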