ssitu / ComfyUI_UltimateSDUpscale

ComfyUI nodes for the Ultimate Stable Diffusion Upscale script by Coyote-A.
GNU General Public License v3.0

suggestion - batch support #36

Open kalkal11 opened 10 months ago

kalkal11 commented 10 months ago

Granted, it might only be useful to people with more VRAM, but would it be possible to add a feature that generates multiple img2img tiles in a batch, say 4 at a time, to increase overall inference speed?

ssitu commented 10 months ago

Wouldn't using larger tiles have a similar effect on the speed?

kalkal11 commented 10 months ago

Which at a certain point raises the question: why use tiles at all?

Coherence; obviously you know why, that's why we have tiled inference. There's a limit to usable tile size. If we can run inference on, say, 8 512px tiles at once instead of 1 at a time, we save time while still retaining coherence.

ssitu commented 10 months ago

So you mean the tile size may not be what the model was trained on? I’m not sure if I understand.

kalkal11 commented 10 months ago

If you process 16 images (each 512px) one by one, it's slower than doing all 16 at once in a single batch; the total time for the batch is lower than the sum of the 16 separate runs. This should presumably extend to tiled inference :)
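Just to illustrate what I mean (a toy timing sketch with a stand-in model, not actual SD sampling; on a GPU the gap is much bigger):

```python
import time
import torch

# Stand-in for a denoising model (hypothetical; a single conv instead of a UNet).
model = torch.nn.Conv2d(4, 4, 3, padding=1)
tiles = torch.randn(16, 4, 64, 64)  # 16 tiles in latent space (512px / 8)

with torch.no_grad():
    t0 = time.time()
    for i in range(16):               # one tile at a time
        _ = model(tiles[i:i + 1])
    t_seq = time.time() - t0

    t0 = time.time()
    _ = model(tiles)                  # all 16 tiles as one batch
    t_batch = time.time() - t0

print(f"sequential: {t_seq:.4f}s, batched: {t_batch:.4f}s")
```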

ssitu commented 10 months ago

Oh yeah, of course, but what I meant to ask is: isn't using a tile size of 512 and doing 2 tiles at a time the same as using a tile size of 512x1024? I can add batching, but the results may not look the same as doing 1 tile at a time. That was a problem I ran into when I first tried adding it.

kalkal11 commented 10 months ago

Ah, I see what you mean; are you doing some img2img for the tile blending? I can see that being a problem, yes.

Seeing as we have the seam fix, perhaps that will be sufficient in combination with batched tiles? Granted, that would make the seam fix essential in the batched scenario, but it could still save time overall. I'm using 40GB of VRAM btw, so while your example of 512x1024 is indeed the same as 2x512, I could be doing all of the tiles at the same time :D

ssitu commented 10 months ago

Wow, that is a lot of VRAM! At that point I would just use an upscale model and run the result through a KSampler with a small denoise. As for why the results would differ: the original extension samples one tile, pastes it back into the original image, then crops out the next tile and samples that, so a tile may contain some data from the previously sampled tile, which can influence the sampling.
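Roughly, the flow is like this (a simplified sketch of the idea, not the actual extension code; padding, masking, and overlap handling are left out):

```python
from PIL import Image

def upscale_tiled(image: Image.Image, tile_size: int, sample_tile) -> Image.Image:
    # `sample_tile` stands in for the img2img sampling of a single tile.
    result = image.copy()
    w, h = result.size
    for y in range(0, h, tile_size):
        for x in range(0, w, tile_size):
            box = (x, y, min(x + tile_size, w), min(y + tile_size, h))
            # Each crop is taken from `result`, so it can already contain pixels
            # pasted back from previously sampled tiles; naive batching loses
            # that context, which is why the output can look different.
            tile = result.crop(box)
            result.paste(sample_tile(tile), box)
    return result
```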

The seam fix would be the tile blending that you mentioned.

kalkal11 commented 10 months ago

Sure, so as it stands there's a problem with batching because you can't have that overlap, since the preceding tile hasn't been generated yet (that's true for all tiles in a batch). But batching plus the seam fix, with the right denoise, could still get around that problem, right?

Also, with the VRAM I have to play with, inference at 2048 is viable natively without tiling, but that seems to be the upper limit via img2img (even with workarounds like the Kohya hires fix, for example).

Which is pretty good; going higher is perhaps only for fringe situations, but it's still interesting to experiment.

kalkal11 commented 10 months ago

I was thinking about it, and it should actually be possible, but you'd have to follow a pattern for it to work: start with the corners, then you'd be able to double the number of tiles in the next round, because each adjacent tile could then be processed, and so on. That means you can't set a static batch size, but maybe you could add a toggle for a branching batch?
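Something like this is the kind of schedule I'm picturing (purely a hypothetical sketch, not based on the node's code):

```python
# Group a cols x rows tile grid into rounds: round 0 is the four corners, and
# every tile in round d > 0 borders at least one tile from round d - 1, so a
# neighbour's output is already available when it gets sampled as part of a batch.
def branching_rounds(cols: int, rows: int):
    corners = [(0, 0), (cols - 1, 0), (0, rows - 1), (cols - 1, rows - 1)]
    dist = {
        (x, y): min(abs(x - cx) + abs(y - cy) for cx, cy in corners)
        for y in range(rows)
        for x in range(cols)
    }
    for d in range(max(dist.values()) + 1):
        yield [tile for tile, td in dist.items() if td == d]

for i, batch in enumerate(branching_rounds(4, 4)):
    print(f"round {i}: {len(batch)} tiles -> {batch}")
```

The batch size grows each round and then shrinks again, which is why a static batch size wouldn't work.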

Also, unrelated, but I saw elsewhere that per-tile ipadapter was on the table; do you think that's something that will come? It also seems like something that could greatly assist coherence, combined with some controlnets.

ssitu commented 10 months ago

Looking at the code, I don't think I'll be able to add batching because it is going through the code of the original extension, which does the tiles in sequence. I'll think about it more.

As for ipadapter, I don't know much about it, so I don't know exactly how it will fit into this node. I have some other priorities at the moment.

kalkal11 commented 10 months ago

Fair enough on the batches. For ipadapter, it just needs to be passed however it would normally be passed, but on a per-tile basis rather than globally, to ensure the conditioning matches each tile. I saw a proof-of-concept node by someone, but I don't think that node is currently compatible with other controllers, broadly defeating the point of it; but again, it was only labelled as a proof of concept.
