ssitu / ComfyUI_UltimateSDUpscale

ComfyUI nodes for the Ultimate Stable Diffusion Upscale script by Coyote-A.

[Feature request] Exponential outer-image shrink? #86

Open Lex-DRL opened 4 months ago

Lex-DRL commented 4 months ago

I originally posted the idea here: https://github.com/Stability-AI/stablediffusion/issues/378. But I guess that, even without any training, this approach could be used within USDU.

The idea

At each moment, USDU works with only a part of an image, which naturally loses some context. The higher the resolution we work at, the more is lost. Currently, the workaround is adding an extra border with nearby parts of the image, but it clearly has limitations. What if we do an extra step: in addition to the outer border, we also add the entire rest of the image, exponentially squashed to fit into a much smaller width (a second, "distorted" border)?

A picture is worth a thousand words, so let's look at the example.

Say we have this image and we're currently working on the following segment (images: _image, Zones).

Blue is USDU's extra border added for some context, and red is everything discarded, so SD has no idea what's there; the only thing it sees is this (image: RawCropZones). From this part alone, SD has no idea of the huge mountains in the background and the sunrise behind them. This could easily be just a rainy day.

Now, what if instead of discarding the red part altogether, we squashed it and added it as a second "contextual border" around the first one? In this border, the first row of pixels is 1:1, but the farther away we go, the bigger the area of the source image covered by each pixel (in a geometric progression), losing some detail but still giving us a crude approximation of the context. This nonlinear distortion ends up with a lens-like look (images: DistortedCropZones, DistortedCrop).

Now, SD can see that there is a sunrise and mountains, and takes those into account for the core part of the piece. Yeah, the boundary is distorted, but I guess we can reduce artifacts by dynamically modifying the conditioning in the red area, adding "lens distortion" there (and only there). In a nutshell, this approach follows the same algorithm that is used in realtime graphics for mip-mapping and, currently, in foveated rendering in VR: the farther away we go, the more we average, but we still keep some approximation of the "big picture", literally. By its very nature, this distortion is exponential, so we basically don't care how big the full image is. It could actually be millions of pixels wide and we'd still get just enough info about the surroundings in our single piece.
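A rough numpy sketch of what I mean by the squashed border (all names here, `_axis_coords`, `exponential_crop`, `border_px`, `q`, are just for illustration; it treats q as a fixed input and normalizes the coverage to fit, uses nearest-neighbour lookup instead of mip-style averaging, and omits the blue linear-context border):

```python
import numpy as np

def _axis_coords(crop_start, crop_stop, full_size, border_px, q):
    # 1:1 coordinates inside the crop; geometric steps outside of it.
    inner = np.arange(crop_start, crop_stop, dtype=np.float64)
    # Partial sums of 1 + q + q^2 + ..., normalized so border_px pixels
    # span everything to the left/right of the crop on this axis.
    steps = np.cumsum(q ** np.arange(border_px, dtype=np.float64))
    steps = steps / steps[-1]
    left = crop_start - steps[::-1] * crop_start                # squashes [0, crop_start)
    right = (crop_stop - 1) + steps * (full_size - crop_stop)   # squashes [crop_stop, full_size)
    return np.concatenate([left, inner, right])

def exponential_crop(img, box, border_px=64, q=1.1):
    # img: H x W x C numpy array; box: (x0, y0, x1, y1) crop rectangle in pixels.
    x0, y0, x1, y1 = box
    h, w = img.shape[:2]
    xs = np.clip(np.round(_axis_coords(x0, x1, w, border_px, q)), 0, w - 1).astype(int)
    ys = np.clip(np.round(_axis_coords(y0, y1, h, border_px, q)), 0, h - 1).astype(int)
    return img[np.ix_(ys, xs)]   # point-sampling; a real version would average each span
```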

Maybe it's worth implementing in USDU? This whole idea might be a dud, but my assumption is that this approach would significantly improve context awareness and therefore the final results. SD already has some understanding of image distortion deep inside it (it successfully renders refractions, after all), so it should benefit from seeing the rest of the image, even if it's distorted. Shouldn't it?

ssitu commented 3 months ago

Interesting idea; it's worth implementing and trying out. I'll give it a shot when I can, unless you or someone else wants to. I'm not sure how easy it would be to add with the way I structured this repo.

Lex-DRL commented 3 months ago

Unfortunately, I have zero experience coding ComfyUI nodes and with ML programming in general, so I'm of no help there.

What I can help with is finding the squash coefficients.

The outer border is the sum of a geometric progression, where:

- the first term is the width of the source image covered by the innermost border pixel (1 px),
- the number of terms is the thickness of the border in output pixels,
- the sum is the width/height of the source region that has to be squashed into that border.

AFAIK, there's no analytic way to find the progression coefficient given the number of terms, the first term, and their sum. For the previews above, I found q via the bisection method, comparing the progression sum to the actual squashed width/height we're aiming for, until I got a q value resulting in a less than 1 px deviation, with this function:

```python
def squashed_width(q, n=64, first_pixel_width=1.0):
    # Geometric series sum: source width covered by n border pixels with ratio q.
    return first_pixel_width * (q**n - 1) / (q - 1)
```
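For completeness, a rough bisection sketch around that function (the `solve_q` name and the tolerance handling are just mine; it assumes the target width is larger than n * first_pixel_width, otherwise no q > 1 can satisfy it):

```python
def solve_q(target_width, n=64, first_pixel_width=1.0, tol=1.0):
    # Find q > 1 such that n border pixels cover ~target_width source pixels,
    # using squashed_width() from above. Assumes target_width > n * first_pixel_width.
    lo, hi = 1.0 + 1e-9, 2.0
    while squashed_width(hi, n, first_pixel_width) < target_width:
        hi *= 2.0                      # grow the bracket until it overshoots
    q = (lo + hi) / 2.0
    for _ in range(200):               # plenty of iterations for sub-pixel precision
        q = (lo + hi) / 2.0
        s = squashed_width(q, n, first_pixel_width)
        if abs(s - target_width) < tol:
            break
        if s < target_width:
            lo = q
        else:
            hi = q
    return q
```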

Then, knowing this coefficient for all 4 sides of the border, I built distorted UVs and sampled the above images. But I did it in compositing software, where the distortion (sampling) itself was already implemented by a dedicated node.

ssitu commented 3 months ago

Thanks for the info, I'll probably have questions when I get to adding it. I also don't know if the progression coefficient can be solved for analytically. It seems like any root-finding method could be used to compute it, though.

Lex-DRL commented 2 months ago

@ssitu Sooo... by any chance, did you have any opportunity to try approaching it?

ssitu commented 2 days ago

Sorry, haven't gotten a chance to. No promises for when I'll work on this, maybe someone else might want to give it a shot.

Lex-DRL commented 2 days ago

I'm well versed in Python and image processing (shader programming, even), but I'm not at all familiar with JS, writing client/server apps in Python, any other WebDev-related stuff, or ComfyUI's custom nodes API / best practices. I might try implementing such an "exponential cropper" node myself, but I don't even know where to start. Could you suggest an introductory tutorial on custom nodes that explains ComfyUI's inner workings to someone coming from a different field of programming?

ssitu commented 1 day ago

I first started off by looking at the example in the comfyui repo: https://github.com/comfyanonymous/ComfyUI/blob/master/custom_nodes/example_node.py.example
If ".example" is removed, the node should get automatically imported by ComfyUI when the server starts.

There shouldn't be a need to do anything with JS or anything frontend-related, unless you want to make a nice UI for cropping, which I remember some nodes doing; if you get that far, you can look at how those work. I would just try making a node that takes some numbers and an image and returns a transformation of the image.

The images in comfyui are represented by pytorch tensors, which are pretty much numpy arrays and are easily converted back and forth. From there, you should be able to use any sort of Python image processing on the numpy representation, or even convert to PIL or whatever other image library representation you need. Then a simple conversion back to a tensor for the return value should work.
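As a rough illustration of that structure (everything except the ComfyUI-required names like `INPUT_TYPES`, `RETURN_TYPES`, `FUNCTION`, and `NODE_CLASS_MAPPINGS` is a placeholder, and `exponential_shrink` is just a stub for you to fill in):

```python
import torch

def exponential_shrink(arr, border_px, q):
    # Placeholder for the actual exponential border squash; identity for now.
    return arr

class ExponentialCropNode:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "image": ("IMAGE",),  # ComfyUI images: [B, H, W, C] float tensors in 0..1
                "border_px": ("INT", {"default": 64, "min": 1, "max": 512}),
                "q": ("FLOAT", {"default": 1.1, "min": 1.001, "max": 2.0, "step": 0.001}),
            }
        }

    RETURN_TYPES = ("IMAGE",)
    FUNCTION = "run"
    CATEGORY = "image/transform"

    def run(self, image, border_px, q):
        arr = image.cpu().numpy()                    # tensor -> numpy
        out = exponential_shrink(arr, border_px, q)  # plain numpy processing goes here
        return (torch.from_numpy(out).float(),)      # numpy -> tensor, returned as a tuple

NODE_CLASS_MAPPINGS = {"ExponentialCrop": ExponentialCropNode}
NODE_DISPLAY_NAME_MAPPINGS = {"ExponentialCrop": "Exponential Crop (sketch)"}
```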

There are probably tutorials out there by now, but I've never looked at any. I pretty much just looked at what others were doing in their custom node code. Building off of the example node or breaking down an existing custom node should get you started though.

Another approach is to write the exponential shrink function by itself, make sure it works on local images, and then hook it up to the node API, which should avoid the hassle of testing through the frontend. Then all that's left is to paste/import it into the example node, convert the input from a tensor, and convert the output of the function back to a tensor.
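The local test could be as small as something like this, assuming you have an `exponential_shrink(arr, border_px, q)` function working on plain numpy arrays (that name and the file paths are just placeholders):

```python
import numpy as np
from PIL import Image

# "input.png" and exponential_shrink(...) stand in for your own file and function.
img = np.asarray(Image.open("input.png").convert("RGB")).astype(np.float32) / 255.0
out = exponential_shrink(img, border_px=64, q=1.1)
Image.fromarray((np.clip(out, 0, 1) * 255).astype(np.uint8)).save("output.png")
```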

If you run into any trouble with the comfyui custom node API part, you can ask in the comfyui matrix space or just let me know.

Lex-DRL commented 1 day ago

Thanks a lot! I'll look into it the next time I get some spare time.