How does composite() merge colors?

TWCCarlson commented 5 months ago

Hello again,

I am attempting to manipulate two images plane_0 and plane_1 such that I:

1. Remove all the black pixels from plane_1, replacing with transparency
2. Overlay (alpha composite) plane_1 atop plane_0

I've done what was recommended here to strip away the black pixels, and I get an image which looks like what I expect (this is a subsection of the image approximately 80% of the way down and 25% across; the checkerboard is how GIMP represents the transparency):

So I think that is working well. However, when I execute the composite I get an output like this:

Which, while kind of retro-cool-looking, isn't right. Per the Wikipedia entry on alpha compositing, I expected the pixels which are not transparent on plane_1 to fully replace those on plane_0, since I gave all pixels an alpha value of 255.

import os
import pyvips as pv

image0 = "./plane_0.png"
image1 = "./plane_1.png"
im0 = pv.Image.new_from_file(image0).bandjoin(255)
im1 = pv.Image.new_from_file(image1).bandjoin(255)

colorDistance = sum(((im1 - [0,0,0,255]) ** 2).bandsplit()) ** 0.5
alpha = 255 * colorDistance / 10
alphaImage = im1[0:3].bandjoin(alpha)
if not os.path.exists("./debug"):
    os.makedirs("./debug")
alphaImage.write_to_file(f"debug/alphaplane_1.png")
compositeImage = im0.composite(alphaImage, 'over')
compositeImage.write_to_file(f"debug/plane_1.png")

Looking at the colors that end up being rendered I notice that most of them are 255 in one or two bands. Some other frequent values are 72 and 0. This makes me think the problem could lie in some of the math being done, perhaps some kind of value capping.

Have I approached this incorrectly?

Here are the images: https://drive.google.com/drive/folders/11N3relUy4yhual9H3h5WUv7sbUsXV0Qq?usp=drive_link

TWCCarlson commented 5 months ago

I've noticed that the datatype for each image is different. When I first load them up: <pyvips.Image 13056x45568 uchar, 4 bands, srgb>

After calculating the color distance: <pyvips.Image 13056x45568 float, 1 bands, srgb>

It remains a float for the rest of the runtime. Could the interaction between uchar (numpy detects uint8) and float (numpy detects float32) be the issue?

jcupitt commented 5 months ago

Hi @TWCCarlson,

Yes, all libvips operations are value-preserving, ie. outputs are big enough to hold the whole output range. uchar + uchar -> ushort, for example.
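
That float promotion also means nothing caps the alpha from the first post: 255 * colorDistance / 10 stays a float and goes well past 255 for any pixel more than 10 away from black. A rough back-of-the-envelope sketch in plain Python (not pyvips), assuming the textbook "over" formula on an opaque background and clipping to 0-255 on save. The real libvips composite does more than this (premultiplication, for one), and the helper names asker_alpha and over_band are made up for illustration:

```python
import math


def asker_alpha(r, g, b):
    """Alpha as computed in the first post: 255 * distance-from-black / 10."""
    distance = math.sqrt(r * r + g * g + b * b)
    return 255 * distance / 10


def over_band(src, dst, alpha, max_alpha=255):
    """Textbook 'over' for one band on an opaque background, clipped to 0-255."""
    a = alpha / max_alpha              # normalised alpha, expected to be in 0..1
    out = a * src + (1 - a) * dst      # goes far out of range once a > 1
    return min(255, max(0, round(out)))


# A pure red overlay pixel over a mid-grey base pixel.
alpha = asker_alpha(255, 0, 0)
print("alpha:", alpha)
print("red band:", over_band(255, 100, alpha))
print("green band:", over_band(0, 100, alpha))
```

With alpha that far out of range, each band is pushed to one end of the scale or the other, which would fit the many 255 and 0 values you noticed.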

I'll make you a tiny demo prog.

jcupitt commented 5 months ago

There are lots of ways of doing this, but I think the most straightforward is probably by making a boolean mask image and using ifthenelse.

Make a mask

Make a mask image that's TRUE for the non-zero pixels in the overlay (plane_1).

overlay != 0 will be a three-band image with 255 (non-zero, or TRUE) in band 0 (red) when band 0 is not zero, and likewise for the other bands.

For a mask (or alpha) you want to find pixels where any band is non-zero, so you need to OR all the bands together with bandor.

Therefore:

mask = (overlay != 0).bandor()

Now mask is a one-band uchar image which will be FALSE (0) for all pixels which are equal to [0, 0, 0] and TRUE (255) elsewhere.
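
To make the per-pixel behaviour concrete, here is a plain-Python model of what that expression computes for a single pixel. It only illustrates the semantics and is not how libvips does it internally; mask_value is a hypothetical name:

```python
def mask_value(pixel):
    """Plain-Python model of (overlay != 0).bandor() for one pixel."""
    # overlay != 0: 255 (TRUE) for each non-zero band, 0 (FALSE) otherwise
    per_band = [255 if band != 0 else 0 for band in pixel]
    # bandor(): bitwise OR across the bands, leaving a single value
    result = 0
    for value in per_band:
        result |= value
    return result


for pixel in [(0, 0, 0), (0, 3, 0), (9, 9, 9)]:
    print(pixel, "->", mask_value(pixel))
```

Only a pixel with every band at zero comes out as 0, so near-black pixels such as (0, 3, 0) still count as part of the overlay.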

Combine

The simplest way to do a binary switch is with ifthenelse. This takes a condition image and uses that to pick pixels from either a true image or a false image.

In your case, you can write:

result = mask.ifthenelse(overlay, base)
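
As a sketch of what that selection does, here is a plain-Python stand-in working on short lists of pixels rather than images. The function below is a toy model with the same name, not the pyvips method, and the pixel values are invented:

```python
def ifthenelse(mask, then_pixels, else_pixels):
    """Toy model: take the 'then' pixel where the mask is non-zero, else the 'else' pixel."""
    return [t if m else e for m, t, e in zip(mask, then_pixels, else_pixels)]


base = [(10, 10, 10), (20, 20, 20), (30, 30, 30)]
overlay = [(0, 0, 0), (200, 0, 0), (0, 0, 0)]
mask = [0, 255, 0]  # TRUE only where the overlay pixel is not black

result = ifthenelse(mask, overlay, base)
print(result)
```

There is no blending here: each output pixel comes entirely from one image or the other, which is what you want when the overlay should fully replace the base.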

Complete program

Putting these together, you could write:

#!/usr/bin/env python3

import sys
import pyvips

if len(sys.argv) != 4:
    print(f"usage: {sys.argv[0]} base-image overlay-image output-image")
    sys.exit(1)

base = pyvips.Image.new_from_file(sys.argv[1], access="sequential")
overlay = pyvips.Image.new_from_file(sys.argv[2], access="sequential")

# TRUE for non-zero pixels in the overlay
mask = (overlay != 0).bandor()

# use that to choose between pixels in base and overlay
result = mask.ifthenelse(overlay, base)

result.write_to_file(sys.argv[3])

I can run it like this:

$ VIPS_PROGRESS=1 ./overlay.py plane_0.png plane_1.png x.png
overlay.py temp-13: 13056 x 45568 pixels, 32 threads, 128 x 128 tiles, 640 lines in buffer
overlay.py temp-13: 39% complete
...

And it makes this:

jcupitt commented 5 months ago

There are a few things you could try to make writing these things less annoying.

An easy one is to install vipsdisp (a libvips image viewer) from flathub and use it to look at intermediate images.

https://flathub.org/apps/org.libvips.vipsdisp

Docs in the README:

https://github.com/jcupitt/vipsdisp

It will display images as libvips understands them if you save as .v. You could write something like:

import os

def show(image):
    image.write_to_file("temp.v")
    os.system("vd temp.v")

I have vd as an alias for vipsdisp. Now you can add a show() any time you want to pause evaluation and examine an intermediate result. If you turn on the info bar (view / info) you can see individual pixel values in the status bar at the bottom. If you turn on the display control bar (view / display control) you can change the scale and offset for display, so you can see detail in float images.

You could also consider nip2, the very eccentric libvips GUI:

https://github.com/libvips/nip2

You can make these things interactively -- you enter lines of code and you can watch pixels changing in every intermediate image as you type.

It should be in the package repo on all linuxes, and there's a win binary too. You can make quite large and complex workspaces, save the workspace, then run nip2 in batch mode from the command-line over a large directory of files.

It has its own programming language, so there's a learning curve :( Though you can drive it from menus as well (just like excel).

I made it 25 years ago haha, so it looks very old-fashioned. I'm modernising the UI right now, so there should be a prettier version in a few months:

https://github.com/jcupitt/nip4

TWCCarlson commented 5 months ago

Wow! Thanks for the thorough explanation and bonus tips! This makes sense and seems easy enough to implement. Using the same idea, I tried to also implement a threshold under which the mask is TRUE, similar to the way dzsave works. Only, it's much slower than your bandor() implementation:

bands = overlay.bandsplit()
mask = 255 * (abs(bands[0]-TRANSPARENCY_COLOR) > TRANSPARENCY_TOLERANCE or 
            abs(bands[1]-TRANSPARENCY_COLOR) > TRANSPARENCY_TOLERANCE or 
            abs(bands[2]-TRANSPARENCY_COLOR) > TRANSPARENCY_TOLERANCE)
compositeImage = mask.ifthenelse(overlay, base)
compositeImage.write_to_file("debug/plane_1.png")
# Runtime: 18.94831085205078

mask = (overlay != 0).bandor()
compositeImage = mask.ifthenelse(overlay, base)
compositeImage.write_to_file("debug/plane_1.png")
# 7.408785581588745

This is probably expected due to the way I've gone about it... is there a better way? I couldn't figure out a clever expression to do so with bandor() that doesn't involve splitting the bands.

TWCCarlson commented 5 months ago

Also wanted to say that this is a very cool (and awesomely performant) ecosystem you've built up over time. Functional UIs are the best ones anyhow :)

jcupitt commented 5 months ago

You don't need to split bands -- [] is overloaded to mean band-extract. You can write eg.:

rgb = rgba[0:3]
a = rgba[3]

etc.

If you do:

mask = rgb > 10

You get a three band uchar image with 255 if that band element is greater than 10. This works for all the arithmetic, boolean and relational operators. You can also write:

mask = rgb > [10, 20, 30]

and mask[0] will be 255 for band-elements greater than 10, mask[1] for band-elements > 20, etc. Again, you can do this for any operator.
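
A plain-Python model of those two forms for a single pixel, in case it helps to see the band-by-band behaviour spelled out. The function greater is a hypothetical stand-in for the overloaded > operator, not a pyvips call:

```python
def greater(pixel, threshold):
    """Model of image > threshold for one pixel: 255 per band where true, else 0."""
    # A single number is applied to every band; a list is matched band by band.
    if isinstance(threshold, (int, float)):
        threshold = [threshold] * len(pixel)
    return tuple(255 if band > limit else 0 for band, limit in zip(pixel, threshold))


pixel = (15, 15, 15)
print(greater(pixel, 10))
print(greater(pixel, [10, 20, 30]))
```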

For your mask generation you could write:

transparency = 20
tolerance = 5
result = (abs(overlay - transparency) > tolerance).bandor().ifthenelse(overlay, base)

Though probably just:

result = (overlay > 10).bandor().ifthenelse(overlay, base)

would be OK.

TWCCarlson commented 5 months ago

Ah, I understand the overloading now. Thank you for the very thorough explanation, I've been able to get everything working.
