TWCCarlson closed this issue 5 months ago.
I've noticed that the datatype for each image is different. When I first load them up:
<pyvips.Image 13056x45568 uchar, 4 bands, srgb>
After calculating the color distance:
<pyvips.Image 13056x45568 float, 1 bands, srgb>
It remains a float for the rest of the runtime. Could the interaction between uchar (numpy detects uint8) and float (numpy detects float32) be the issue?
Hi @TWCCarlson,
Yes, all libvips operations are value-preserving, i.e. outputs are big enough to hold the whole range of possible results. uchar + uchar -> ushort, for example.
I'll make you a tiny demo prog.
There are lots of ways of doing this, but I think the most straightforward is probably to make a boolean mask image and use ifthenelse.
Make a mask image that's TRUE for the non-zero pixels in the overlay (plane_1).
overlay != 0 will be an image with the same number of bands as overlay (three for an RGB image), with 255 (non-zero, or TRUE) in band 0 (red) wherever band 0 is not zero, and likewise for the other bands.
For a mask (or alpha) you want to find pixels where any band is non-zero, so you need to OR all the bands together with bandor.
Therefore:
mask = (overlay != 0).bandor()
Now mask is a one-band uchar image which will be TRUE (255) for all pixels which are not equal to [0, 0, 0] and FALSE (0) for pixels which are.
The simplest way to do a binary switch is with ifthenelse. This takes a condition image and uses it to pick pixels from either a true image or a false image.
In your case, you can write:
result = mask.ifthenelse(overlay, base)
Putting these together, you could write:
#!/usr/bin/env python3
import sys
import pyvips
if len(sys.argv) != 4:
    print(f"usage: {sys.argv[0]} base-image overlay-image output-image")
    sys.exit(1)
base = pyvips.Image.new_from_file(sys.argv[1], access="sequential")
overlay = pyvips.Image.new_from_file(sys.argv[2], access="sequential")
# TRUE for non-zero pixels in the overlay
mask = (overlay != 0).bandor()
# use that to choose between pixels in base and overlay
result = mask.ifthenelse(overlay, base)
result.write_to_file(sys.argv[3])
I can run it like this:
$ VIPS_PROGRESS=1 ./overlay.py plane_0.png plane_1.png x.png
overlay.py temp-13: 13056 x 45568 pixels, 32 threads, 128 x 128 tiles, 640 lines in buffer
overlay.py temp-13: 39% complete
...
And it makes this:
There are a few things you could try to make writing these things less annoying.
An easy one is to install vipsdisp (a libvips image viewer) from flathub and use it to look at intermediate images.
https://flathub.org/apps/org.libvips.vipsdisp
Docs in the README:
https://github.com/jcupitt/vipsdisp
It will display images as libvips understands them if you save as .v. You could write something like:
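import os
def show(image):
    # .v is libvips' native format, so the viewer sees the exact pixel values and format
    image.write_to_file("temp.v")
    # "vd" is an alias for vipsdisp; os.system() runs /bin/sh, which may not see shell aliases
    os.system("vd temp.v")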
I have vd as an alias for vipsdisp. Now you can add a show() any time you want to pause evaluation and examine an intermediate result. If you turn on the info bar (view / info) you can see individual pixel values in the status bar at the bottom. If you turn on the display control bar (view / display control) you can change the scale and offset for display, so you can see detail in float images.
You could also consider nip2, the very eccentric libvips GUI:
https://github.com/libvips/nip2
You can make these things interactively -- you enter lines of code and you can watch pixels changing in every intermediate image as you type.
It should be in the package repo on all linuxes, and there's a win binary too. You can make quite large and complex workspaces, save the workspace, then run nip2 in batch mode from the command-line over a large directory of files.
It has its own programming language, so there's a learning curve :( Though you can drive it from menus as well (just like excel).
I made it 25 years ago haha, so it looks very old-fashioned. I'm modernising the UI right now, so there should be a prettier version in a few months:
Wow! Thanks for the thorough explanation and bonus tips! This makes sense and seems easy enough to implement. Using the same idea, I tried to also implement a threshold under which the mask is TRUE, similar to the way dzsave works. Only, it's much slower than your bandor() implementation:
bands = overlay.bandsplit()
mask = 255 * (abs(bands[0]-TRANSPARENCY_COLOR) > TRANSPARENCY_TOLERANCE or
abs(bands[1]-TRANSPARENCY_COLOR) > TRANSPARENCY_TOLERANCE or
abs(bands[2]-TRANSPARENCY_COLOR) > TRANSPARENCY_TOLERANCE)
compositeImage = mask.ifthenelse(overlay, base)
compositeImage.write_to_file("debug/plane_1.png")
# Runtime: 18.94831085205078
mask = (overlay != 0).bandor()
compositeImage = mask.ifthenelse(overlay, base)
compositeImage.write_to_file("debug/plane_1.png")
# 7.408785581588745
This is probably expected given the way I've gone about it... is there a better way? I couldn't figure out a clever expression using bandor() that doesn't involve splitting the bands.
Also wanted to say that this is a very cool (and awesomely performant) ecosystem you've built up over time. Functional UIs are the best ones anyhow :)
You don't need to split bands: [] is overloaded to mean band extract. You can write, e.g.:
rgb = rgba[0:3]
a = rgba[3]
etc.
If you do:
mask = rgb > 10
You get a three-band uchar image with 255 where that band element is greater than 10 and 0 elsewhere. This works for all the arithmetic, boolean and relational operators. You can also write:
mask = rgb > [10, 20, 30]
and mask[0] will be 255 for band-elements greater than 10, mask[1] for band-elements greater than 20, etc. Again, you can do this for any operator.
For your mask generation you could write:
transparency = 20
tolerance = 5
result = (abs(overlay - transparency) > tolerance).bandor().ifthenelse(overlay, base)
Though probably just:
result = (overlay > 10).bandor().ifthenelse(overlay, base)
would be OK.
Ah, I understand the overloading now. Thank you for the very thorough explanation, I've been able to get everything working.
Hello again,
I am attempting to manipulate two images, plane_0 and plane_1, such that I:
1. strip away the black pixels in plane_1, replacing them with transparency
2. composite plane_1 atop plane_0
I've done what was recommended here to strip away the black pixels, and I get an image which looks like what I expect (this is a subsection of the image approximately 80% of the way down and 25% across; the checkerboard is how GIMP represents transparency):
So I think that is working well. However, when I execute the composite I get an output like this:
Which, while kind of retro-cool-looking, isn't right. I expected the pixels which are not transparent on plane_1 to fully replace those on plane_0, per the Wikipedia entry on alpha compositing, as I gave all of them an alpha value of 255. Looking at the colors that end up being rendered, I notice that most of them are 255 in one or two bands. Some other frequent values are 72 and 0. This makes me think the problem could lie in some of the math being done, perhaps some kind of value capping.
Have I approached this incorrectly?
Here are the images: https://drive.google.com/drive/folders/11N3relUy4yhual9H3h5WUv7sbUsXV0Qq?usp=drive_link