jonasteuwen commented 1 month ago

Problem

Currently pyvips only supports reading ICC profiles from a file as far as I can see. OpenSlide gives an io.BytesIO output. I have modified openslide-python to output pyvips.Image.

Code example

So right now you can do this:

import io

import openslide.lowlevel as openslide_lowlevel

owsi = openslide_lowlevel.open(str(filename))
profile = openslide_lowlevel.read_icc_profile(owsi)
color_profile = io.BytesIO(profile)

With PIL you can now do this:

import PIL.ImageCms

# wsi is assumed to be an openslide.OpenSlide instance
pil_region = wsi.read_region(coordinates, level, size)
to_profile = PIL.ImageCms.createProfile("sRGB")
intent = PIL.ImageCms.getDefaultIntent(color_profile)
color_transform = PIL.ImageCms.buildTransform(color_profile, to_profile, "RGBA", "RGBA", intent, 0)
PIL.ImageCms.applyTransform(pil_region, color_transform, inPlace=True)

This does not seem to be possible with pyvips. Do I need to dump color_profile to disk first?

jcupitt commented 1 month ago

Hi @jonasteuwen,

pyvips lets you fetch any libvips metadata with get. For example:

import pyvips

image = pyvips.Image.new_from_file("CMU-1.svs")
profile = image.get("icc-profile-data")

You can see all the metadata that libvips can read for a file with vipsheader, for example:

$ vipsheader -a CMU-1.svs | grep icc
openslide.icc-size: 141992
icc-profile-data: 141992 bytes of binary data

The icc_transform operation in pyvips can pick up the metadata profile, so you could write:

image = pyvips.Image.new_from_file("CMU-1.svs")
srgb = image.icc_transform("srgb")

And it'll combine the slide profile with a standard srgb profile to generate a corrected sRGB image.

openslide makes RGBA images by default, though the A is almost always just 255. If you pass the rgb option to new_from_file it'll read plain RGB instead, which can give a very useful speedup.

image = pyvips.Image.new_from_file("CMU-1.svs", rgb=True)

jcupitt commented 1 month ago

Ah you want to just fetch and process a small region, is that right? You could write:

image = pyvips.Image.new_from_file("CMU-1.svs", rgb=True).icc_transform("srgb")
for y in range(0, image.height, 256):
    for x in range(0, image.width, 256):
        tile = image.crop(x, y, min(256, image.width - x), min(256, image.height - y))
        rgb_pixel_array = tile.numpy()
        do_something_with_the_tile_data(rgb_pixel_array)

libvips is threaded and demand-driven, so it'll be efficient.
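
The edge handling in that loop (the min(...) calls that shrink the last tile in each row and column) can be pulled out into a small generator. This is only a sketch; tile_boxes is a hypothetical helper name, not part of pyvips.

```python
def tile_boxes(width, height, tile_size=256):
    """Yield (x, y, w, h) boxes that cover a width x height image.

    Tiles on the right and bottom edges are clipped so that no box
    extends past the image.
    """
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            yield x, y, min(tile_size, width - x), min(tile_size, height - y)
```

With this, the loop body becomes "for x, y, w, h in tile_boxes(image.width, image.height): tile = image.crop(x, y, w, h)".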

jonasteuwen commented 1 month ago

Hi @jcupitt,

Thank you for your prompt reply. In my code, I have two backends: pyvips directly, which will work as you describe (thanks for the example, that's much more efficient!), and a fork of openslide-python which, instead of outputting a PIL Image, passes the region to a pyvips image. See here:

https://github.com/NKI-AI/dlup/blob/feature/libvips/dlup/backends/openslide_backend.py
https://github.com/NKI-AI/dlup/blob/feature/libvips/dlup/experimental_backends/pyvips_backend.py

When using the openslide C library, you can get the ICC profile as a BytesIO stream as shown above, and I want to use it to create an icc_transform that I can apply to your rgb_pixel_array.

I would imagine something like this:

owsi = openslide_lowlevel.open(str(filename))
profile = openslide_lowlevel.read_icc_profile(owsi)
color_profile = io.BytesIO(profile)

for y in range(0, image.height, 256):
    for x in range(0, image.width, 256):
        tile = owsi.read_region((x, y), level, (min(256, image.width - x), min(256, image.height - y))).icc_transform("srgb", input_profile=color_profile)
        rgb_pixel_array = tile.numpy()
        do_something_with_the_tile_data(rgb_pixel_array)

Note that I modified the .read_region() of the openslide library to output a pyvips.Image.

OpenSlide attaches it to the PIL image when reading the region: https://github.com/openslide/openslide-python/blob/22978715366db4ef1a3ebaab49c514131617fe66/openslide/__init__.py#L255

Can we do the same using this profile BytesIO?

jcupitt commented 1 month ago

You can attach the profile from openslide_lowlevel as metadata to the pyvips image. Something like (untested):

owsi = openslide_lowlevel.open(str(filename))
profile = openslide_lowlevel.read_icc_profile(owsi)
color_profile = io.BytesIO(profile).read()

tile = owsi.read_region((x, y), level, (min(256, image.width - x), min(256, image.height - y)))
# attach profile to image as metadata
tile.set_type(pyvips.GValue.blob_type, "icc-profile-data", color_profile)
tile = tile.icc_transform("srgb")

Though performance might not be that great -- image = pyvips.Image.new_from_file("CMU-1.svs", rgb=True).icc_transform("srgb") will probably be a lot quicker (but I've not benchmarked it).

Why do you need two backends?

jonasteuwen commented 1 month ago

@jcupitt Thank you! I will give it a try!

Different backends: I found some minor differences between how pyvips and openslide read the images (one of them is whether the output is RGB or RGBA, and there may be some interpolation differences). I don't know why, but the SSIM is > 0.999 while np.allclose(a, b) is not true. While I use pyvips for new projects, I want to make sure that our older projects based on openslide keep producing the same outputs for the same data when they update the library.
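
To see how far apart the two backends really are, it may help to look at the size of the differences rather than only at np.allclose, whose default tolerances (rtol=1e-05, atol=1e-08) fail on a single off-by-one pixel value. A minimal sketch, assuming both tiles are available as NumPy arrays of the same shape; compare_tiles and max_level_diff are hypothetical names.

```python
import numpy as np


def compare_tiles(a, b, max_level_diff=1):
    """Summarise the per-pixel differences between two image arrays."""
    a = np.asarray(a)
    b = np.asarray(b)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    # Widen to signed ints so that uint8 subtraction cannot wrap around
    diff = np.abs(a.astype(np.int32) - b.astype(np.int32))
    return {
        "max_abs_diff": int(diff.max()),
        "fraction_different": float((diff > 0).mean()),
        "within_tolerance": bool(diff.max() <= max_level_diff),
    }
```

An RGBA tile would need its alpha band dropped (for example a[..., :3]) before being compared with an RGB one.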

libvips / pyvips

ICC profiles from file stream in pyvips #475
