libvips / pyvips

python binding for libvips using cffi
MIT License
631 stars 49 forks source link

Issues converting an arrayjoin to oem-tiff format #413

Open euzada opened 1 year ago

euzada commented 1 year ago

edit: Problem solved by removing the second arrayjoin im = pyvips.Image.arrayjoin(im.bandsplit(), across=1)

Hi,

I was using the convert.py code written by @jcupitt, and it works if the image is on the disk, but if I included the code after an arrayjoin it saves a max file of 2.56GB, and it stops. The file should be around 3.47GB.

I am trying to figure out if I am hitting the 2048 file limit on Windows. I am stitching more than 3000 images.

Here is the code I am using:

#load a local image already stitched using arrayjoin method
im = pyvips.Image.new_from_file(sys.argv[1])

if im.hasalpha():
    im = im[:-1]

image_height = im.height
image_bands = im.bands

# split to separate image planes and stack vertically ready for OME 
im = pyvips.Image.arrayjoin(im.bandsplit(), across=1)

# set minimal OME metadata
# before we can modify an image (set metadata in this case), we must take a 
# private copy
im = im.copy()
im.set_type(pyvips.GValue.gint_type, "page-height", image_height)
im.set_type(pyvips.GValue.gstr_type, "image-description",
f"""<?xml version="1.0" encoding="UTF-8"?>
<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.openmicroscopy.org/Schemas/OME/2016-06 http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd">
    <Image ID="Image:0">
        <!-- Minimum required fields about image dimensions -->
        <Pixels DimensionOrder="XYCZT"
                ID="Pixels:0"
                SizeC="{image_bands}"
                SizeT="1"
                SizeX="{im.width}"
                SizeY="{image_height}"
                SizeZ="1"
                Type="uint8">
        </Pixels>
    </Image>
</OME>""")

im.tiffsave(sys.argv[2], compression="jpeg", tile=True,
            tile_width=512, tile_height=512,
            pyramid=True, subifd=True)

That code only works as I mentioned if I use it to first load an image saved on the drive i.e. result.v and successfully convert it to tiff file that I can inspect with QuPath.

But this code doesn't save the file completely, it looks like there is some limitation.

tiles_across = 49
tiles = []

# Sort files based on the [y,x] values
def sort_by_yx(filename):
    match = re.search(r'\[(\d+),(\d+)\].*\.png', filename)
    if match:
        y_value = int(match.group(1))
        x_value = int(match.group(2))
        return y_value, x_value
    else:
        return float('inf'), float('inf')
directory = sys.argv[1]

files = os.listdir(directory)
sorted_files = sorted(files, key=sort_by_yx)

tile_size = None  # Variable to store the tile size
tiles = []
for filename in sorted_files:
    if os.path.isfile(os.path.join(directory, filename)):
        tile = pyvips.Image.new_from_file(os.path.join(directory, filename), access="sequential")
        tile = tile.crop(overlap, overlap, tile.width - overlap, tile.height - overlap)
        tiles.append(tile)

im = pyvips.Image.arrayjoin(tiles, across=(tiles_across))

if im.hasalpha():
    im = im[:-1]

image_height = im.height
image_bands = im.bands

# split to separate image planes and stack vertically ready for OME 
_**# it works if I delete this line. The file can be opened in QuPath with no problem**_
**im = pyvips.Image.arrayjoin(im.bandsplit(), across=1)**

im = im.copy()
im.set_type(pyvips.GValue.gint_type, "page-height", image_height)
im.set_type(pyvips.GValue.gstr_type, "image-description",
f"""<?xml version="1.0" encoding="UTF-8"?>
<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.openmicroscopy.org/Schemas/OME/2016-06 http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd">
    <Image ID="Image:0">
        <!-- Minimum required fields about image dimensions -->
        <Pixels DimensionOrder="XYCZT"
                ID="Pixels:0"
                SizeC="{image_bands}"
                SizeT="1"
                SizeX="{im.width}"
                SizeY="{image_height}"
                SizeZ="1"
                Type="uint8">
        </Pixels>
    </Image>
</OME>""")

im.tiffsave(sys.argv[2], compression="jpeg", tile=True,
            tile_width=512, tile_height=512,
            pyramid=True, subifd=True)

# The save will stop after 2.56 GB with no error, but not complete.

Would you happen to have any idea how to do it? Instead of saving the image first, I prefer to convert it to OEM-TIFF than save it.

jcupitt commented 1 year ago

Hi @euzada,

Try removing the access="sequential". You are splitting the images to separate RGB planes, then joining them vertically, so each input image needs to be scanned three times. With sequential mode you can only scan each input once.

I made a version of your program and this seems to work:

#!/usr/bin/env python

# make test data with eg.
# for x in {0..99}; do for y in {0..99}; do cp ~/pics/k2.jpg [$x,$y].png; done; done

import os
import re
import sys

import pyvips

if len(sys.argv) != 3:
    print("using: join-to-ome-tiff.py image-directory output-tiff")
    sys.exit(1)

# scan the directory and get the max x and y
print(f"scanning directory {sys.argv[1]}/ ...")
max_x = 0
max_y = 0
for filename in os.listdir(sys.argv[1]):
    match = re.match(r'\[(\d+),(\d+)\].*\.png', filename)
    if match:
        max_x = max(max_x, int(match.group(1)))
        max_y = max(max_y, int(match.group(2)))

print(f"loading {max_x + 1} x {max_y + 1} tiles ...")
tiles = []
overlap = 2
for y in range(max_x + 1):
    for x in range(max_y + 1):
        tile = pyvips.Image.new_from_file(f"{sys.argv[1]}/[{x},{y}].png")
        tile = tile.crop(0, 0, tile.width - overlap, tile.height - overlap)
        tiles.append(tile)

im = pyvips.Image.arrayjoin(tiles, across=max_x + 1)

if im.hasalpha():
    im = im[:-1]

image_width = im.width
image_height = im.height
image_bands = im.bands

# split to separate image planes and stack vertically ready for OME
im = pyvips.Image.arrayjoin(im.bandsplit(), across=1)

# set minimal OME metadata
# before we can modify an image (set metadata in this case), we must take a
# private copy
# Even with a copy, still not working.
im = im.copy()
im.set_type(pyvips.GValue.gint_type, "page-height", image_height)
im.set_type(pyvips.GValue.gstr_type, "image-description",
f"""<?xml version="1.0" encoding="UTF-8"?>
<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.openmicroscopy.org/Schemas/OME/2016-06 http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd">
    <Image ID="Image:0">
        <!-- Minimum required fields about image dimensions -->
        <Pixels DimensionOrder="XYCZT"
                ID="Pixels:0"
                SizeC="{image_bands}"
                SizeT="1"
                SizeX="{im.width}"
                SizeY="{image_height}"
                SizeZ="1"
                Type="uint8">
        </Pixels>
    </Image>
</OME>""")

print(f"saving image of {image_width} x {image_height} pixels ...")
im.tiffsave(sys.argv[2],
            bigtiff=True, pyramid=True, subifd=True,
            compression="jpeg", tile=True,
            tile_width=512, tile_height=512)

However, this will load every image to memory :( I tried with 10,000 1500x2048 pixel PNGs and I see:

$ for x in {0..99}; do for y in {0..99}; do cp ~/pics/k2.png sample/[$x,$y].png; done; done
$ /usr/bin/time -f %M:%e ./join-to-ome-tiff.py sample x.tif
scanning directory sample/ ...
loading 100 x 100 tiles ...
saving image of 144800 x 204600 pixels ...
94284024:1649.32

So it look 1600s (30m) and needed 94gb of memory (!!!).

To get memory use down, you can load via disc instead:

$ VIPS_DISC_THRESHOLD=0 /usr/bin/time -f %M:%e ./join-to-ome-tiff.py sample x.tif
scanning directory sample/ ...
loading 100 x 100 tiles ...
saving image of 144800 x 204600 pixels ...
57650716:10635.51

Though it's still pretty high (50gb) and now takes forever to run.

jcupitt commented 1 year ago

... I would assemble your tiles to a regular tiled tiff, then convert that file to ome-tiff in a second pass.

euzada commented 1 year ago

... I would assemble your tiles to a regular tiled tiff, then convert that file to ome-tiff in a second pass.

Hi @jcupitt, do you mean to save the results of im = pyvips.Image.arrayjoin(tiles, across=max_x + 1) as tiffsave(sys.argv[2], bigtiff=True, pyramid=True, compression="jpeg", tile=True, tile_width=512, tile_height=512) than load it again and convert it to oem-tiff? is that will help with the compression to reduce the time? I see that will take probably double the time to save it first as tiff and convert the tiff as oem-tiff. Also, I tried first to save the resukts as "%s.v" in new_temp_file and write, but as soon as the write finished the and file close it deleted directly and cannot reuse it to convert it to oem-tiff. If I use write_to_file instead, than I can reuse it but the file will not be deleted after from the Temp directly on windows and take almost 285GB on the hard drive.

Do you have any suggestion how can I use write_temp_file and reuse the results to create the oem-tiff and delete the temp file as soon as the oem-tiff is completed? if it work I can save it as tiled tiff and try it instead. Thank your for your help.

jcupitt commented 1 year ago

You don't need to save as a pyramidal tiff. Something like:

im = pyvips.Image.arrayjoin(tiles, across=max_x + 1)
im.tiffsave("my-tmpfile.tif", bigtiff=True, compression="jpeg", tile=True, tile_width=512, tile_height=512)

Then convert to ome-tiff in a second pass..