libvips / pyvips

python binding for libvips using cffi
MIT License
649 stars 50 forks source link

Issues with writing to SVS files #480

Open superli6 opened 4 months ago

superli6 commented 4 months ago

When I convert a pyvips object to numpy, I encounter a memory overflow issue. After I chunk the image and write it back to the same SVS file, the SVS file becomes unusable. Is there a problem with my logic?

# Create TiffWriter object
with tifffile.TiffWriter(out_file, bigtiff=True) as tif:
    # Iterate through each row of the image
    for y in range(n_down):
        print("row {} ...".format(y))  # Print the current row being processed

        # Iterate through each column of the image
        for x in range(n_across):
            # Crop a patch of size patch_size x patch_size from the original image
            # Calculate the top-left coordinates of the patch
            patch_x = x * patch_size
            patch_y = y * patch_size

            # Check if the bottom-right coordinates of the patch exceed the image boundaries
            if patch_x + patch_size > image_py.width:
                patch_x = image_py.width - patch_size  # Limit the top-left coordinates within image boundaries

            if patch_y + patch_size > image_py.height:
                patch_y = image_py.height - patch_size  # Limit the top-left coordinates within image boundaries

            # Crop a patch of size patch_size x patch_size from the original image
            patch = image_py.crop(patch_x, patch_y, patch_size, patch_size)
            image = np.array(patch)[..., 0:3]

            width, height = image.shape[0:2]
            # Required pyramid resolutions
            multi_hw = np.int64([(width, height), (width // 2, height // 2),
                                 (width // 4, height // 4), (width // 8, height // 8),
                                 (width // 16, height // 16),
                                 (width // 32, height // 32),
                                 (width // 64, height // 64)])

            thw = tile_hw.tolist()
            # outcolorspace should remain as default YCbCr, not rgb, otherwise colors will be abnormal
            # 95 is the default JPEG quality, range is 0-100, the higher the value, the closer to lossless
            compression = ['JPEG', 95]
            kwargs = dict(subifds=0, photometric='rgb', planarconfig='CONTIG', compression=compression,
                          dtype=np.uint8,
                          metadata=None)

            for i, hw in enumerate(multi_hw):
                hw = up_to16_manifi(hw)
                temp_wsi = cv2.resize(image, (hw[1], hw[0]))
                new_x, new_y = up_to16_manifi(hw)
                new_wsi = np.ones((new_x, new_y, 3), dtype=np.uint8) * 255
                new_wsi[0:hw[0], 0:hw[1], :] = temp_wsi[..., 0:3]
                index = gen_patches_index((new_x, new_y), img_size=TILE_SIZE, stride=TILE_SIZE)
                gen = gen_im(new_wsi, index)

                if i == 0:
                    desc = svs_desc.format(mag=mag, filename=filename, mpp=mpp)
                    tif.write(data=gen, shape=(*hw, 3), tile=thw[::-1], resolution=resolution, description=desc,
                              **kwargs)
                    _hw = up_to16_manifi(multi_hw[-2])
                    thumbnail_im = cv2.resize(image, (_hw[1], _hw[0]))[..., 0:3]
                    tif.write(data=thumbnail_im, description='', **kwargs)
                else:
                    tif.write(data=gen, shape=(*hw, 3), tile=thw[::-1], resolution=resolution, description='',
                              **kwargs)
            _hw = up_to16_manifi(multi_hw[-2])
            macro_im = cv2.resize(image, (_hw[1], _hw[0]))[..., 0:3]
            tif.write(data=macro_im, subfiletype=9,
                      description=macro_desc.format(W=macro_im.shape[1], H=macro_im.shape[0]), **kwargs)
jcupitt commented 4 months ago

Sorry, I've never tried to write SVS.

It's a pretty non-standard and proprietary TIFF though, so I think you'll have a range problems, especially around metadata. Do you have to use SVS? One of the standard slide image formats, like DICOM or plain TIFF, or even OME-TIFF, would probably suit you better.

From a quick look at your code, SVS is row major, so you'll need to transpose the tiles. SVS has very few pyramid levels, you'll need to remove most of yours. Your loop looks confusing, is there an indentation problem? Or is this two separate pieces of code? You'll need quite a bit of code to generate the correct IMAGEDESCRIPTION as well.

I'd read the SVS loader in openslide and make sure you're following that layout.

https://github.com/openslide/openslide/blob/main/src/openslide-vendor-aperio.c

superli6 commented 4 months ago

对不起,我从未尝试过编写 SVS。

不过,这是一个非常非标准和专有的 TIFF,所以我认为你会遇到一系列问题,尤其是在元数据方面。您必须使用 SVS 吗?其中一种标准幻灯片图像格式,如 DICOM 或普通 TIFF,甚至 OME-TIFF,可能更适合您。

快速浏览一下代码,SVS 是行主的,因此需要转置磁贴。SVS 的金字塔级别很少,您需要删除大部分金字塔级别。您的循环看起来很混乱,是否存在缩进问题?或者这是两段独立的代码?您还需要相当多的代码来生成正确的代码。IMAGEDESCRIPTION

我会在 openslide 中阅读 SVS 加载程序,并确保您遵循该布局。

https://github.com/openslide/openslide/blob/main/src/openslide-vendor-aperio.c

Thank you for your answer. As a beginner, my code is quite messy. Because I encountered a memory overflow issue with numpy arrays, I wanted to try first chunking the image, converting each chunk to a numpy array, and then writing them one by one into an SVS file. I used two nested for loops to chunk the image and then placed the code to write to the SVS file inside the loop. The code did not produce any errors, but when I try to open the SVS file with software, it is not usable. I cannot find the problem. Here is the code without the chunking process。 `def gen_pyramid_tiff(in_file, out_file, select_level=2): ''' select_level determines which layer to read. Level 0 is the highest magnification. ''' svs_desc = 'Aperio Image Library Fake\nABC |AppMag = {mag}|Filename = {filename}|MPP = {mpp}' label_desc = 'Aperio Image Library Fake\nlabel {W}x{H}' macro_desc = 'Aperio Image Library Fake\nmacro {W}x{H}'

Specify mpp value

odata = openslide.open_slide(in_file)
# Get the current image's MPP
# If it's not this field, find out which one it is
mpp = float(odata.properties['mirax.LAYER_0_LEVEL_0_SECTION.MICROMETER_PER_PIXEL_X'])  # 0.5
# Specify magnification
mag = 40
# mpp around 0.25 is 40X, around 0.5 is 20X
if mpp <= 0.3:
    mag = 20
    mpp = mpp * 2
# Convert mpp value to resolution
resolution = [10000 / mpp, 10000 / mpp, 'CENTIMETER']

# Specify image name
if odata.properties.get('aperio.Filename') is not None:
    filename = odata.properties['aperio.Filename']
else:
    filename = get_name_from_path(in_file)

print(f"loading '{in_file}'")
start = time.time()
# image_py = openslide.open_slide(in_file)
# pyvip is faster than openslide, but if not available, openslide is also fine
image_py = pyvips.Image.openslideload(in_file, level=select_level)
image = np.array(image_py)[..., 0:3]
print(f"finish loading '{in_file}'. costing time:{time.time() - start}")
# image = np.array(image_py['20X'])
# Thumbnail
thumbnail_im = np.zeros([762, 762, 3], dtype=np.uint8)
thumbnail_im = cv2.putText(thumbnail_im, 'thumbnail', (thumbnail_im.shape[1] // 4, thumbnail_im.shape[0] // 2),
                           cv2.FONT_HERSHEY_PLAIN, 6, color=(255, 0, 0), thickness=3)
# Label image
label_im = np.zeros([762, 762, 3], dtype=np.uint8)
label_im = cv2.putText(label_im, 'label', (label_im.shape[1] // 4, label_im.shape[0] // 2), cv2.FONT_HERSHEY_PLAIN,
                       6, color=(0, 255, 0), thickness=3)
# Macro image
macro_im = np.zeros([762, 762, 3], dtype=np.uint8)
macro_im = cv2.putText(macro_im, 'macro', (macro_im.shape[1] // 4, macro_im.shape[0] // 2), cv2.FONT_HERSHEY_PLAIN,
                       6, color=(0, 0, 255), thickness=3)

# Tile size
tile_hw = np.int64([TILE_SIZE, TILE_SIZE])

width, height = image.shape[0:2]
# Required pyramid resolutions
multi_hw = np.int64([(width, height), (width // 2, height // 2),
                     (width // 4, height // 4), (width // 8, height // 8),
                     (width // 16, height // 16),
                     (width // 32, height // 32),
                     (width // 64, height // 64)])

# Try writing in svs format
with tifffile.TiffWriter(out_file, bigtiff=True) as tif:
    thw = tile_hw.tolist()
    # outcolorspace should remain as default YCbCr, not rgb, otherwise colors will be abnormal
    # 95 is the default JPEG quality, range is 0-100, the higher the value, the closer to lossless
    compression = ['JPEG', 95]
    kwargs = dict(subifds=0, photometric='rgb', planarconfig='CONTIG', compression=compression, dtype=np.uint8,
                  metadata=None)

    for i, hw in enumerate(multi_hw):
        hw = up_to16_manifi(hw)
        temp_wsi = cv2.resize(image, (hw[1], hw[0]))
        new_x, new_y = up_to16_manifi(hw)
        new_wsi = np.ones((new_x, new_y, 3), dtype=np.uint8) * 255
        new_wsi[0:hw[0], 0:hw[1], :] = temp_wsi[..., 0:3]
        index = gen_patches_index((new_x, new_y), img_size=TILE_SIZE, stride=TILE_SIZE)
        gen = gen_im(new_wsi, index)

        if i == 0:
            desc = svs_desc.format(mag=mag, filename=filename, mpp=mpp)
            # tif.write(data=gen, resolution=resolution, description=desc, **kwargs)
            tif.write(data=gen, shape=(*hw, 3), tile=thw[::-1], resolution=resolution, description=desc, **kwargs)
            _hw = up_to16_manifi(multi_hw[-2])
            thumbnail_im = cv2.resize(image, (_hw[1], _hw[0]))[..., 0:3]
            tif.write(data=thumbnail_im, description='', **kwargs)
        else:

            tif.write(data=gen, shape=(*hw, 3), tile=thw[::-1], resolution=resolution, description='', **kwargs)
    _hw = up_to16_manifi(multi_hw[-2])
    macro_im = cv2.resize(image, (_hw[1], _hw[0]))[..., 0:3]
    tif.write(data=macro_im, subfiletype=9, description=macro_desc.format(W=macro_im.shape[1], H=macro_im.shape[0]),
              **kwargs)`
jcupitt commented 4 months ago

Writing SVS is difficult, and even if you do get it working, it will be extremely slow. A fast, high-quality SVS writer will take many weeks of effort.

Are you certain you have to write SVS? Why not use OME-TIFF or DICOM or plain TIFF?

William859662 commented 4 months ago

As you know unfortunately ome.tif is not supported by openslide, so it may be necessary to write svs. Svs is supported in many viewers for metadata such as magnification values. Even if openslide supports ome.tif, I think it might take some time to make it active in all viewers.

There is a code to write svs with the tifffilelibrary in this post,

I tried it, it writes svs and the metadata in the file is supported by openslide, for example I was able to read the "label" or "macro" data from this svs file using openslide, but unfortunately tifffile reads the whole file by loading it into ram and this is a bit of a problem

jcupitt commented 4 months ago

Ah, nice! But as you say, doing this efficiently is quite a bit more difficult.

William859662 commented 4 months ago

Yes I agree with you, it will be difficult to write completely, maybe by changing the IFD and description structure, the tif file can be transformed to svs, but I'm not sure, what do you think ?

jcupitt commented 4 months ago

I would use pyvips to fetch single tiles from the source image, then write each one back with tiffile. But it will probably not be quick :(