Unexpected behaviour adjust_color_range

rharkes commented 2 years ago

When writing a large dataset with float datatype ranging from 0 to 1 I found that setting adjust_color_range to True resulted in very strange values for colorrange Histmin and Histmax:

colorrange = ['-0.651', '14.000']
Histmin = -100.000 ; Histmax = 56.643
Datmin = 0.0 ; Datmax = 1.0

The dataset is very big, and I'm trying to make a minimal reproducible example. When using PyImarisWriterNumpyExample2.ims from PyImarisWriterNumpyExample.py The script at the end of this issue results in:

Channel attributes: ['ColorMode', 'ColorOpacity', 'ColorRange', 'ColorTable', 'ColorTableLength', 'Description', 'GammaCorrection', 'Name']
colorrange = ['9.528', '239.000']
Histmin = 9.528 ; Histmax = 239.597
Datmin = 10.0 ; Datmax = 240.0

(edit) I didn't crop the data the first time, so saw Datmin=0.0, but that was my mistake in reading the data. I do miss Min and Max in the list of attributes. It would be nice to know what attributes/groups/datasets are required for a valid .ims file. Is this documented somewhere?

import h5py

pth_in = r'PyImarisWriterNumpyExample2.ims'
f = h5py.File(pth_in)
reslevels = list(f['DataSet'].keys())
timepoints = list(f['DataSet'][reslevels[0]].keys())
channels = list(f['DataSet'][reslevels[0]][timepoints[0]].keys())
dsetInfo = f['DataSetInfo'][channels[0]]
print(f"Channel attributes: {list(dsetInfo.attrs.keys())}")
cr = dsetInfo.attrs.get('ColorRange').tobytes().decode()
cr = cr.split()
print(f"colorrange = {cr}")

datattrs = list(f['DataSet'][reslevels[0]][timepoints[0]][channels[0]].attrs.keys())
histmax = f['DataSet'][reslevels[0]][timepoints[0]][channels[0]].attrs.get('HistogramMax').tobytes().decode()
histmin = f['DataSet'][reslevels[0]][timepoints[0]][channels[0]].attrs.get('HistogramMin').tobytes().decode()
print(f"Histmin = {histmin} ; Histmax = {histmax}")
xSz = int(f['DataSet'][reslevels[0]][timepoints[0]][channels[0]].attrs.get('ImageSizeX').tobytes().decode())
ySz = int(f['DataSet'][reslevels[0]][timepoints[0]][channels[0]].attrs.get('ImageSizeY').tobytes().decode())
zSz = int(f['DataSet'][reslevels[0]][timepoints[0]][channels[0]].attrs.get('ImageSizeZ').tobytes().decode())
dat = dat[0:zSz, 0:ySz, 0:xSz]
print(f"Datmin = {dat.min()} ; Datmax = {dat.max()}")

rharkes commented 2 years ago

Hopefully this is a reproducible example. The script at the end of this post creates a single file of 609MB with values ranging from 0 to 1. When I run the script from the previous post it outputs:

colorrange = ['-0.041', '15.000']
Histmin = -100.000 ; Histmax = 56.643
Datmin = 0.0 ; Datmax = 0.9999999403953552

Can others reproduce this?

from datetime import datetime
from numpy.random import default_rng
from PyImarisWriter import PyImarisWriter as PW

output_filename = r'C:\Temp\ImarisTest.ims'
rng = default_rng(12345)
np_data = rng.random((2**14, 2**13), dtype='float32')

class MyCallbackClass(PW.CallbackClass):
    def __init__(self):
        self.mUserDataProgress = 0

    def RecordProgress(self, progress, total_bytes_written):
        progress100 = int(progress * 100)
        if progress100 - self.mUserDataProgress >= 5:
            self.mUserDataProgress = progress100
            print('User Progress {}, Bytes written: {}'.format(self.mUserDataProgress, total_bytes_written))

image_size = PW.ImageSize(x=np_data.shape[0], y=np_data.shape[1], z=1, c=1, t=1)
dimension_sequence = PW.DimensionSequence('x', 'y', 'z', 'c', 't')
block_size = image_size
sample_size = PW.ImageSize(x=1, y=1, z=1, c=1, t=1)
options = PW.Options()
options.mNumberOfThreads = 12
options.mCompressionAlgorithmType = PW.eCompressionAlgorithmGzipLevel2
options.mEnableLogProgress = True
application_name = 'PyImarisWriter'
application_version = '1.0.0'
callback_class = MyCallbackClass()
converter = PW.ImageConverter('float32', image_size, sample_size, dimension_sequence, block_size,
                              output_filename, options, application_name, application_version, callback_class)
num_blocks = image_size / block_size
block_index = PW.ImageSize()
for c in range(num_blocks.c):
    block_index.c = c
    for t in range(num_blocks.t):
        block_index.t = t
        for z in range(num_blocks.z):
            block_index.z = z
            for y in range(num_blocks.y):
                block_index.y = y
                for x in range(num_blocks.x):
                    block_index.x = x
                    if converter.NeedCopyBlock(block_index):
                        converter.CopyBlock(np_data, block_index)

adjust_color_range = True
image_extents = PW.ImageExtents(0, 0, 0, image_size.x, image_size.y, image_size.z)
parameters = PW.Parameters()
parameters.set_value('Image', 'ImageSizeInMB', 2400)
parameters.set_value('Image', 'Info', 'float test')
parameters.set_channel_name(0, 'My Channel 1')
time_infos = [datetime.today()]
color_infos = [PW.ColorInfo() for _ in range(image_size.c)]
color_infos[0].set_color_table([PW.Color(0, 1, 1, 1)])
converter.Finish(image_extents, parameters, time_infos, color_infos, adjust_color_range)
converter.Destroy()

imaris commented 2 years ago

Hi, we are aware that the color range for float is not precise. It is derived from the intensity histogram (equal to the intensity of the furthest apart, non-empty bins), which in most cases is not accurate. Exact result could be achieved by keeping track of the data range irrespectively of the histogram. Independently, the histogram computation for float images could be improved.

If the data range is known or critical, it could be supplied to the library and adjust_color_range could be set to false.

rharkes commented 2 years ago

"If the data range is known or critical, it could be supplied to the library and adjust_color_range could be set to false." I can set the colorrange, so that is the displayrange for the LUT, but I do not know how to set the data-range. What would be the way to do that?

And I agree that the histogram computation for float images could be improved. Generally speaking, would you not set the edges of the bins according to the min/max of the dataset, and choose a number of bins based on the number of pixels, with some maximum, for example 1024?

rharkes commented 2 years ago

I read Imaris File Format Description as the filespec, but just realized that there is also the Imaris5.5 File Format Description. Which one is the most recent? The .pdf only mentions scalar datasets, no floatingpoint, so it is probably outdated?

imaris commented 2 years ago

I can set the colorrange, so that is the displayrange for the LUT, but I do not know how to set the data-range

I meant setting the display range, sorry. The data-range cannot be set because it is not stored in the file (it could help computing a better histogram, but that should be the job of the writer, really).

would you not set the edges of the bins according to the min/max of the dataset

Finding min and max would require a complete pass through the data, and then a second pass for binning the values properly (if float values, there is no problem with integer values because the number of bins is limited to the range of the datatype). Two passes are not acceptable for performance and memory requirements. Requiring the data range in advance woul solve the problem when the data range is known, but an internal solution would be strongly preferred. We have a potential solution, hopefully we'll find the resources to implement it soon enough.

I read Imaris File Format Description as the filespec [...]

Both documents are up-to-date and the pdf is more recent. Floating points are not mentioned because the document is mainly meant for writers of fluorescence images, which are typically of integer type. It would certainly be more complete with the mention of floats. The pdf is more comprehensive, the web page complements it with a more technical approach.

rharkes commented 2 years ago

Thank you, this makes it clear. The data-range is not stored in the file, I thought it was, but that is indeed not the case. It is an optional attribute to Channel X in DataSetInfo, but the histogram range is not the same as the data-range.

Deconvolved fluorescence images are often float-type btw. Both for spatial- and spectral deconvolution .

imaris / ImarisWriter

Unexpected behaviour adjust_color_range #5