Added Dask loading of files to Live Viewer backend

MikeSullivan7 commented 1 month ago

Issue

Closes #2311

Description

Dask is now used to load the files in the Live Viewer directory and display them as normal. Dask gives us a delayed (lazy) array of all the image data in the directory without loading that data into memory. To display an image in the Live Viewer, the relevant part of the delayed array is computed on demand and is not kept in memory afterwards.
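As a rough illustration of the idea, and not the code in this PR, a lazy stack can be built by wrapping a per-file reader in dask.delayed and stacking the results. load_image and build_delayed_stack are hypothetical names. load_image fabricates data so the sketch runs without any files:

```python
import numpy as np
import dask
import dask.array as da


# Hypothetical stand-in for a real image reader (e.g. a TIFF or FITS loader).
# It fabricates a frame from the path so the example runs without files.
def load_image(path: str, shape=(4, 4), dtype=np.float32) -> np.ndarray:
    seed = sum(map(ord, path))  # deterministic value per path
    return np.full(shape, seed % 100, dtype=dtype)


def build_delayed_stack(paths, shape, dtype):
    """Wrap each file read in dask.delayed and stack the results lazily.

    Nothing is read here; each frame is only loaded when a computation needs it.
    """
    frames = [
        da.from_delayed(dask.delayed(load_image)(p, shape, dtype), shape=shape, dtype=dtype)
        for p in paths
    ]
    return da.stack(frames, axis=0)


paths = [f"image_{i:03d}.tif" for i in range(10)]
stack = build_delayed_stack(paths, shape=(4, 4), dtype=np.float32)
print(stack.shape)          # (10, 4, 4), no files read yet
frame = stack[3].compute()  # evaluates only the task for frame 3
```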

This lets us perform operations on the live data that need the whole image stack (e.g. the mean or a spectrum) without loading and storing the whole stack in memory at once. This PR acts as a proof of concept for using Dask in Mantid Imaging, and lays the foundation for the data structures needed to make Dask work.
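Here is a minimal sketch of whole-stack reductions on a lazy stack, assuming one chunk per image. The stack is generated with da.ones rather than read from disk. The point is that both the per-image mean (a spectrum) and the pixel-wise mean image are built as task graphs and are only evaluated by compute():

```python
import numpy as np
import dask.array as da

# Stand-in for a delayed image stack: 20 frames of 64x64 pixels, one chunk per
# frame (as with a per-file delayed stack). Frame i is filled with the value i.
stack = da.ones((20, 64, 64), chunks=(1, 64, 64)) * da.arange(20)[:, None, None]

# Both reductions only build task graphs; nothing is evaluated yet.
spectrum = stack.mean(axis=(1, 2))  # one mean value per image
mean_image = stack.mean(axis=0)     # pixel-wise mean over all images

# compute() works through the chunks, so the full stack need not be held at once
spectrum_values = spectrum.compute()
mean_image_values = mean_image.compute()
print(spectrum_values[:3])      # [0. 1. 2.]
print(mean_image_values[0, 0])  # 9.5
```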

Both .tif and .fits files are supported, but they are handled separately. Dask does not natively support .fits files, so for those the delayed arrays and computations have been built manually.
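dask.array.from_delayed needs each frame's shape and dtype up front, so a manual approach for formats like .fits has to discover them somehow. One option is to read the first file eagerly. The sketch below assumes that every file shares the first file's shape and dtype. delayed_stack_from_reader and fake_fits_reader are hypothetical. A real FITS reader might be something like astropy.io.fits.getdata, but that is an assumption and not necessarily what this PR uses:

```python
from typing import Callable, Sequence

import numpy as np
import dask
import dask.array as da


def delayed_stack_from_reader(paths: Sequence[str],
                              reader: Callable[[str], np.ndarray]) -> da.Array:
    """Build a lazy (n_images, height, width) stack from any per-file reader.

    from_delayed needs the shape and dtype up front, so the first file is read
    eagerly to discover them; all files are assumed to match it.
    """
    if not paths:
        raise ValueError("no image files found")
    first = reader(paths[0])  # eager read, only to learn shape/dtype
    shape, dtype = first.shape, first.dtype
    delayed_read = dask.delayed(reader)
    frames = [da.from_delayed(delayed_read(p), shape=shape, dtype=dtype) for p in paths]
    return da.stack(frames, axis=0)


# Hypothetical reader standing in for a real FITS loader.
def fake_fits_reader(path: str) -> np.ndarray:
    index = int(path.split("_")[1].split(".")[0])
    return np.full((8, 8), index, dtype=np.uint16)


paths = [f"IMAT_{i:04d}.fits" for i in range(5)]
stack = delayed_stack_from_reader(paths, fake_fits_reader)
print(stack.shape, stack.dtype)  # (5, 8, 8) uint16
```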

Testing

make check

Acceptance Criteria

1) Open Mantid Imaging (MI), open the Live Viewer and point it at a folder containing data, e.g. python -m mantidimaging -lv="C:\Users\ddb29996\Documents\MantidImaging Data\Large Dataset\Flower_WhiteBeam\Tomo". A larger dataset is preferable, as it makes the benefit of using Dask easier to see.

2) Check that the images load as normal and that you can move between frames without errors or noticeable slowdown.

3) Perform an "operation" on the whole image stack. The Live Viewer does not implement these kinds of operations yet, but you can paste the following code into line 346 of mantidimaging/gui/windows/live_viewer/model.py:

import dask.array
import matplotlib.pyplot as plt

# Mean intensity of each image (averaged over the y and x axes)
arrmean = dask.array.mean(dask_image_stack.delayed_stack, axis=(1, 2))
plt.plot(arrmean.compute())
plt.show()

This takes the delayed image stack and calculates a simple spectrum: the mean intensity of each image in the Live Viewer folder. While you open and initialise the Live Viewer, keep an eye on your RAM usage and check that it does not increase by the size of the image stack. This is easiest to see with the Flower_WhiteBeam dataset, which is around 9 GB.

Check that the calculated spectrum is what you would expect for the dataset. For example, for the Flower_WhiteBeam data you should get this:

For the MantidImaging Data\Brass\Corrected_Sample_PH20 data, you should get:

Repeat this process with both .tif and .fits datasets to make sure both are functional.
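For the RAM check in step 3, it helps to know how much memory the stack would take if it were loaded in full. The helper below is hypothetical. The image dimensions are illustrative and are not those of any dataset mentioned here:

```python
import numpy as np


def stack_size_gib(n_images: int, height: int, width: int, dtype=np.uint16) -> float:
    """Memory needed to hold a full (n_images, height, width) stack, in GiB."""
    return n_images * height * width * np.dtype(dtype).itemsize / 1024**3


# Illustrative numbers only: 1000 images of 2048x2048 uint16 pixels
print(round(stack_size_gib(1000, 2048, 2048), 2))  # 7.81
```

If RAM usage climbs by roughly this amount when the Live Viewer opens, the stack is probably being loaded eagerly rather than lazily.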

Because some of the Live Viewer data structures and data flows have changed, the Live Viewer tests have been updated to reflect this.

Documentation

Will add release note

coveralls commented 1 month ago

coverage: 74.021% (-0.3%) from 74.322% when pulling 486da426f79cc270eab224e4251b4a008ab75d6d on dask_live_viewer into f3c9e9ddb7883016ce91bbfa03ccd99321270713 on main.

MikeSullivan7 commented 1 month ago

Some Benchmarks:

Running python -m mantidimaging -lv="C:\Users\ddb29996\Documents\MantidImaging Data\mantidimaging-data-main\mantidimaging-data-main\ISIS\IMAT\IMAT00010675\Tomo"

With create_delayed_array=False (so no delayed stack is created), _handle_directory_change in the Live Viewer model takes 0.178 seconds to run in total. With create_delayed_array=True it takes 9.863 seconds.
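The timings above were taken inside the model itself. For quick comparisons outside MI, a stand-in based on time.perf_counter works. The timed helper is hypothetical and is not the API of the ExecutionProfiler used later in this thread. The sum(...) calls are placeholders for the two code paths:

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str, results: dict):
    """Record the wall-clock time of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
with timed("create_delayed_array=False", timings):
    sum(range(10_000))     # placeholder for the cheap code path
with timed("create_delayed_array=True", timings):
    sum(range(1_000_000))  # placeholder for the expensive code path

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} s")
```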

I will also check the timings when simulating live data. It would be useful to append to the existing delayed stack rather than creating new Image_Data objects (and their associated delayed arrays) to replace the ones already created.
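Appending could look roughly like the sketch below, assuming each new file is wrapped as its own one-frame delayed array. Dask arrays are immutable, so "appending" with da.concatenate returns a new array. The tasks for existing frames are reused rather than recreated. read_frame, delayed_frame and append_frames are hypothetical names:

```python
import numpy as np
import dask
import dask.array as da


def read_frame(path: str) -> np.ndarray:
    # Hypothetical reader: a real one would load the file from disk.
    index = int(path.split("_")[1].split(".")[0])
    return np.full((4, 4), index, dtype=np.float32)


def delayed_frame(path: str, shape=(4, 4), dtype=np.float32) -> da.Array:
    # One lazy frame with a leading axis of length 1, ready for concatenation
    return da.from_delayed(dask.delayed(read_frame)(path), shape=shape, dtype=dtype)[None]


def append_frames(stack: da.Array, new_paths) -> da.Array:
    """Return a new stack with frames for new_paths added after the existing ones."""
    new = [delayed_frame(p) for p in new_paths]
    return da.concatenate([stack, *new], axis=0)


stack = da.concatenate([delayed_frame(f"img_{i}.tif") for i in range(3)], axis=0)
stack = append_frames(stack, ["img_3.tif", "img_4.tif"])  # new files arrive
print(stack.shape)  # (5, 4, 4)
```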

MikeSullivan7 commented 1 month ago

Using the code:

        # Every 50 images, profile building the delayed array and computing the per-image mean
        if len(images) % 50 == 0:
            with ExecutionProfiler(msg=f"create delayed array and compute mean for {len(images)} images"):
                dask_image_stack = DaskImageDataStack(images, create_delayed_array=self.create_delayed_array)
                if dask_image_stack.delayed_stack is not None:
                    arrmean = dask.array.mean(dask_image_stack.delayed_stack, axis=(1, 2))
                    print(arrmean.compute())
        else:
            dask_image_stack = DaskImageDataStack(images, create_delayed_array=self.create_delayed_array)

We get:

MikeSullivan7 commented 1 month ago

I've benchmarked with the smaller and larger datasets and tried rechunking the Dask array away from its default chunk size, which is (1, 512, 512) for a stack of 512 x 512 images. Rechunking with .rechunk('auto') makes things slower, because of the way the chunks are accessed when we read each image slice to compute the mean.
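To check the chunk layout under discussion, inspect .chunksize and .npartitions. The all-zeros stack below stands in for real data:

```python
import numpy as np
import dask.array as da

# Stand-in stack: 50 images of 512x512, one chunk per image (the layout you
# get when frames are stacked individually).
stack = da.zeros((50, 512, 512), dtype=np.uint16, chunks=(1, 512, 512))
print(stack.chunksize)    # (1, 512, 512)
print(stack.npartitions)  # 50

# rechunk("auto") lets Dask choose larger chunks that span several images.
auto = stack.rechunk("auto")
print(auto.chunksize)     # depends on Dask's configured chunk size
```

One plausible explanation for the slowdown, assumed here rather than measured: once several images share a chunk, reading a single image slice means computing the whole merged chunk, so every frame in it has to be loaded.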

MikeSullivan7 commented 3 weeks ago

Update to Benchmarking:

I've found that we can speed up the mean calculation while the Live Viewer is running by updating a stored running mean with each incoming image, instead of recomputing the mean over the whole stack.
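One reading of this approach is an incremental pixel-wise mean, sketched below. RunningMean is a hypothetical class, not code from this PR:

```python
import numpy as np


class RunningMean:
    """Pixel-wise mean of all images seen so far, updated one image at a time.

    Uses mean_n = mean_{n-1} + (x_n - mean_{n-1}) / n, so each update only
    touches the new image and the stored mean, not the whole stack.
    """

    def __init__(self):
        self.count = 0
        self.mean = None

    def update(self, image: np.ndarray) -> np.ndarray:
        image = np.asarray(image, dtype=np.float64)
        self.count += 1
        if self.mean is None:
            self.mean = image.copy()  # copy so later updates don't alias the input
        else:
            self.mean += (image - self.mean) / self.count
        return self.mean


rm = RunningMean()
for value in (1.0, 2.0, 6.0):
    rm.update(np.full((2, 2), value))
print(rm.mean[0, 0])  # 3.0
```

If the quantity of interest is instead the per-image spectrum, the equivalent shortcut is to append each new image's mean to the stored values rather than recomputing them all.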

mantidproject / mantidimaging