Open MikeSullivan7 opened 1 month ago
Some benchmarks:
Running python -m mantidimaging -lv="C:\Users\ddb29996\Documents\MantidImaging Data\mantidimaging-data-main\mantidimaging-data-main\ISIS\IMAT\IMAT00010675\Tomo"
With create_delayed_array=False (the delayed stack is not created), it takes 0.178 seconds to run all _handle_directory_change calls in the Live Viewer model. With create_delayed_array=True, it takes 9.863 seconds.
I will also check the timings when simulating live data, but it would be useful to append to the existing delayed stack rather than creating and replacing the already created Image_Data objects with their associated delayed arrays.
Using the code
if len(images) % 50 == 0:
    with ExecutionProfiler(msg=f"create delayed array and compute mean for {len(images)} images"):
        dask_image_stack = DaskImageDataStack(images, create_delayed_array=self.create_delayed_array)
        if dask_image_stack.delayed_stack is not None:
            arrmean = dask.array.mean(dask_image_stack.delayed_stack, axis=(1, 2))
            print(arrmean.compute())
else:
    dask_image_stack = DaskImageDataStack(images, create_delayed_array=self.create_delayed_array)
We get:
I've benchmarked with the smaller and larger datasets and attempted to rechunk the Dask array away from its default, e.g. chunksize = (1, 512, 512) for a dataset of 512 x 512 images. Setting dask.array.rechunk('auto') makes things slower because of how the chunks are accessed when each image slice is read to compute the mean.
Update to Benchmarking:
I've found that we can speed up the calculation of the mean while the Live Viewer (LV) is running by taking the mean of each incoming image and combining it with the stored running mean.
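A minimal sketch of one way to read the running-mean idea, using NumPy only. The class name RunningMeanTracker and its attributes are hypothetical and are not taken from the Mantid Imaging code; the sketch assumes that each incoming image is reduced to a single mean value and that the stored running mean is then updated incrementally, so earlier images never need to be re-read.

```python
import numpy as np


class RunningMeanTracker:
    """Hypothetical helper: tracks per-image means and their running mean."""

    def __init__(self):
        self.image_means = []    # one mean value per image seen so far
        self.count = 0           # number of images seen so far
        self.running_mean = 0.0  # mean of all per-image means so far

    def add_image(self, image):
        # Only the new image is reduced; earlier images are not touched.
        image_mean = float(np.mean(image))
        self.image_means.append(image_mean)
        self.count += 1
        # Incremental update: new_mean = old_mean + (x - old_mean) / n
        self.running_mean += (image_mean - self.running_mean) / self.count
        return self.running_mean


tracker = RunningMeanTracker()
for value in (1.0, 2.0, 3.0):
    # Synthetic 4 x 4 frames stand in for images arriving in the watched folder.
    tracker.add_image(np.full((4, 4), value))
```

The running mean of the per-image means equals the mean over all pixels only when every image has the same shape, which is assumed here.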
Issue
Closes #2311
Description
Dask is now used to load the files in the Live Viewer path and display them as normal. Dask allows us to have a delayed array of all image data in the directory without loading all of the data into memory. To display the images in the Live Viewer, the delayed array pointing to the image data is "computed" as needed but not kept permanently in memory.
This allows us to perform operations on the live data which would require the whole imagestack (mean, spectrum, etc), but without loading and storing the whole stack into memory at once. This PR acts as a proof of principle of the usefulness of Dask in Mantid Imaging, and gives a foundation of the structures needed to make Dask work.
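The lazy-loading idea described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation in this PR: load_image is a hypothetical stand-in that fabricates a small array instead of reading a .tif or .fits file, and build_delayed_stack is an illustrative name. The only Dask calls used are dask.delayed, dask.array.from_delayed and dask.array.stack.

```python
import dask
import dask.array as da
import numpy as np


def load_image(index, shape):
    """Hypothetical loader: returns a synthetic frame instead of reading a file."""
    return np.full(shape, float(index), dtype=np.float32)


def build_delayed_stack(n_images, shape=(4, 4)):
    """Build a lazy (n_images, rows, cols) array; no frame is loaded yet."""
    lazy_frames = [
        da.from_delayed(dask.delayed(load_image)(i, shape), shape=shape, dtype=np.float32)
        for i in range(n_images)
    ]
    # Stacking along axis 0 gives one chunk per image.
    return da.stack(lazy_frames, axis=0)


stack = build_delayed_stack(5)

# Displaying a single frame only triggers the loader for that frame.
frame = stack[2].compute()

# A whole-stack operation: the mean of every image, evaluated chunk by chunk.
per_image_mean = stack.mean(axis=(1, 2)).compute()
```

Because each chunk holds exactly one image, a per-image reduction such as the mean over axes (1, 2) can be evaluated one frame at a time rather than on the full stack held in memory.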
Compatibility has been added for both .tif and .fits files but they are dealt with separately as .fits files are not natively supported by Dask and therefore the delayed arrays and computations have been done manually.
Testing
make check
Acceptance Criteria
1) Open MI, open the Live Viewer and point it to a folder with data, e.g.
python -m mantidimaging -lv="C:\Users\ddb29996\Documents\MantidImaging Data\Large Dataset\Flower_WhiteBeam\Tomo"
It would be preferable to do this with a larger dataset to easily see the benefit of using Dask.
2) Check that the images load as normal and that you can move between frames with no errors or appreciable slowdown.
3) Perform an "Operation" on the whole imagestack. While these kinds of operations are not yet implemented in the Live Viewer, you can paste the following code into line 346 of mantidimaging/gui/windows/live_viewer/model.py:
This will take the delayed imagestack and calculate a form of spectrum of all images in the Live Viewer folder. As you open and initialise the Live Viewer, keep an eye on your RAM usage and check that it does not increase by the size of the imagestack (this is easier to see with the Flower_WhiteBeam dataset as it is around 9GB).
Check that this calculated spectrum is what you would expect for the dataset. For example, for the Flower_WhiteBeam data, you should get this:
For the MantidImaging Data\Brass\Corrected_Sample_PH20 data, you should get:
Repeat this process with both .tif and .fits datasets to make sure both are functional.
As the nature of how some of the Live Viewer data structures and flows work has been changed, the Live Viewer tests have been altered to reflect this.
Documentation
Will add release note