tlambert03 / pycudadecon

Python wrapper for cudaDecon - GPU accelerated 3D deconvolution for microscopy
http://www.talleylambert.com/pycudadecon/
MIT License
59 stars 12 forks source link

Pycudadecon + dask #62

Open Eddymorphling opened 1 month ago

Eddymorphling commented 1 month ago

Hi Talley, I am trying to use pycudadeconon large 3D datasets generated with our gaussian ligthsheet. As you may be aware, the datasets are large and loading the entire volume during decon leads to memory issues.

I have acquired PSFs but wanted to reach out about your suggestions for doing this using dask-arrays for efficient GPU/CPU memory usage when running decon jobs using GPU on my cluster. Any notebooks/scripts that you may have that uses pycudadecon+dask to help me get started? Much appreciated. Thank you.

tlambert03 commented 1 month ago

i don't have one directly using pycudadecon, but in general, you'll want to use dask map_blocks or map_overlap. As @volkerh mentioned in https://github.com/tlambert03/pycudadecon/issues/13#issuecomment-653710403, he wrote a short tutorial on breaking up large deconvolution problems into tiles. (there he used flowdec, but the split/combine principle is the same)

https://github.com/hammerlab/flowdec/blob/master/python/examples/notebooks/Tile-by-tile%20deconvolution%20using%20dask.ipynb

Eddymorphling commented 1 month ago

Great, you are the best! Thanks.

Eddymorphling commented 1 month ago

Hi again @tlambert03 , The CLI help command does not seems to work. Is it cudaDeconv --help ?

tlambert03 commented 1 month ago

no v, just cudaDecon ... see the C++ library for the cli: https://github.com/scopetools/cudaDecon

tlambert03 commented 1 month ago

oh! my docs are wrong aren't they! Sorry about that... will update

Eddymorphling commented 1 month ago

Yeah, I was following the docs. No worries. Thank you.

Eddymorphling commented 1 month ago

Sorry to bother you again @tlambert03. Testing a set of images using the CLI command line and it seems to be stuck at the step without any progress. Output included below. Any suggestions why? Pytorch sees my GPU.


# of tiles: 3. # of overlaps: 2. Tile size: 1024 Y pix. Overlap: 213 Y pix. 

Tile: 1 out of 3.   
raw_image size             : 2478 x 1024 x 1
Eddymorphling commented 1 month ago

Here is my CLI

radialft /home/psf_4x.tif /home/psf_4x_OTF.tif --fixorigin 10 --wavelength 525 --na 0.35 --nimm 1.56 --xyres 3 --zres 3 --nocleanup 
cudaDecon /data/z-slices fused --otf-file /home/psf_4x_OTF.tif --wavelength 525 --tile 1024 
tlambert03 commented 1 month ago

not immediately sure, but let's first determine whether it has to do with your data specifically. Try the test data available here: https://github.com/tlambert03/pycudadecon/tree/main/tests/test_data

use something like

radialft.exe .\test_data\psf.tif .\test_data\otf.tif --nocleanup --fixorigin 10
cudaDecon.exe -z 0.3 -i 10 -D 31.5 .\test_data\im_raw .\test_data\otf.tif
Eddymorphling commented 1 month ago

Yes that worked with your test dataset. Not sure what's not compatible with my input datasets. Here is some background.

  1. My PSF is not experimental, it is computationally generated (xyz = 256x256x256).
  2. The OTF file seems good.
  3. My input files are single tiff slices in a folder. These are gaussian LS tiff slices. They are 4339x5942 (xy). uneven pixel count as they are stitched and fused.

I kept it simple this time and here is what I ran. But gets stuck in the step I mentioned above.


radialft /home/scratch/pycudadecon/psf_4x.tif /home/scratch/pycudadecon/psf_4x_OTF.tif --fixorigin 10 --wavelength 525 --na 0.35 --nimm 1.56 --xyres 3 --zres 3 --nocleanup 
cudaDecon -i 10 /home/scratch/pycudadecon/z-slices fused --otf-file /home/scratch/pycudadecon/psf_4x_OTF.tif
Eddymorphling commented 1 month ago

Ah I figured it out. When the documentation said the input folder can be tiff files, what I assumed is they can be 2D tiff files generated from a 3D stack, not multiple 3D stacks. My bad. Running it on a 3D stack works now. Thanks Talley.

tlambert03 commented 1 month ago

great!

Eddymorphling commented 3 weeks ago

Hi again @tlambert03 I have been trying to setup a workflow that uses daskand pycudadecon for just 3D gaussian light sheet datasets but have been facing trouble. I adapted this example which I believe was written by you? Not sure.. I removed the deskew function and just told it to crop and deconvolve as they are 3D (xyz) gaussian LS datasets.

I adapted what I found and tried to run dask+pycudadecon+GPU on our cluster. Here is what I came up with


import napari
import pycudadecon
from functools import partial
from skimage import io
from dask_image.imread import imread
import dask.array as da
import tifffile
from dask.distributed import Client

def main():

    client = Client()  # This will use all available cores by default
    print(client)

    # Load stacks with dask_image, and PSF with skimage
    # Each file in the folder is a single Z slice from the 3D volume
    stack = imread("/data/z-slices/*.tif")
    psf = io.imread("/home/psf_4x.tif")

    # Print the chunk sizes of the stack
    print("Chunk sizes:", stack.chunks)

    # Prepare some functions that accept a numpy array and return a processed array
    def crop(array):
        # Simple cropping function
        return array[:, 2:, 10:-20, :500]

    # https://docs.python.org/3.8/library/functools.html#functools.partial
    deconv = partial(pycudadecon.decon, psf=psf, background=10)

    # Map and chain those functions across all dask blocks
    deconvolved = stack.map_blocks(deconv, dtype="float16")
    cropped = deconvolved.map_blocks(crop, dtype="float16")

    output_path = "/home/scratch/decon.tif"
    tifffile.imwrite(output_path, deconvolved.compute())

    # Put the resulting dask array into napari.
    # (Don't forget the contrast limits and multiscale==False !)
    v = napari.view_image(
        cropped,
        contrast_limits=[90, 1500],
        multiscale=False,
        ndisplay=3,
        scale=(3, 1, 1),
    )

    napari.run()

if __name__ == '__main__':
    main()

My input folder is tiff files series where each file is a single slice, while my PSF is a 3D volume. When I run this, it does not save any deconvolved image nor open the image in napari. Below is what I see in the terminal, Would you happen to know what's going on? Thank you.


<Client: 'tcp://127.0.0.1:45707' processes=8 threads=64, memory=503.62 GiB>
Chunk sizes: ((1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), (5942,), (4339,))
New chunk sizes: ((5, 5, 3), (1000, 1000, 1000, 1000, 1000, 942), (1000, 1000, 1000, 1000, 339))
callednx=74, ny=74, nz=74
Center of mass is (37.500, 37.500, 38.100)
Background is 0.000
In radialft()
tlambert03 commented 3 weeks ago

looking, quick question: where is this getting printed?

New chunk sizes: ((5, 5, 3), (1000, 1000, 1000, 1000, 1000, 942), (1000, 1000, 1000, 1000, 339))

tlambert03 commented 3 weeks ago

as for the failure, it looks like something is going wrong at the moment of OTF generation (in radialft)... I would simplify this for now, leaving out napari, and simply calling cropped.compute(), also, try putting prints and/or breakpoints in the various functions to see exactly what dimensions/data are getting passed.

Try also leaving out dask and all that complexity too. If you're trying to build up to something that does block-wise deconvolution and stitches them all back together, then first just try to manually deconvolve one of those blocks (i.e. just literally crop your data down to a smaller subvolume that would match the chunk size you're trying to create). Then just deconvolve that single chunk and see how it goes.

If you're failing on the radialft part, then it suggests there might be a problem with your psf file... so I would suggest manually going through what the decon function does, and see where it fails. for example maybe first try to make a TemporaryOTF from your psf like this line in the decon function does. In general, break it down to much smaller steps, before building back up to the full map-blocks-chunked decon in napari

Eddymorphling commented 3 weeks ago

Thanks! This was an option that I included to rechunk the dataset. It get printed before the decon functions starts. Tried it without it with this fails.

tlambert03 commented 3 weeks ago

have you confirmed that make_otf works on your psf_4x.tif file? https://www.talleylambert.com/pycudadecon/otf.html#pycudadecon.make_otf

tlambert03 commented 3 weeks ago

Thanks! This was an option that I included to rechunk the dataset. It get printed before the decon functions starts. Tried it without it with this fails.

yeah, just didn't see the actual rechunking... which I do expect to be an important step. I would probably try to leave the full z in there (i.e. don't chunk down to (5, 5, 3) on the z axis, but rather (13,)... that's a small z as it is, and the deconvolution probably won't work well with something like only 3-5 planes. try (13, 512, 512) ... and then start increasing xy later.

but first do it manually, and also, i wouldn't expect that bit to have anything to do with the failed otf generation, so test the the make_otf step independently first

Eddymorphling commented 3 weeks ago

have you confirmed that make_otf works on your psf_4x.tif file?

Yes, this works when running independently. But not when I use my code.

Eddymorphling commented 3 weeks ago

yeah, just didn't see the actual rechunking... which I do expect to be an important step. I would probably try to leave the full z in there (i.e. don't chunk down to (5, 5, 3) on the z axis, but rather (13,)... that's a small z as it is, and the deconvolution probably won't work well with something like only 3-5 planes. try (13, 512, 512) ... and then start increasing xy later.

but first do it manually, and also, i wouldn't expect that bit to have anything to do with the failed otf generation, so test the the make_otf step independently first

I initially used 13 slices just to test. But now I also tried around 150 z-slices, still fails. My PSF is theoretical for testing purposes, its 64x64x64 that I generated with PSF generator.

tlambert03 commented 3 weeks ago

apologies that this isn't easier to debug. I definitely do see this is something that pycudadecon could make easier for you. There are indeed some sneaky "gotchas" in here, with failures caused by subtle things that I don't always entirely understand. It can unfortunately sometimes take a little trial and error to get it all working on a new setup, and it's annoying when it fails without any useful info.

I know it's a very large file, but would you be willing to send me a dropbox link to the data and your generated PSF?

Eddymorphling commented 3 weeks ago

Sure, thing. Thanks for helping me out. The 3D data can be big depending on the number of z-slices. How many would you like?

tlambert03 commented 3 weeks ago

hmm, how many are there total? what is the full size of the data?

Eddymorphling commented 3 weeks ago

Its around 1500 slices, which comes to around 200gb. Shall I send you 50 slices?

tlambert03 commented 3 weeks ago

maybe 100? if you'd prefer not to link here try talley.lambert@gmail.com

Eddymorphling commented 3 weeks ago

Got it. Its on the way, thanks again!

tlambert03 commented 3 weeks ago

thank you! may take me a few days, but curious to have a look

Eddymorphling commented 2 weeks ago

Hi Talley, Wondering if you got the chance to troubleshoot?

tlambert03 commented 2 weeks ago

i think your issues with hanging at In radialft() are related to an idiosyncracy of dask map blocks. Your function needs to be able to handle 0-dimensional arrays (see https://docs.dask.org/en/stable/generated/dask.array.map_blocks.html for details).

You can solve this either by using the meta parameter that they describe there, or by making a function that can properly handle zero dimensional arrays like this:

from pycudadecon import decon
import tifffile as tf
from dask_image.imread import imread
from dask.distributed import Client
import numpy as np

def main():
    # important to limit to a single thread & worker, otherwise the GPU will be shared
    _ = Client(threads_per_worker=1, n_workers=1)

    root = "C:/Users/talley/Downloads/Tiffs-20240823T145014Z-001/Tiffs/"
    psf = root + "psf_4x.tif"

    stack = imread(root + "tiffs/*.tif")
    stack = stack.rechunk((101, 512, 512))

    def decon_chunk(chunk, block_id):
        # handle empty array
        if chunk.size == 0:
            return np.array([]).astype("float32")
        print("decon_chunk", chunk.shape, "block_id", block_id)
        return decon(chunk, psf=psf, background=100)

    deconvolved = stack.map_blocks(decon_chunk, dtype="float32")

    # small portion of the image to demonstrate
    result = deconvolved[:, 2048:3072, 2048:3072].compute()
    tf.imwrite(root + "decon.tif", result)

if __name__ == "__main__":
    main()

the other issue you will run into is that you'll need to restrict dask to a single thread & worker, otherwise your GPU will be overwhelmed.

Eddymorphling commented 2 weeks ago

Hi @tlambert03 , this works! Thank you. What would I change here if I want to perform dask decon with chunks and overlap instead of running it on a cropped region? The goal is to do this for the large 3D dataset. This is what I have so far. Most of it is adapted from Robert Haase's Clij2 decon pipeline using dask arrays.


from pycudadecon import decon
import tifffile as tf
from dask_image.imread import imread
from dask.distributed import Client
import dask.array as da
import numpy as np
from skimage import img_as_uint
import os

def main():
    # Important to limit to a single thread & worker, otherwise the GPU will be shared
    _ = Client(threads_per_worker=1, n_workers=1)

    root = "/data/flowdec/z-slices/"
    psf_path = "/data/flowdec/psf_4x.tif"

    stack = imread(root + "*.tif")

    num_x_chunks = 5
    num_y_chunks = 5
    num_z_chunks = 20

    z_chunk_size = int(stack.shape[0] / num_z_chunks)
    y_chunk_size = int(stack.shape[1] / num_y_chunks)
    x_chunk_size = int(stack.shape[2] / num_x_chunks)
    print('chunks', z_chunk_size, y_chunk_size, x_chunk_size)

    # Create Dask image with specified chunk sizes
    stack = stack.rechunk((z_chunk_size, y_chunk_size, x_chunk_size))

    # Define the overlap size
    overlap = 50

    def decon_chunk(chunk, block_info=None):
        # Handle empty array
        if chunk.size == 0:
            return np.array([]).astype("float32")
        print("decon_chunk", chunk.shape, "block_info", block_info)
        return decon(chunk, psf=psf_path, background=100, n_iters=50)

    # Apply deconvolution with overlap
    deconvolved = stack.map_overlap(decon_chunk, depth={0: 0, 1: overlap, 2: overlap}, boundary='reflect', dtype="float32")

    # Compute the entire deconvolved stack (this is when data is read into memory)
    result = deconvolved.compute()

    # Normalize the result to the range [0, 1]
    result_normalized = (result - np.min(result)) / (np.max(result) - np.min(result))

    result_16bit = img_as_uint(result_normalized)

    tf.imwrite(root + "decon.tif", result_16bit, compression='zlib')

    # Convert the deconvolved stack to individual TIFF files
    # Extract the basename (file name) and first 10 characters of it
    basename = os.path.basename(root + "decon.tif")
    basename_first_11_chars = basename[:11]

    tiffs_folder = os.path.join(root, 'tiffs_decon')
    if not os.path.exists(tiffs_folder):
        os.makedirs(tiffs_folder)

      img_stack = tf.imread(root + "decon.tif")

    # Loop through each slice, create and save new tiffs
    for i in range(img_stack.shape[0]):
         new_image_path = os.path.join(tiffs_folder, f"{basename_first_11_chars}_{str(i).zfill(4)}.tif")
         tf.imwrite(new_image_path, img_stack[i], compression='LZW')

    print(f"Deconvolution and TIFF conversion completed. Individual TIFF files saved in {tiffs_folder}")

if __name__ == "__main__":
    main()
tlambert03 commented 2 weeks ago

Yeah, using map_overlap is the key. That's what you'll need to do this properly without having bad edge artifacts everywhere. Do you have a specific question about it?

Eddymorphling commented 2 weeks ago

I do see a few artefacts in the final dataset. What would you suggest to reduce such artefacts?

tlambert03 commented 2 weeks ago

The main parameters you will want to play with will be the chunk size and the overlap size (the "depth" parameter to map_overlap). The larger the depth, the less edge artifact you should see, but the more "wasted" work you're doing. Also read up on the other parameters and just try things, it will take some experimenting.

https://docs.dask.org/en/stable/generated/dask.array.overlap.map_overlap.html