jlevy44 / PathFlowAI

A High-Throughput Workflow for Preprocessing, Deep Learning Analytics and Interpretation in Digital Pathology
https://jlevy44.github.io/PathFlowAI/
MIT License
38 stars, 8 forks

Issue with Transposing Dask Array and Application of Annotations #26

Closed jlevy44 closed 4 years ago

jlevy44 commented 4 years ago

There may be an issue associated with https://github.com/jlevy44/PathFlowAI/pull/6 that transposes the dask array from its original orientation.

Debugging and a PR are needed ASAP.

sumanthratna commented 4 years ago

Yikes—I intentionally added a final transpose: https://github.com/jlevy44/PathFlowAI/pull/6/commits/c2b0e54e052261eaeafae842cf365d98003d54ec#diff-bf069ac5bda9fb89424f63378deef48bR127, which was originally commented out. If I remember correctly, this is because the SVS I was testing on got transposed. I'll look into this.

sumanthratna commented 4 years ago

Hmmm, I think the final transpose is correct. The napari screenshot shows what's loaded via svs2dask_array (sliced with 10000:14000, 10000:14000), and the other screenshot is the ground truth for approximately the same region.

[Two screenshots, taken 2020-05-26: the napari view and the ground-truth region]

(TCGA-DD-AADV-01Z-00-DX1)

This is the script I used with napari:

from openslide import deepzoom
import openslide
import dask
import numpy as np
import dask.array as da
import dask_image.imread

def svs2dask_array(
    svs_file,
    tile_size=1000,
    overlap=0,
    remove_last=True,
    allow_unknown_chunksizes=False,
):
    """Convert SVS, TIF or TIFF to dask array.
    Parameters
    ----------
    svs_file : str
            The path to the image file.
    tile_size : int, optional
            The size of the chunk to be read in.
    overlap : int, optional
            Do not modify, overlap between neighboring tiles.
    remove_last : bool, optional
            Whether to remove the last tile because it has a custom size.
    allow_unknown_chunksizes : bool, optional
            Whether to allow different chunk sizes. If True, flexibility
            increases, but this method becomes slower. The default is False.
    Returns
    -------
    arr: dask.array.Array
            A Dask Array representing the contents of the image file.
    Examples
    --------
    >>> arr = svs2dask_array(svs_file, tile_size=1000, overlap=0, remove_last=True, allow_unknown_chunksizes=False)
    >>> arr2 = arr.compute()
    >>> arr3 = Image.fromarray(cv2.resize(arr2, dsize=(1440, 700), interpolation=cv2.INTER_CUBIC))
    >>> arr3.save(test_image_name)
    """
    img = openslide.open_slide(svs_file)
    if type(img) is openslide.OpenSlide:
        gen = deepzoom.DeepZoomGenerator(
            img, tile_size=tile_size, overlap=overlap, limit_bounds=True
        )
        max_level = len(gen.level_dimensions) - 1
        n_tiles_x, n_tiles_y = gen.level_tiles[max_level]

        @dask.delayed(pure=True)
        def get_tile(level, column, row):
            tile = gen.get_tile(level, (column, row))
            # PIL tiles convert to (height, width, channels); swap to (width, height, channels).
            return np.array(tile).transpose((1, 0, 2))

        sample_tile_shape = get_tile(max_level, 0, 0).shape.compute()
        rows = range(n_tiles_y - (0 if not remove_last else 1))
        cols = range(n_tiles_x - (0 if not remove_last else 1))

        arr = da.concatenate(
            [
                da.concatenate(
                    [
                        da.from_delayed(
                            get_tile(max_level, col, row), sample_tile_shape, np.uint8
                        )
                        for row in rows
                    ],
                    allow_unknown_chunksizes=allow_unknown_chunksizes,
                    axis=1,
                )
                for col in cols
            ],
            allow_unknown_chunksizes=allow_unknown_chunksizes,
        ).transpose([1, 0, 2])  # final transpose: back to (height, width, channels)
        return arr
    else:  # img is instance of openslide.ImageSlide
        return dask_image.imread.imread(svs_file)

import napari

with napari.gui_qt():
    viewer = napari.view_image(svs2dask_array('/path/to/svs')[10000:14000, 10000:14000])
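
To make the axis bookkeeping easier to check, here is a minimal NumPy-only sketch of the same assembly order on a toy array. It is not the PathFlowAI code: `assemble`, the toy `slide`, and the 2-pixel tile size are made up for illustration, and plain slicing stands in for the DeepZoom tiles. It suggests that, without the final transpose, the stitched array comes out as (width, height, channels).

```python
import numpy as np


def assemble(tiles):
    """Stitch tiles[col][row], each an (h, w, c) array, in the order used above.

    Hypothetical helper for illustration only; returns the array as it is
    *before* any final transpose.
    """
    strips = []
    for col_tiles in tiles:
        # Flip each tile to (w, h, c), then stack one column's tiles along axis 1.
        flipped = [t.transpose((1, 0, 2)) for t in col_tiles]
        strips.append(np.concatenate(flipped, axis=1))
    # Join the column strips along axis 0, giving (width, height, channels).
    return np.concatenate(strips, axis=0)


# Toy "slide": 4 rows x 6 columns x 3 channels, cut into 2x2-pixel tiles.
slide = np.arange(4 * 6 * 3).reshape(4, 6, 3)
tile = 2
tiles = [
    [slide[r:r + tile, c:c + tile] for r in range(0, 4, tile)]
    for c in range(0, 6, tile)
]

stitched = assemble(tiles)                # spatial axes swapped relative to slide
restored = stitched.transpose((1, 0, 2))  # the "final transpose"
```

In this toy case `restored` matches `slide` element for element, while `stitched` matches `slide.transpose((1, 0, 2))`.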

sumanthratna commented 4 years ago

It's strange how I didn't notice that this would be a breaking change. We can:

jlevy44 commented 4 years ago

@lvaickus and I will discuss and see what option we like the best. I’m leaning towards including a transpose with the option to opt out
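
Roughly, a transpose with an opt-out could look like the sketch below. The name `orient_slide` and its `transpose` keyword are hypothetical, chosen only for illustration, and are not the signature that ended up in pathflowai/utils.py.

```python
import numpy as np


def orient_slide(arr, transpose=True):
    """Hypothetical helper: swap the two spatial axes unless the caller opts out."""
    if not transpose:
        # Opt-out path: hand the array back untouched.
        return arr
    # Channels stay last; .transpose with an axes tuple also exists on dask arrays.
    return arr.transpose((1, 0, 2))


raw = np.zeros((6, 4, 3), dtype=np.uint8)  # (width, height, channels)
default = orient_slide(raw)                # (height, width, channels)
opted_out = orient_slide(raw, transpose=False)
```

Defaulting to `transpose=True` keeps the new orientation for most callers while letting code that depends on the old orientation pass `transpose=False`.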

jlevy44 commented 4 years ago

I added a transpose option that should be compatible with your software.

Also, there could be issues with https://github.com/jlevy44/PathFlowAI/blob/1b18187ff8fcdac97906902b73c52601e7f7cf75/pathflowai/utils.py#L296 when using the no_zarr option.

I will have to think more about enforcing compatibility with annotations.

jlevy44 commented 4 years ago

https://github.com/jlevy44/PathFlowAI/commit/74b25ed537b553b67b88ff36ad338da104b9813d

jlevy44 commented 4 years ago

https://github.com/jlevy44/PathFlowAI/blob/master/pathflowai/utils.py#L127

jlevy44 commented 4 years ago

I think under the no_zarr specification, running transpose_annotations may remedy possible issues.
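
The idea, sketched with NumPy only: if the image's spatial axes are swapped, the annotation coordinates need the same swap to stay aligned. The helpers `swap_xy` and `mark` below are hypothetical and are not the actual transpose_annotations implementation. They assume annotations are integer (row, col) index pairs.

```python
import numpy as np


def swap_xy(points):
    """Hypothetical helper: swap the two coordinate columns of an (N, 2) array."""
    return np.asarray(points)[:, ::-1].copy()


def mark(points, shape):
    """Return a boolean mask of the given shape with each point set to True."""
    mask = np.zeros(shape, dtype=bool)
    mask[points[:, 0], points[:, 1]] = True
    return mask


points = np.array([[0, 5], [1, 2], [3, 0]])  # (row, col) pairs on a 4x6 image
mask = mark(points, (4, 6))

# After transposing the image to 6x4, the swapped points land on the same pixels.
mask_t = mark(swap_xy(points), (6, 4))
```

Here `mask_t` equals `mask.T`, so annotations and image agree again once both have been transposed.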

jlevy44 commented 4 years ago

Okay, the changes fixed things on my end. Closing this issue.