spacetx / starfish

starfish: unified pipelines for image-based transcriptomics
https://spacetx-starfish.readthedocs.io/en/latest/
MIT License
223 stars 67 forks source link

Is there a function that returns the outline of a segmented cell? #998

Closed sethb3744 closed 5 years ago

sethb3744 commented 5 years ago

For example, after using Segmentation.Watershed to segment a cell, is there some function that takes the resulting np.ndarray as an input and returns another np.ndarray with the outlines of all segmented cells. This doesn't have to be the exact method; anything similar will do.

I have tried looking around already for something similar, but I haven't been able to find it. If that function doesn't exist yet, I would like to help to write one since I need that function anyway.

Thank you!

Edit: To give some background to this, the reason I want to do this is because I want to detect and count dots that are only contained within cells, so I would use the outlines of cells from Segmentation.Watershed to crop the dots image to only include those outlined areas, and then count dots in those parts of the image. It is entirely possible that there is a function that does that already that I have just not been able to find yet. If that exists yet, please let me know.

ambrosejcarr commented 5 years ago

Yes! That function does exist in starfish. Given an IntensityTable containing called spots and the output of Segmentation.watershed it will assign each spot in the IntensityTable to a cell:

https://github.com/spacetx/starfish/blob/master/starfish/spots/_target_assignment/label.py

It's accessed via starfish.spots.TargetAssignment.Label

sethb3744 commented 5 years ago

Thank you so much! I used Label successfully, and I am currently trying to visualize the resulting IntensityTable using MERmaid, but I ran into an issue where the object in the picture below entitled "this.props.color" seems to be Null. I am unsure if this is an issue with the IntensityTable or MERmaid at the moment. I will start an issue on MERmaid to address this, but in case you have seen a similar error in the past, I wanted to let you know about this. In case you want to know what I did in my code, here is a link to my IPython script: https://drive.google.com/file/d/1JW8npPJQ4DT3-kOQK1GJ5JIj9kZbZewX/view?usp=sharing

screen shot 2019-02-13 at 4 08 21 pm

ambrosejcarr commented 5 years ago

We don't have any javascript in our project, so it is either an issue with MERmaid or the interaction between our output format and their input expectations. We haven't updated how we write the data in a while -- it's possible the expectations have changed. If you're able to solve this problem we'd love to know the solution so that we can fix it on our side.

sethb3744 commented 5 years ago

I'll update you when I figure it out. Thanks for your help.

sethb3744 commented 5 years ago

I have submitted an issue to their Github. In the meantime, are there any other ways to visualize data from an IntensityTable or ExpressionMatrix object?

sethb3744 commented 5 years ago

The issue isn't fixed yet, but I learned from the MERmaid developer that when cloning MERmaid, you have to clone a particular branch made specifically for starfish. You can add the following line to your documentation to tell people to clone the corrrect branch: git clone -b starfish --single-branch https://github.com/JEFworks/MERmaid.git

ambrosejcarr commented 5 years ago

Ok, thanks! I'm linking to that issue. It sounds like we need to update some things to meet their input expectations.

https://github.com/JEFworks/MERmaid/issues/8

sethb3744 commented 5 years ago

Ambrose, you might have already seen the answer that JEFworks gave on the other thread. The only thing that needs to be fixed is the header. The header column used to be x,y,target,target, but it needs to be changed to x,y,gene1,gene2. Thankfully it's a simple fix.

ambrosejcarr commented 5 years ago

Would you be interested in contributing the fix to starfish, or would you prefer that I make it?

sethb3744 commented 5 years ago

Yeah, I added the line below to line 223 of intensity_table.py: mermaid_data.columns = ['x', 'y', 'gene1', 'gene2']

It's been a long time since I've done anything with git other than just cloning. I could look into how to commit and push later today to update the current code.

sethb3744 commented 5 years ago

I have added some extra code, but I'm not sure you would want me to implement it. Different people may want to use MERmaid differently.

  1. Visualize all dots, without any regard for whether they are in cells or not. Only requires any detection function.
  2. Visualize only dots in cells. Requires detection, segmentation, and target assignment.

For the first way, the code would essentially just look how it is now, which I've pasted below:

df = self.to_features_dataframe() # self is an IntensityTable object
column_order = [
    Axes.X,
    Axes.Y,
    Features.TARGET,
    Features.TARGET,  # added twice to support simultaneous coding
]
mermaid_data = df[column_order]
mermaid_data.columns=['x', 'y', 'gene1', 'gene2']
mermaid_data.to_csv(filename, compression='gzip', index=False)

For the second way, the code would also include cell IDs in mermaid_data and then hide the dots outside of cells, as such:

df = self.to_features_dataframe()
column_order = [
    Axes.X,
    Axes.Y,
    Features.TARGET,
    Features.TARGET,  # added twice to support simultaneous coding
    'cell_id'
]
mermaid_data = df[column_order]
mermaid_data.columns=['x', 'y', 'gene1', 'gene2', 'cell']

# hide dots not in cells, which have cell IDs of 0
are_cells = mermaid_data['cell'] != 0
mermaid_cells = mermaid_data[are_cells]
mermaid_cells.to_csv(filename, compression='gzip', index=False)

So, should we create an extra boolean variable in the save_mermaid function that allows people to choose which way to do it?

sethb3744 commented 5 years ago

I've tried out the above code and it works, but unfortunately it doesn't work very well for my project. Our segmented nuclei don't have that many dots, and because MERmaid visualizes cells by connecting the dots within cells, the visualization isn't a good representation of what the nuclei actually look like (screenshot below). It would work better for projects with much more spots than mine, like the example on the MERmaid Github page. mermaid cells

I think a better option would be to mimic the way that you visualized data in notebooks/assay_comparison.ipynb (screenshot below). screen shot 2019-02-14 at 5 20 00 pm

However, when I tried using the method you used in assay_comparison.ipynb, I only get a completely black square instead of the dots or segmentation images. The code I used is below, where decoded_intensities is my IntensityTable made through SpotDetection.BlobDetector, TargetAssignment.Label, and Decoder.PerRoundMaxChannelDecoder. I also tried making decoded_intensities without target assignment to see if that was the issue.

f, ((ax1, ax2), (ax3, ax4)) = plt.subplots(ncols=2, nrows=2, figsize=(30, 45))
decoded_spots(
    decoded_intensities,
    background_image=np.zeros_like(nuclei_mp_numpy),
    spots_kwargs=dict(alpha=1.),
    ax=ax1
)
for ax in (ax2, ax3, ax4):
    ax.set_axis_off()

Extra info:

screen shot 2019-02-14 at 5 41 42 pm screen shot 2019-02-14 at 5 42 16 pm

sethb3744 commented 5 years ago

Update: The IntensityTable still doesn't display using this method, but I at least got the background_image to show up. In the example, background_image was defined as background_image=np.zeros_like(nuclei_mp_numpy) in one example and background_image=nuclei_mp_numpy in a different example. The first one only displays a black square, while the second one successfully displays the nuclei image. It may be that in the example, they did that on purpose to see the dots by themselves in the first image first, but I didn't know until I looked into it further.

Since the spots from intensities still don't show up, I tried using a different method that is used in starfish/notebooks/ISS_Pipeline, which did work for me. I'll copy the code below. I added one step where the IntensityTable is filtered to only include spots in segmented cells (if interested, see line below that starts with decoded_intensities_cell). It works well, although it shows that the software is not completely correct, but very close at least (see screenshot). Edit: It definitely is correct; I just had to change the way that segmentation was run to exclude dots in the stain but not in the nuclei.

from skimage.color import rgb2gray

GENE1 = 'GFP'

rgb = np.zeros(registered_image.tile_shape + (3,))
nuclei_mp = nuclei.max_proj(Axes.ROUND, Axes.CH, Axes.ZPLANE)
nuclei_numpy = nuclei_mp._squeezed_numpy(Axes.ROUND, Axes.CH, Axes.ZPLANE)
rgb[:,:,0] = nuclei_numpy
# dots_mp = dots.max_proj(Axes.ROUND, Axes.CH, Axes.ZPLANE)
# dots_mp_numpy = dots_mp._squeezed_numpy(Axes.ROUND, Axes.CH, Axes.ZPLANE)
# rgb[:,:,1] = dots_mp_numpy
do = rgb2gray(rgb)
do = do/(do.max())

image(do,size=10)
with warnings.catch_warnings():
    warnings.simplefilter('ignore', FutureWarning)
    decoded_intensities_cell = decoded_intensities.where(decoded_intensities[Features.AXIS]['cell_id'] != 0, drop=True)
    is_gene1 = decoded_intensities_cell.where(decoded_intensities_cell[Features.AXIS][Features.TARGET] == GENE1, drop=True)

plt.plot(is_gene1.x, is_gene1.y, 'og', markersize=1)
plt.title(f'Green: {GENE1}');

screen shot 2019-02-15 at 11 16 25 am Thanks again for all your help! Please still do look into my question from before about whether we should create an extra boolean variable in the save_mermaid function when you get the time.

ambrosejcarr commented 5 years ago

Thank you for the detailed reports. I can look into the issues you highlighted here early next week. however, it sounds like your use cases might benefit from using napari instead of Mermaid or our own internal tools.

if you install using the optional dependency:

pip3 install starfish[napari]

You will be able to visualize the image and the spots using the napari gui. Of the visualization options, this is currently my favorite. In the ipython environment:

%gui qt
import starfish.display
starfish.display.stack(image_stack, intensity_table)
sethb3744 commented 5 years ago

Great, then I will hold off on save_mermaid until you get the chance to take a look.

That visualization worked for me, thanks! The other one also works, which is good, because both options have pros to them, so I may end up using both. Thanks again.

One more question has come up on my side: Each cell has a cell ID, and I know the number of spots in each cell, but I can't tell which cell in the final image is associated with which cell ID. Is it possible to visualize the segmented cells with their cell_id number right next to each cell? Or some other way to determine what the cell ID is for each cell in the segmented image? Thanks again for all your help!

ambrosejcarr commented 5 years ago

Each cell has a cell ID, and I know the number of spots in each cell, but I can't tell which cell in the final image is associated with which cell ID. Is it possible to visualize the segmented cells with their cell_id number right next to each cell? Or some other way to determine what the cell ID is for each cell in the segmented image? Thanks again for all your help!

This is a great point. I don't think we currently have this capability, but we are going to revisit segmentation and regions of interest in the next month and will be sure to consider this use case.

sethb3744 commented 5 years ago

Great, I'm glad I could find you guys another use case. In the meantime, I wrote this code for the visualization method that uses matplotlib.plot (this code was adapted from notebooks/ISSPipeline-Breast-_1_FOV.ipynb). It's not perfect but it works for what I want to do:

import matplotlib.pyplot as plt
from skimage.color import rgb2gray

GENE1 = 'GFP'

rgb = np.zeros(registered_image.tile_shape + (3,))
rgb[:,:,0] = filtered_nuclei_mp_numpy
do = rgb2gray(rgb)
do = do/(do.max())

# filter decoded_intensities to only include dots in a cell and that belong to gene1
with warnings.catch_warnings():
    warnings.simplefilter('ignore', FutureWarning)
    is_in_cell = decoded_intensities.where(decoded_intensities[Features.AXIS]['cell_id'] != 0, drop=True)
    is_gene1 = decoded_intensities_cell.where(is_in_cell[Features.AXIS][Features.TARGET] == GENE1, drop=True)

# get a coordinate for each cell ID number
cache = []
xy_coords = {}
index = 0
for i in is_gene1['cell_id'].data:
    if i not in cache:
        cache.append(i)
        xy_coords[i] = (is_gene1['x_max'].data[index], is_gene1['y_max'].data[index])
    index = index+1

# plot dots with nuclei in background and add cell ID numbers
image(do,size=10)
plt.plot(is_gene1.x, is_gene1.y, 'og', markersize=1.5) # change 'og' for color
plt.title(f'Green: {GENE1}');
for i,(x,y) in xy_coords.items():
    if x > 997:
        x = 997 # if the x-coordinate is too large, the number will fall off the image, so correct it here
    plt.text(x, y, str(i), fontsize=12, color='yellow')

The resulting image is: screen shot 2019-02-20 at 11 35 55 am The image helps immensely with knowing which cell IDs to correct. Segmentation will often segment one cell as multiple cells for some reason (see screenshot below). screen shot 2019-02-20 at 11 37 19 am Is there any way to use the parameters of Segmentation.Watershed to make sure this doesn't happen? I have tried countless variations of the parameters but the function will always either do this or not detect enough cells.

In order to manually fix this, I wrote a function that will "merge" the over-segmented cells by changing cell IDs for spots in the multiple segments to just one segment's cell ID within is_gene1 (an xarray.DataArray object). Then it corrects all cell IDs above this so there aren't any gaps in cell IDs in the final image. This way, the spot counts are correct for each cell and the output image shows only one cell ID per cell. The code is below if you are interested in seeing it:

def mergeNuclei(nucleus_to_remove_ID: int, nucleus_to_merge_with_ID: int, cellsDataArray, runningCorrectionsList: dict):
    cellList = cellsDataArray['cell_id'].data
    indicesCache = []
    indexAppears = False
    for i in range(0,len(cellList)):
        if cellList[i] == nucleus_to_remove_ID:
            indexAppears = True
            cellsDataArray['cell_id'].data[i] = nucleus_to_merge_with_ID
        if cellList[i] > nucleus_to_remove_ID:
            indicesCache.append(i)
    if indexAppears:
        for i in indicesCache:
            cellsDataArray['cell_id'].data[i] = cellsDataArray['cell_id'].data[i] - 1
    return cellsDataArray

I am still working to optimize the function above. If you are interested, I can update you later on about how I do that.

sethb3744 commented 5 years ago

In case you would like to see how I optimized the function I wrote in the last comment, it's pasted below. The assertions I added prevent the function from "losing" cell IDs, which would result in spot numbers not merging.

# mergeNuclei corrects over-segmentation
mergeList = [(5,8),(6,8),(7,9),(14,15),(20,21),(21,22)]
decoded_intensities_cell = mergeNuclei(mergeList, decoded_intensities_cell,\
        numCells=max(xy_coords.keys())) # xy_coords is a dict of {cell_ID: (x_max, y_max)} 
# decoded_intensities_cell is a labeled and decoded IntensityTable that has filtered out any spots with cell_ID equal to 0

def mergeNuclei(mergeList, cellDataArray, numCells: int):
    for removeID, mergeWithID in mergeList:
        assert removeID < mergeWithID, "Error: The first cell ID must be smaller than the second in each double in mergeList. Swap " + str(removeID) + " with " + str(mergeWithID) + "."
        assert removeID < numCells+1, "Error: The cell ID " + str(removeID) + " is larger than the number of cells " + str(numCells) + "."
        assert mergeWithID < numCells+1, "Error: The cell ID " + str(mergeWithID) + " is larger than the number of cells " + str(numCells) + "."
    cellList = cellDataArray['cell_id'].data
    indexTracker = [] # this object will keep track of the correct cell ID numbers
    for i in range(1,numCells+1):
        indexTracker.append(i)
    indexTracker = dict(enumerate(indexTracker, 1)) # fill with original cell ID numbers
    for removeID, mergeWithID in mergeList: # loop through all merge combinations
        removeID = indexTracker[removeID] # update to current cell ID number
        mergeWithID = indexTracker[mergeWithID] # update this too
        indicesCache = []
        indexAppears = False # this is a failsafe to make sure removeID exists in cellList
        for i in range(0,len(cellList)):
            if cellList[i] == removeID:
                indexAppears = True
                cellDataArray['cell_id'].data[i] = mergeWithID # set the undesired cell ID to the merged cell ID
            if cellList[i] > removeID:
                indicesCache.append(i)
        if indexAppears: # a cell ID is now gone, so now update all cell IDs above it to take its place
            for i in indicesCache:
                cellDataArray['cell_id'].data[i] = cellDataArray['cell_id'].data[i] - 1
            for i in range(1,numCells+1):
                if i > removeID: 
                    indexTracker[i] = indexTracker[i]-1 # update cell IDs internally as well
    return cellDataArray

My next big task is to optimize Segmentation.Watershed in my pipeline. I'd appreciate if you can take a look back at the question I bolded in the last comment, as I need your advice to figure out how to best prevent over-segmentation from occurring.

sethb3744 commented 5 years ago

Edit (March 1): I resolved the below issue after I familiarized myself more with Segmentation.Watershed. The oversegmentation issue I was having is now fixed. I just had to add the h_minima function to watershed.py in the run function. If you want to see how I did it, I pasted the code below with hashtags next to the parts I added/modified:

from skimage.morphology import h_minima #####

def modified_run(self, primary_images: ImageStack, nuclei: ImageStack) -> np.ndarray:
        # create a 'stain' for segmentation
        mp = primary_images.max_proj(Axes.CH, Axes.ZPLANE)
        mp_numpy = mp._squeezed_numpy(Axes.CH, Axes.ZPLANE)
        stain = np.mean(mp_numpy, axis=0)
        stain = stain / stain.max()

        # TODO make these parameterizable or determine whether they are useful or not
        size_lim = (10, 10000)
        disk_size_markers = None
        disk_size_mask = None

        nuclei_mp = nuclei.max_proj(Axes.ROUND, Axes.CH, Axes.ZPLANE)
        nuclei__mp_numpy = nuclei_mp._squeezed_numpy(Axes.ROUND, Axes.CH, Axes.ZPLANE)
        sig_minima = h_minima(nuclei__mp_numpy, self.minima_lim) #####
        all_minima = h_minima(nuclei__mp_numpy, 0) #####
        sig_nuclei__mp_numpy = nuclei__mp_numpy - (nuclei__mp_numpy * (all_minima - sig_minima)) #####
        self._segmentation_instance = _WatershedSegmenter(sig_nuclei__mp_numpy, stain) #####
        label_image = self._segmentation_instance.segment(
            self.nuclei_threshold, self.input_threshold, self.size_lim, disk_size_markers,
            disk_size_mask, self.min_distance
        )
        self._num_cells = self.get_cell_num()

        return label_image

Since my last comment, I searched on methods to fix over-segmentation. The issue was slightly helped by improving prefiltering with a Guassian blur, but I am currently looking into a method I found on this website. It says that by removing local minima that don't meet a certain depth, it prevents insignificant local minima from confusing Watershed and over-segmenting cells. To do this, I want to add the skimage.morphology.h_minima function, which determines the local minima. I then want to remove the insignificant local minima below a certain depth just before performing watershed. However, I do not know enough about the current setup of starfish/watershed.py to determine how to implement the function. Can you or somebody else help me to implement this method? It is possible that the program is already doing something similar to this and I'm just not aware. I will also do more probing to familiarize myself with watershed.py in the meantime. Thank you for all your help.

ambrosejcarr commented 5 years ago

@sethb3744 This is awesome! We'll see if I we can incorporate your solution and improve our watershed. Thank you very much for adding your research to the issue!

cc @kne42

sethb3744 commented 5 years ago

I'm glad I could help! You should also know that self.minima_lim was an int attribute I added to the Watershed class and that it is generally somewhere between 0.01-0.00001. I have it set at 0.006 right now, but I still need to test it out further to determine if there is an optimal value for it or if it depends on the case.

I've made quite a few changes in the code and Jupyter-notebook pipeline to add use cases for the project I'm working on. I would be happy to share everything with you and explain what additions or changes I made when the code is finalized.

ambrosejcarr commented 5 years ago

That would be great! I'd be happy to hop on a video call to walk you through submitting a PR against starfish and hear a little bit about the project you're working on, if either would be of interest to you.

sethb3744 commented 5 years ago

Ambrose, when we talked yesterday, I mentioned that I wanted to perform Segmentation with primary images. You told me to format the primary images to include multiple channels. However, I also wanted to assign nuclei from the DAPI image (which only has one channel) to one channel of the primary images (which has multiple channels). I want to do this because the primary images will have fluorescent cell type markers, and I want to know which cell type each nucleus belongs to.

Is there a way to do that in starfish currently? For example, if a nucleus is given a certain cell ID during Target Assignment, is it possible to say that the nucleus with cell ID 2 belongs to a particular channel? It doesn't matter what exact cell in that channel it belongs to; I just want to know which channel the nucleus is in.

ambrosejcarr commented 5 years ago

Hi @sethb3744 We intend to overhaul segmentation in the near future to enable users to do some of the things you've requested. We'd like to make it possible to separate the creation of the image and mask from the actual watershed segmentation by adding a few more filters to starfish (morphological operations, thresholding) and allowing users to combine multiple ImageStacks, for example, to create a maximum projection of the spots and the dapi images to form a composite that contains both spots and nuclei. I'll link this issue to the segmentation improvements epic to track your work to address over-segmentation.

Regarding the display of named objects, we have an open PR to extend napari which implements these features! It doesn't plot the number, but it does show the label next to the mask, and this label corresponds to the label of the mask in our SegmentationMaskCollection

image You need to select the mask layer and hover over the cell of interest. You'll notice that I'm currently hovering over cell 14.

I'm going to list this issue as being closed by that PR, as I think it addresses your concerns. Please feel free to re-open it if you feel otherwise!