pp["DAPI"] is dependent on obs

sagar87 / spatialproteomics

Spatialproteomics is a light weight wrapper around xarray with the intention to facilitate the data exploration and analysis of highly multiplexed immunohistochemistry data. Docs available here: https://sagar87.github.io/spatialproteomics/ .

https://sagar87.github.io/spatialproteomics/

MIT License

13 stars 0 forks source link

pp["DAPI"] is dependent on obs #39

Closed MeyerBender closed 5 months ago

MeyerBender commented 5 months ago

In the pp.filter_by_obs method, it looks like the centroids get lost, which causes downstream issues when trying to select by a marker. Needs further investigation to find the root cause (pp["DAPI"] should not be dependent on obs in the first place) and a proper fix.

# does not work
img_resegmented.pp.add_observations('area').pp.filter_by_obs('area', func=lambda x: (x > 50) & (x < 300)).pp['DAPI']
# works
img_resegmented.pp.add_observations('area').pp.filter_by_obs('area', func=lambda x: (x > 50) & (x < 300)).pp.add_observations().pp['DAPI']

sagar87 commented 5 months ago

Smells like some evil inplace operations....

sagar87 commented 5 months ago

Hold on, that works:

2024-05-28 at 14 00

MeyerBender commented 5 months ago

But it doesn't if you remove the centroids, correct? The issue is in the pp.get_bbox(x_slice, y_slice), I think because without centroids cells can't be selected by the FOV. Seems like a bug in __getitem__ to me.

MeyerBender commented 5 months ago

Scratch that, the issue is in get_bbox, where Dims.CELLS in self._obj.sizes is true and triggers a subsetting of cells, which is impossible. Can probably fix by checking if there are centroids present, and if not, we simply do not subset the cells. What do you think?

sagar87 commented 5 months ago

Yes you need the centroids so that __getitem__ can select the right cells. In which scenario are the centroids not present ? We enforce pp.add_observations() upon adding a segmentation.

MeyerBender commented 5 months ago

Running img_resegmented.pp.add_observations('area') only adds the area, but not the centroids. I can add them to always be added, something like:

 if "centroid" not in properties:
        properties = ["centroid", *properties]

I have created a new branch that already implements a fix for the previous issue, should I simply add this to pp.add_observations() or do you see any way this could backfire? We already have the same procedure for forcing label when computing the obs.

MeyerBender commented 5 months ago

Actually there's a simpler fix, the obs simply did not get computed in the cellpose method, just implementing that should fix it.

sagar87 commented 5 months ago

But pp.add_segmentation() calls pp.add_observations(), hence _obs with centroid should be in the object. The only way this cannot happen is if you manually add a _segmentation.

sagar87 commented 5 months ago

Yes I am working on this now, don't touch it.

MeyerBender commented 5 months ago

One thing to keep in mind is that with cellpose we can have either a segmentation on one or multiple channels. If we only have one channel it's fine, but with multiple channels we would need to specify which of the channels takes priority (could just do the first one but that would be more of a temporary fix). Basically we can't sync obs with a multidimensional segmentation mask.

sagar87 commented 5 months ago

The workflow should be as follows. If only nuclear channel

(sprot
  .tl.cellpose(channel='DAPI', key_added='_segmentation_dapi') # returns a (X, Y) tensor
  .pp.add_segmentation('_segmentation_dapi')
)

if segmentation has multiple channels, we need to work with an intermediate object

intermediate_object = (sprot
  .pp[['DAPI', 'CD66', 'CD11c']]
  .tl.cellpose(key_added='_segmentation_multi') # returns a (C, X, Y) tensor
  .pp.merge_segmentation(layer_key='_segmentation_multi', key_added='_segmentation_merged') # returns a (X, Y) tensor  
  .pp.add_segmentation('_segmentation_merged')
)

so basically pp.merge_segmentation is a function from (C,X,Y) => (X,Y). Then one take the merged segmentation and use it for the original object

final_obj = sprot.pp.add_segmentation(intermediate_object._segmentation.values)

MeyerBender commented 5 months ago

Honestly I'm not the biggest fan of having to call pp.add_segmentation() after segmenting, seems a bit redundant, especially since you already have key_added as a parameter. But it is probably the most explicit solution, so I guess it'll get the job done. Can you also implement this in stardist and add it to the docs then?

sagar87 commented 5 months ago

You shall not assume what the user wants. Paternalism is no good.

MeyerBender commented 5 months ago

Addressed in #37. Closing issue.