pdd2110 opened this issue 3 years ago
Hey @pdd2110,
you can already achieve something fairly similar by providing this image as an atlas image in Data Selection. With that, the result table will include the values from the atlas image.
Cheers Dominik
Hi Dominik,
Thanks for the quick response and suggestion. We're currently using that approach, but as I mentioned it's only a partial workaround.
The main problem is that Ilastik's labeling algorithm changes the shape of the superpixels, which leads to different measurements, impacting classification. Basically, Ilastik's default connected components labeling method (diamond) frequently splits objects from our mask into multiple smaller parts. I've provided a max projection image below to illustrate the problem. The first panel is the mask, the second is the 'box' method, and the third is the 'diamond' (Ilastik's default). As you can see, the objects on the bottom and right of the image are split into multiple smaller objects using the 'diamond' method.
These examples show pretty minor changes, with one voxel here or there being split off from the main object. But pretty frequently the object becomes completely fragmented. I'd say somewhere around 85%-90% of the objects are impacted, with 10%-15% having large chunks split off, which we end up filtering out. In other words, we're losing data on about 10%-15% of our objects.
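To make the connectivity difference concrete outside of ilastik, here's a minimal sketch using SciPy (not what ilastik uses internally, but the semantics of 4- vs 8-connectivity are the same):

```python
import numpy as np
from scipy import ndimage

# Two foreground pixels touching only at a corner: one object under 'box'
# (8-connectivity), but two objects under 'diamond' (4-connectivity, the default).
mask = np.array([[1, 0],
                 [0, 1]], dtype=np.uint8)

labels_diamond, n_diamond = ndimage.label(mask)                     # default: 4-connectivity
labels_box, n_box = ndimage.label(mask, structure=np.ones((3, 3)))  # 8-connectivity
print(n_diamond, n_box)  # -> 2 1
```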
This should be a fairly easy fix: expose the connected-components neighborhood as an option, and support using the object IDs already present in a labeled input image instead of relabeling.
Hey @pdd2110,
thank you very much for that detailed comment! I hadn't thought about neighborhoods there... Exposing the neighborhood as a parameter in thresholding should not be a problem. Properly supporting pre-indexed label images for feature extraction is also doable of course, but not an easy fix. We use vigra to compute the object features. It will generate pseudo entries in case some label number is not present - so the assumption there is that label ids are continuous and starting from 1 (with 0 as background).
Ah yeah, the indexing would be an issue if it’s not continuous... good point!
There's a relatively easy workaround here where you just add one step to the process. If the user inputs their own segmentation image (i.e. anything non-binary), then have Ilastik automatically create a temp image by reindexing (but not relabeling) the user-inputted IDs to fill any gaps. That temp image would have a 1-to-1 correspondence with the original IDs and would match all of Vigra's criteria, so you can just run it through the normal pipeline. Once the analysis is done, if the user inputted a segmentation image, just reverse the reindexing so the final output has the same IDs as the user's segmentation image.
The reindexing is pretty easy computationally, so running it automatically would ensure there aren’t any problems, even if the initial and temp IDs are the same.
I know the commands for this in ImageJ using CLIJ, but I'm guessing you probably have faster methods.
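For what it's worth, the reindex/restore round trip is only a few lines of NumPy. This is just a rough sketch (the function names are made up here), not ilastik code:

```python
import numpy as np

def compress_labels(mask):
    """Map arbitrary, possibly non-contiguous object IDs to 1..N (0 stays background).
    Returns the reindexed image plus the original IDs so the mapping can be reversed."""
    original_ids = np.unique(mask)
    original_ids = original_ids[original_ids != 0]            # drop background
    lut = np.zeros(int(mask.max()) + 1, dtype=np.uint32)
    lut[original_ids] = np.arange(1, len(original_ids) + 1)   # old ID -> compact ID
    return lut[mask], original_ids

def restore_labels(compact, original_ids):
    """Invert compress_labels: compact ID k -> original_ids[k - 1], 0 stays background."""
    restore = np.concatenate(([0], original_ids)).astype(original_ids.dtype)
    return restore[compact]
```

The per-object rows in the exported table could be mapped back the same way, since compact ID k always corresponds to original_ids[k - 1].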
Just reviving this issue since I haven't heard any updates.
Vigra's connected components function, vigra::labelMultiArray (which I assume you're using), allows you to specify connectivity via the neighborhood argument. Here's the explanation from the documentation:
Argument neighborhood specifies the type of connectivity used. It can take the values DirectNeighborhood (which corresponds to 4-neighborhood in 2D and 6-neighborhood in 3D, default) or IndirectNeighborhood (which corresponds to 8-neighborhood in 2D and 26-neighborhood in 3D).
Adding this as a radio-button option when the user inputs their own segmentation file would solve much of this issue and make ilastik a bit less rigid ;)
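If the vigranumpy bindings are what's being called under the hood, the toggle really is just that one argument. A sketch, untested against ilastik's bundled vigra:

```python
import numpy as np
import vigra

binary = np.array([[1, 0],
                   [0, 1]], dtype=np.uint8)

# neighborhood=4 ~ DirectNeighborhood ('diamond'): splits the diagonal pair.
# neighborhood=8 ~ IndirectNeighborhood ('box'): keeps it as one object.
labels_4 = vigra.analysis.labelImageWithBackground(binary, neighborhood=4, background_value=0)
labels_8 = vigra.analysis.labelImageWithBackground(binary, neighborhood=8, background_value=0)
print(labels_4.max(), labels_8.max())  # expected: 2 and 1
```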
Bumping this issue as there hasn't been an update in a while: could the neighborhood used for given segmentation masks be changed to eight? It doesn't even have to be a UI option, imo, as user-supplied masks should be trusted. And reindexing back to the initial-mask numbers, by maintaining a dictionary or just looking up pixel values in the segmentation mask on output, should be fairly easy; it would save multiple other groups from having to write their own solutions for this and improve adoption.
It would be cool to fix this. As far as I can see after implementing a hack (and after discussing this issue with @oanegros), the solution is probably to add a new workflow flavor. There we could accurately show the segmentation/label mask in the appropriate layer (either binary, or random colors) and only do consecutive relabeling (for labels) as opposed to connected components analysis (CCA, for binary). We could even use the "atlas" logic with the original mask to restore the original IDs for the table.
An alternative idea was to make the user specify the type of input, e.g. in the Object Extraction applet, but this would be a departure from our common interactions, where input data is supposed to be fixed after specifying it. Also, one would have to explain the CCA step in order to get into the details of connectedness, whereas a new workflow flavor would shield users from that.
This issue has been mentioned on Image.sc Forum. There might be relevant details there:
https://forum.image.sc/t/ilastik-feature-table-export-different-to-object-predictions/92569/5
Currently, Object Classification workflows are split up by input type, so adding a data-type choice after the workflow has been chosen would be inconsistent (or everything would have to be unified in the input window under a single Object Classification workflow).
Description
When using a workflow where the user inputs their own segmentation image (e.g. object classification), if the objects in the segmentation mask are pre-indexed (i.e. have intensities provided), the workflow should use these intensities as the object IDs rather than re-indexing the objects.
Scenario
We've utilized another program to segment, label, and extract data on objects. These objects and their corresponding data are indexed according to the object IDs (intensity values) from the other program. We want to use Ilastik to classify these objects, and are using the labeled segmentation images as inputs in the object classification workflow. By default, Ilastik re-indexes these objects using a different labeling algorithm (diamond vs box), which leads to object splitting, shape changes, and indexing issues in the Ilastik output data. Currently we're getting around the indexing issues by uploading the segmentation as an atlas image, but we don't have a workaround for the shape changes.
This is also relevant for time-course experiments. For instance, let's say we are analyzing time course images, but only want to compare the initial and final frames to save on processing. Under the current build, if the objects have moved between the initial and final frames, Ilastik's default re-indexing could change the object identities, leading to incorrect class assignment. Having the option to use the input's intensity values as object identities would bypass this issue.
Rationale
By default, Ilastik relabels the objects based on a connected components labeling algorithm (diamond), which splits diagonally-connected superpixels into separate objects. This can lead to identity and indexing issues if, say, the user labeled the objects using a different algorithm (e.g. box).
Scope
As above. Enabling this feature would help to integrate Ilastik into other workflows and prevent indexing/identity issues from relabeling superpixels.
Possible implementation
Should be pretty easy: just use the intensity values from the user input if it is not a binary mask. In addition, allowing the user to choose the connected components labeling function for binary masks could bypass the splitting issue if they are using Ilastik for segmentation.
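A rough sketch of that decision logic (plain NumPy/SciPy, names are hypothetical, not ilastik internals):

```python
import numpy as np
from scipy import ndimage

def object_labels(user_image, indirect_neighborhood=False):
    """If the input already carries object IDs (more than one foreground intensity),
    keep them as-is; only run connected components on genuinely binary masks."""
    foreground_ids = np.unique(user_image)
    foreground_ids = foreground_ids[foreground_ids != 0]
    if len(foreground_ids) > 1:
        return user_image                              # trust the user-supplied IDs
    if indirect_neighborhood:
        structure = np.ones((3,) * user_image.ndim)    # 8-/26-connectivity ('box')
    else:
        structure = None                               # SciPy default: 4-/6-connectivity ('diamond')
    labels, _ = ndimage.label(user_image != 0, structure=structure)
    return labels
```

Of course, a mask containing a single object is indistinguishable from a binary mask under this heuristic, so an explicit user choice (as discussed above) would be more robust.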