kirchhausenlab / incasem

Automated Segmentation of cellular substructures in Electron Microscopy
BSD 3-Clause "New" or "Revised" License

Support for regions smaller than (204, 204, 204) #14

Closed: athulnair02 closed this 11 months ago

athulnair02 commented 11 months ago

In a training data configuration file, if the shape of a training region is smaller than [204, 204, 204], several gunpowder nodes in the pipeline raise errors before training even begins. These errors relate to how the pipeline is built and set up.

Some of the errors seen:

However, the private repository fiborganellesegmentation (fos) differs from incasem in a few ways that could explain this behavior. First, fos does not use the ArrayKey BACKGROUND_MASK in its pipeline at all, and it is commented out in several sections of incasem.pipeline.training_baseline_with_context.py, as shown below.

keys = {
    'RAW': gp.ArrayKey('RAW'),
    'RAW_OUTPUT_SIZE': gp.ArrayKey('RAW_OUTPUT_SIZE'),
    'LABELS': gp.ArrayKey('LABELS'),
    'MASK': gp.ArrayKey('MASK'),
    # 'BACKGROUND_MASK': gp.ArrayKey('BACKGROUND_MASK'),
    'METRIC_MASK': gp.ArrayKey('METRIC_MASK'),
    'LOSS_SCALINGS': gp.ArrayKey('LOSS_SCALINGS'),
    'PREDICTIONS': gp.ArrayKey('PREDICTIONS'),
}

pipelines_with_random_locations = []
for sources_p in sources.pipelines:
    p = (
        sources_p
        # + fos.gunpowder.DeepCopyArrays(
        #     arrays=[keys['LABELS']],
        #     output_arrays=[keys['BACKGROUND_MASK']]
        # )
        # + fos.gunpowder.BinarizeLabels([keys['BACKGROUND_MASK']])

        + fos.gunpowder.SaveBlockPosition(
            keys['RAW'],
            raw_pos
        )
        + fos.gunpowder.RandomLocationBounded(
            # mask=keys['BACKGROUND_MASK'],
            mask=keys['MASK'],
            min_masked=self._reject_min_masked,
            reject_probability=self._reject_probability,
        )

        ...

        + fos.gunpowder.CentralizeRequests()
    )
    pipelines_with_random_locations.append(p)
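
The commented-out nodes derive BACKGROUND_MASK by deep-copying LABELS and binarizing the copy. As a rough NumPy sketch of the effect on the array data, assuming BinarizeLabels maps every nonzero label to 1 and leaves 0 as 0 (the node's actual rule is not shown in this thread), the hypothetical helper below does the equivalent:

```python
import numpy as np


def background_mask_from_labels(labels: np.ndarray) -> np.ndarray:
    """Hypothetical helper mimicking DeepCopyArrays + BinarizeLabels.

    Assumes binarization means nonzero -> 1, zero -> 0, stored as uint8.
    The comparison produces a new array, so `labels` is never modified.
    """
    return (labels > 0).astype(np.uint8)


labels = np.array([[0, 3], [7, 0]], dtype=np.uint32)
mask = background_mask_from_labels(labels)
print(mask.tolist())    # [[0, 1], [1, 0]]
print(labels.tolist())  # [[0, 3], [7, 0]] (unchanged)
```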

In fos, add_mask.py defines a prepare method that incasem's version lacks. Without it, AddMask does not inform its upstream provider that it needs the LABELS array as a dependency.

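def prepare(self, request):
    # Declare the reference array (LABELS) as an upstream dependency,
    # requested with the same spec as the mask this node produces.
    deps = gp.BatchRequest()
    deps[self.reference_array] = request[self.output_array]
    return deps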
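
To illustrate why the missing prepare matters, here is a toy model of request propagation. The ToyAddMask class and the dict-based requests are hypothetical and are not gunpowder's API; they only mimic the idea that a node receives a downstream request and returns what it needs from upstream. Because the node creates the output mask itself, LABELS only reaches the upstream request if the node declares it.

```python
from typing import Dict, Tuple

Roi = Tuple[Tuple[int, int, int], Tuple[int, int, int]]  # (offset, shape)
Request = Dict[str, Roi]  # array name -> requested ROI


class ToyAddMask:
    """Hypothetical stand-in for an AddMask-style node (not gunpowder code).

    It creates `output_key` from `reference_key`, so it needs the reference
    array from upstream over the same ROI as the requested output.
    """

    def __init__(self, reference_key: str, output_key: str, declare_deps: bool):
        self.reference_key = reference_key
        self.output_key = output_key
        self.declare_deps = declare_deps

    def prepare(self, request: Request) -> Request:
        # The node produces output_key itself, so it is not forwarded upstream.
        upstream = {k: v for k, v in request.items() if k != self.output_key}
        if self.declare_deps and self.output_key in request:
            # Same idea as deps[self.reference_array] = request[self.output_array]
            upstream[self.reference_key] = request[self.output_key]
        return upstream


roi: Roi = ((0, 0, 0), (64, 64, 64))
downstream: Request = {"RAW": roi, "MASK": roi}

without = ToyAddMask("LABELS", "MASK", declare_deps=False).prepare(downstream)
with_deps = ToyAddMask("LABELS", "MASK", declare_deps=True).prepare(downstream)
print(sorted(without))    # ['RAW']
print(sorted(with_deps))  # ['LABELS', 'RAW']
```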

In both fos and incasem, the default dtype of the MergeLabels gunpowder node is uint32, but the pipeline expects uint8, so the final change was to make uint8 the default dtype.

def __init__(
        self,
        classes: Dict[gp.ArrayKey, int],
        output_array: gp.ArrayKey,
        dtype: Optional[str] = 'uint8',
        ambiguous_labels: Optional[str] = 'background'):
    ...
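
A minimal NumPy sketch of what such a merge can look like, and why allocating the output with the requested dtype matters. The merge_labels function is hypothetical, not the node's implementation, and it assumes that ambiguous_labels='background' sends voxels claimed by more than one class to background (0).

```python
from typing import Dict

import numpy as np


def merge_labels(
    masks: Dict[str, np.ndarray],
    classes: Dict[str, int],
    dtype: str = "uint8",
) -> np.ndarray:
    """Merge binary masks into one class-id volume (hypothetical helper)."""
    shape = next(iter(masks.values())).shape
    merged = np.zeros(shape, dtype=dtype)     # background = 0, in the target dtype
    claims = np.zeros(shape, dtype=np.int32)  # how many masks claim each voxel
    for key, class_id in classes.items():
        foreground = masks[key] > 0
        merged[foreground] = class_id
        claims += foreground
    merged[claims > 1] = 0                    # ambiguous voxels -> background
    return merged


a = np.array([1, 1, 0, 0])
b = np.array([0, 1, 1, 0])
out = merge_labels({"A": a, "B": b}, {"A": 1, "B": 2})
print(out.tolist(), out.dtype)  # [1, 0, 2, 0] uint8
```

Allocating the output in the target dtype up front means downstream nodes receive uint8 directly rather than a uint32 array that would need casting.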

These three issues are what prevented the incasem pipeline from processing ROIs smaller than (204, 204, 204), as was done in fos when training on clathrin-coated pits and nuclear pores.

bentaculum commented 11 months ago

Hi @athulnair02, thanks for diving into this and fixing it.

athulnair02 commented 11 months ago

Hi @bentaculum, upon further review, there still seems to be an issue with BACKGROUND_MASK. I had been running my tests with a (180, 180, 210) region, which runs until around 7k to 11k iterations, so I mistakenly assumed some tests worked as soon as the iterations began. I am now testing with a (110, 110, 110) region, which fails immediately, before a single training iteration can run. BACKGROUND_MASK causes a problem in RandomLocationBounded, but rather than omitting the ArrayKey as before, I will try to find a solution that keeps it. So far, MASK and BACKGROUND_MASK appear to be the same upstream of RandomLocationBounded but not downstream.

athulnair02 commented 11 months ago

To conclude the discussion, AddMask was not the root cause; the problem was the lack of a node in the pipeline to pad the BACKGROUND_MASK array. Below, in the original pipeline, you can see that after PadDownstreamOfRandomLocation, the request sent upstream to prepare the nodes for the batch has the same region for LABELS, MASK, and METRIC_MASK. RAW is meant to differ because of augmentations, but BACKGROUND_MASK should match the others and does not.

Before

This is a problem for RandomLocationBounded: the node cannot satisfy batch requests involving BACKGROUND_MASK because it cannot find a location that covers all requested ROIs.
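
The comparison described above, finding which array's upstream ROI deviates from LABELS, can be sketched in plain Python. ROIs are modeled as (offset, shape) tuples, the mismatched_rois helper is hypothetical, and the numbers are made up for illustration rather than taken from the screenshots.

```python
from typing import Dict, Iterable, List, Tuple

Roi = Tuple[Tuple[int, ...], Tuple[int, ...]]  # (offset, shape) in voxels


def mismatched_rois(
    request: Dict[str, Roi],
    reference: str,
    expected_to_differ: Iterable[str] = (),
) -> List[str]:
    """Names of arrays whose requested ROI differs from `reference`'s ROI,
    ignoring arrays listed in `expected_to_differ`."""
    skip = set(expected_to_differ) | {reference}
    ref_roi = request[reference]
    return sorted(k for k, roi in request.items() if k not in skip and roi != ref_roi)


roi = ((20, 20, 20), (164, 164, 164))
request = {
    "RAW": ((0, 0, 0), (204, 204, 204)),  # differs by design (augmentations)
    "LABELS": roi,
    "MASK": roi,
    "METRIC_MASK": roi,
    "BACKGROUND_MASK": ((0, 0, 0), (204, 204, 204)),
}
print(mismatched_rois(request, "LABELS", expected_to_differ=["RAW"]))
# ['BACKGROUND_MASK']
```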

After adding PadDownstreamOfRandomLocation for BACKGROUND_MASK, its requested region matches those of LABELS, MASK, and METRIC_MASK, so a location that covers all requested ROIs can be found.

After
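
A simplified model of the constraint RandomLocationBounded has to satisfy, in plain Python rather than gunpowder code. It assumes the node shifts all requested ROIs by one common offset, that each shifted ROI must lie inside the ROI provided for its array, and that padding just grows the provided ROI by a fixed margin. The 110-voxel region matches the test case above; the 164-voxel request and 60-voxel margin are made up.

```python
from typing import Dict, Optional, Tuple

Vec = Tuple[int, ...]
Roi = Tuple[Vec, Vec]  # (offset, shape) in voxels


def shift_range(provided: Roi, requested: Roi) -> Tuple[Vec, Vec]:
    """Inclusive (lo, hi) shifts per axis that keep `requested` inside `provided`."""
    (p_off, p_shape), (r_off, r_shape) = provided, requested
    lo = tuple(p - r for p, r in zip(p_off, r_off))
    hi = tuple(p + ps - r - rs for p, ps, r, rs in zip(p_off, p_shape, r_off, r_shape))
    return lo, hi


def common_shift_range(
    provided: Dict[str, Roi], request: Dict[str, Roi]
) -> Optional[Tuple[Vec, Vec]]:
    """Shifts that satisfy every requested array at once, or None if none exist."""
    ranges = [shift_range(provided[key], request[key]) for key in request]
    ndim = len(ranges[0][0])
    lo = tuple(max(r[0][d] for r in ranges) for d in range(ndim))
    hi = tuple(min(r[1][d] for r in ranges) for d in range(ndim))
    return (lo, hi) if all(a <= b for a, b in zip(lo, hi)) else None


def pad(roi: Roi, margin: int) -> Roi:
    """Grow a ROI by `margin` voxels on every side (simplified padding model)."""
    offset, shape = roi
    return tuple(o - margin for o in offset), tuple(s + 2 * margin for s in shape)


region: Roi = ((0, 0, 0), (110, 110, 110))  # training region from the test case
req: Roi = ((0, 0, 0), (164, 164, 164))     # made-up request size
request = {"LABELS": req, "BACKGROUND_MASK": req}

before = {"LABELS": pad(region, 60), "BACKGROUND_MASK": region}  # mask not padded
after = {"LABELS": pad(region, 60), "BACKGROUND_MASK": pad(region, 60)}

print(common_shift_range(before, request))  # None
print(common_shift_range(after, request))   # ((-60, -60, -60), (6, 6, 6))
```

Padding only LABELS still leaves no valid shift, because the unpadded BACKGROUND_MASK rules out every location on its own; padding it as well restores a non-empty range.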

Now the pipeline works for regions smaller than (204, 204, 204).

bentaculum commented 11 months ago

Thanks for fixing this, and for the nicely documented PR!