isi-vista / adam

Abduction to Demonstrate an Articulate Machine

Color segmentation experiment #1170

Open spigo900 opened 2 years ago

spigo900 commented 2 years ago

The goal of this experiment is to test color segmentation as a way of refining our instance segmentation masks. This is different from #1169 in that the focus here is extracting (noisy) internal details about the object's shape, as opposed to accurately recognizing the external shape.

This issue outlines two things: first, the experiment design, including the variants of color segmentation; second, the technical changes needed to support the experiment.

Experiment layout

Color segmentation

We will compare GNN and ADAM accuracy under two conditions, plus an optional third variant (a minimal sketch of how the two color-segmentation variants differ follows the list):

  1. Baseline, no color segmentation.
  2. Parallel. Color segmentation where we extract strokes from a color-refined segmentation image.
    1. In this variant, we perform instance segmentation and color segmentation separately on the same image.
    2. Then we create a combined mask where we simply pick out the region of the color-segmentation image corresponding to the object mask. All pixels outside the object mask are set to black.
    3. We recolor the color-refined segmentation mask to increase contrast.
    4. We extract strokes from the resulting recolored and combined mask.
    5. GNN training and ADAM training happen as usual.
  3. (Optional variant) Objects first. Color segmentation where we segment colors within the object-segmentation-masked image, then extract strokes from the results.
    1. In this variant, we perform instance segmentation first.
    2. Then using the instance segmentation, we mask the original RGB image to show just one object -- all pixels outside the object mask are set to black.
    3. We perform color segmentation on this masked image.
    4. We then recolor the color segmentation over the masked image.
    5. We then perform stroke extraction over the recolored and refined segmentation mask.
    6. GNN training and ADAM training happen as usual.
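
To make the difference between the two color-segmentation variants concrete, here is a minimal sketch of the two orderings in Python/numpy. The function names and the `Segmenter` parameter are placeholders for illustration only; in practice these steps run as separate scripts.

```python
import numpy as np
from typing import Callable

# `color_segment` stands in for whatever produces the per-pixel color-segmentation
# image (e.g. the CCP Matlab code, run out of process); it is a parameter here,
# not an existing ADAM API.
Segmenter = Callable[[np.ndarray], np.ndarray]


def refine_parallel(rgb: np.ndarray, object_mask: np.ndarray,
                    color_segment: Segmenter) -> np.ndarray:
    """Variant 2: color-segment the *full* image, then clip to the object mask."""
    color_seg = color_segment(rgb)
    # Keep the color-segmented pixels under the object mask; everything else goes black.
    return np.where(object_mask[..., None].astype(bool), color_seg, 0)


def refine_objects_first(rgb: np.ndarray, object_mask: np.ndarray,
                         color_segment: Segmenter) -> np.ndarray:
    """Variant 3: mask the RGB image *first*, then color-segment just that object."""
    masked_rgb = np.where(object_mask[..., None].astype(bool), rgb, 0)
    return color_segment(masked_rgb)
```

In both variants the combined result would then be recolored for contrast before stroke extraction.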

Curriculum

Because it's also an objects experiment, this experiment would use curricula similar to #1169's. That is: a set of the 400+ Unreal 4 images from the Month 5 curriculum, a set of the 400+ Unreal 5 images, and a final curriculum containing both sets.

We could also look at a curriculum focusing on a subset of objects where we would expect color refinement to help. However, if we expect color segmentation to pick up on internal edges between "faces" of the object where only the lighting differs, that subset could include most objects. The only cases where it seems like it definitely shouldn't help are balls, oranges, paper, and sphere-shaped blocks. However, I expect color segmentation might help even with these objects indirectly, by distinguishing them more from some of their distractors. Overall, I think it probably makes sense to simply use the objects curriculum as-is.

Instance segmentation

To minimize noise from instance segmentation, ideally we would use a segmentation that is known to follow the object shape closely without including too much outside of each object. The legacy AirSim instance segmentations would be ideal for this, and we have those for the M5 objects curriculum. However, per #1169, we will not have such segmentations for the new Unreal 5 objects curriculum.

Given this, we could use either the base Mask R-CNN or STEGO for the color segmentation experiments. I'd like to go with STEGO, because I expect STEGO will give "tighter" segmentation output. This is a bit of a pain since STEGO has to run on the cluster while I have to run stroke extraction locally. But for the variants I have planned, we only need to run object segmentation once, so this is probably fine. We discussed this here and came to more or less the same conclusion.

GNN training

I plan to use our existing GNN script to train the models, running these experiments with just one set of hyperparameters.

Technical changes

Color segmentation

We need a script to perform color segmentation. I plan to integrate code from the paper "Robust Image Segmentation Using Contour-guided Color Palettes" (Xiang Fu, Chien-Yi Wang, Chen Chen, Changhu Wang, and C.-C. Jay Kuo at CVPR 2015). The paper's code lives here: fuxiang87/MCL_CCP. It runs fine in a Matlab configuration similar to the one used for Sheng Cheng's GNN code, so I plan to use the same environment for both. I plan to clone their repo into the ADAM preprocessing folder as a git subtree.

This will require a few changes. We need to parameterize their Matlab script so we can run it on arbitrary input images, which looks simple enough. It will also require writing a Python script to call the modified Matlab code -- also simple enough. Overall, the changes here look straightforward.
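
As a sketch, the Python wrapper could shell out to Matlab's batch mode. Here `ccp_segment` is a hypothetical name for the parameterized entry point we'd add to their code, and the sketch assumes `matlab` is available on PATH in the environment we use.

```python
import subprocess
from pathlib import Path


def run_color_segmentation(input_image: Path, output_image: Path, mcl_ccp_dir: Path) -> None:
    """Run the (parameterized) CCP Matlab code on a single image.

    Assumes the modified Matlab script exposes a function like
    ccp_segment(input_path, output_path) -- that name is hypothetical -- and
    that the `matlab` executable is on PATH.
    """
    command = f"ccp_segment('{input_image}', '{output_image}')"
    subprocess.run(
        ["matlab", "-batch", command],
        cwd=mcl_ccp_dir,  # run from the subtree so the function is on Matlab's path
        check=True,
    )
```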

Mask intersection

To implement this we need code to load the object mask and the color segmentation image and use the object mask to "mask" the color segmentation image. That seems straightforward enough but worth mentioning.
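
A minimal sketch of that masking step, assuming the object mask and color-segmentation output are ordinary image files we can load with PIL/numpy (the function name is illustrative, not existing ADAM code):

```python
import numpy as np
from PIL import Image


def clip_by_object_mask(color_seg_path: str, object_mask_path: str) -> Image.Image:
    """Clip a color-segmentation image by an object mask.

    Treats any nonzero pixel in the mask image as "inside the object" and sets
    everything outside the mask to black. The file formats here are an
    assumption; the real masks may need different loading/thresholding.
    """
    color_seg = np.asarray(Image.open(color_seg_path).convert("RGB"))
    inside = np.asarray(Image.open(object_mask_path).convert("L")) > 0
    clipped = np.where(inside[..., None], color_seg, 0).astype(np.uint8)
    return Image.fromarray(clipped)
```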

For ease of use, it probably makes sense to do this in the same script as color segmentation. This could be a flag --clip_by_object_mask with a companion --clip_to_file. (A --clip_by_color flag would also be reasonable, but I don't think it's needed until/unless we need to worry about multiple objects per image. For more on that, see the discussion of stroke extraction below.)

Stroke extraction

We may need to make some changes so that the refined masks are handled properly during stroke extraction. One issue is that color regions similar in lightness may blend together, leading to less effective stroke extraction. The second issue is that color refinement complicates multi-object scene segmentation. If we need to deal with the second issue directly, I think we can; however, we may not need to.

The first issue is that color regions may blend together. The Matlab code for stroke extraction starts by converting the image to grayscale, which discards the hue and saturation information for each pixel, and then runs Canny edge detection over that grayscale image. So if two adjacent color regions are similar in lightness, we are likely to lose the edge between them. I'd want to rewrite the colors in the refined-segmentation image we feed to the stroke extraction code to make sure that they are distinctive in lightness.
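
One simple way to do that recoloring, sketched below, is to map each distinct segment color to an evenly spaced gray level, so the grayscale conversion and Canny step see a clear edge at every segment boundary. This is an illustration only and assumes a modest number of segments (beyond 256, levels would collide).

```python
import numpy as np


def recolor_by_lightness(color_seg: np.ndarray) -> np.ndarray:
    """Map each distinct segment color to an evenly spaced gray level.

    The stroke-extraction code converts to grayscale before running Canny, so
    two segments with different hues but similar lightness would otherwise
    produce no edge between them. Evenly spaced levels keep every pair of
    segments distinct in lightness (for up to 256 segments).
    """
    height, width = color_seg.shape[:2]
    flat = color_seg.reshape(-1, color_seg.shape[-1])
    colors, labels = np.unique(flat, axis=0, return_inverse=True)
    levels = np.linspace(0, 255, num=len(colors)).astype(np.uint8)
    gray = levels[labels.reshape(-1)].reshape(height, width)
    # Return a 3-channel image so downstream code expecting RGB still works.
    return np.stack([gray, gray, gray], axis=-1)
```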

The second issue is that this complicates multiple-object scene segmentation. In the current, baseline setup, every mask should correspond to an object if the masking model is accurate. So for spatial relations, we'd like to simply switch off the logic that glues together all stroke graphs in the image. But things become more complicated if we want to integrate spatial relations with color segmentation for refining strokes. In that case we run into problems because we don't want each "color slice" of an object to be treated as a separate object, and we don't want different objects that share a color along their boundary to bleed together and get treated as one.

This second issue is not an immediate concern, but it means extra work if we want to do both color segmentation and spatial relations at the same time. For spatial relations we'd need to prevent the gluing-together of different objects; however, we'd still want to refine each object's mask separately. One approach would be to produce k refined segmentation masks, one per object, feed each image separately into stroke extraction, and finally combine the results into one feature file. We'd still want to run just one script per situation, so this would require some changes to the stroke extraction script. This integration is complicated.
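
If we do end up needing that integration, a minimal sketch of one possible shape for it is below. All of the callables are placeholders passed in as parameters, since the real segmentation and stroke-extraction code runs as separate scripts and the feature-file format isn't specified here; merging the per-object results into one feature file is left to the caller.

```python
import numpy as np
from typing import Callable, List, Sequence


def refine_and_extract_per_object(
    rgb: np.ndarray,
    object_masks: Sequence[np.ndarray],
    color_segment: Callable[[np.ndarray], np.ndarray],
    extract_strokes: Callable[[np.ndarray], dict],
) -> List[dict]:
    """Refine each object's mask separately, then run stroke extraction per object.

    Processing objects separately keeps two objects that share a color from
    bleeding together, while the usual per-image gluing still combines the
    color slices within a single object.
    """
    color_seg = color_segment(rgb)
    refined = [
        np.where(mask[..., None].astype(bool), color_seg, 0) for mask in object_masks
    ]
    return [extract_strokes(image) for image in refined]
```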

But if we don't need to integrate them, I think we can get away with a simpler solution. A "good enough" solution for now might be to ignore the problem, because we don't need to do both at once for our experiments. If I need to make any changes to stroke extraction for color segmentation, I can just create a separate copy of the script, which should be fine.