isi-vista / adam

Abduction to Demonstrate an Articulate Machine

Prevent merging of touching objects #1205

Closed. spigo900 closed this issue 2 years ago.

spigo900 commented 2 years ago

Because of how our stroke extraction process works (by edge detection), touching objects in an image are currently all but guaranteed to have their strokes mingle and get treated as one mega-object. That means we cannot successfully recognize touching relationships (a specific type of the spatial relation features we are currently investigating) without addressing this problem.

To fix that, one simple hack is to separate out each unique mask in the segmentation image and treat it as its own input to the stroke extraction code (feed it into Matlab and process the results). We then concatenate the outputs for each such unique mask. This ensures that objects that are distinct in the segmentation image result in distinct stroke extraction objects. This approach also runs into some problems, namely:

  1. STEGO has a bunch of weird artifacts that will also get recognized as objects. So, for example, we'll end up with an object for "that triangle-shaped hallucination in the corner of the background" and "the shadow of the ball we're looking at." This is probably OK for the spatial relations experiment, but may have weird effects for other things.
  2. Any time one object's mask surrounds another's, we are going to end up with two copies of that object's strokes, so probably two copies of that object, which is probably going to have weird results. At the moment the main problem is the background, which we may be able to hack out in some cases (via #1204), although this hack relies heavily on the special case no-background-noise nature of our curricula.
  3. Probably something else.

This should be a flag because such behavior is not always appropriate. It's not appropriate when training the objects GNN with e.g. STEGO, which may produce multiple masks for just one object. It is probably counterproductive to train the GNN to classify "each weird shape STEGO recognized in the surface of a ball" as being "ball". For similar reasons it's also counterproductive to separate on each mask when using the color-refined segmentation, unless we retain the original instance segmentation and use that for separating out parts of the combined mask image. But that's complicated.

boyleconnor commented 2 years ago

I've been going through the semantic_0.png image files and manually adding a mask of a contrasting color to distinct touching objects. I've attached the dataset, with the fixed files, to this comment.

multi_obj_stego_with_fixes.zip

For posterity:

I created this dataset by combining several images from the internet in Google Drawings, then downloading the folder. The images were first automatically converted to JPEG by the Google Drive downloader, then I used a Python script to convert those JPEGs back to PNGs of the correct dimensions. I then used GIMP (specifically, the intelligent scissors tool) to edit the masks in the semantic_0.png files and export the result to the same directory, naming the new file semantic_0_fixed.png.
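That conversion step could look something like the following sketch (the actual script isn't attached here, so the directory names and target size are placeholders; this assumes Pillow):

from pathlib import Path

from PIL import Image

SOURCE_DIR = Path("downloaded_jpegs")  # placeholder: folder of Google Drive JPEGs
TARGET_DIR = Path("converted_pngs")    # placeholder: output folder
TARGET_SIZE = (640, 480)               # placeholder for the "correct dimensions"

TARGET_DIR.mkdir(exist_ok=True)
for jpeg_path in SOURCE_DIR.glob("*.jpg"):
    image = Image.open(jpeg_path).convert("RGB")
    image = image.resize(TARGET_SIZE)  # restore the expected dimensions
    image.save(TARGET_DIR / f"{jpeg_path.stem}.png")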

The dataset attached above will be the input to the preprocessing system, so I thought this comment thread would be a good place to store a copy.

spigo900 commented 2 years ago

So to implement this fix there are a few things in our way:

  1. We need to add new parameters to Stroke_Extraction.
    1. We need to add a new boolean flag process_masks_independently that controls the logic described in this issue.
    2. We need to add a new optional independent_masks_dir parameter to Stroke_Extraction. This is where the intermediate "single, separated-out mask" images should be saved. This is required (asserted?) when and only when process_masks_independently is passed.
    3. These new parameters require fixing the outer script, and also singlefile_stroke_extraction. The API image tester is nice to fix, but we don't particularly need it.
  2. We need to manually separate out each unique color and process that mask separately. That means doing some light processing on the segmentation image:
    1. This should only happen when the flag is enabled, of course.
    2. We want to special-case black (0, 0, 0) because we are using that color for the background.
    3. For each unique color, do something like Snippet 1 below to create a single-mask segmentation file.
    4. It should be easy to do this using Pillow, and nearly as easy with cv2 (see Snippet 1).
    5. It might be good to implement this as a standalone function or a separate method.
  3. Most of the existing code proceeds assuming it is operating on one file, which means we have to make some interface changes and move things around.
    1. Two methods need a segmentation_path argument added, in particular stroke_extraction_from_matlab(), and upstream of that remove_strokes().
    2. get_strokes() needs to be refactored.
      1. First, it assumes it's operating directly on the segmentation image at self.path. The method calls at the top all assume that. It might make sense to refactor those out first: Create a new method, say get_strokes_for_file(segmentation_path), and move those top 3 lines of method calls into that one.
      2. I'd like to also have this new method return the data that get_strokes() processes. That makes the serialization logic cleaner, I think. That is, return a dataclass ExtractionResult containing num_objs, reduced_strokes, stroke_obj_ids, adj, and colors. Then when we are in one-file-per-mask mode, we just have to glue together the single-file results in get_strokes() (see Snippet 2 below for a sketch).
      3. Second, it does all this processing assuming there is only one file's worth of data to extract. It needs to be updated to handle merging together the results.
      4. Third, the method currently contains both the processing and the saving logic. It may be useful to factor that out as, say, a new method or methods so that the processing and saving logic doesn't get in the way of the new merging logic.
      5. I'm assuming here that it is relatively easy to concatenate together all the single-mask-file extraction results before processing them and get reasonable output.

Snippet 1

import os

import cv2
import numpy as np

segmentation_image = cv2.imread(str(self.path))
# Unique colors in the segmentation image, one row per (B, G, R) color.
unique_colors = np.unique(segmentation_image.reshape(-1, 3), axis=0)
# Don't consider black background color if present
unique_to_process = unique_colors[np.any(unique_colors != 0, axis=1)]
# Boolean masks for the non-background colors as reconstituted from the image. Shape: (n_masks, height, width).
masks = np.all(segmentation_image[None, ...] == unique_to_process[:, None, None, :], axis=3)
for i in range(masks.shape[0]):
    mask = masks[i]
    cv2.imwrite(str(self.independent_masks_dir / f"separated_{os.path.basename(self.path)}_{i}.png"))
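
Snippet 2 (sketch)

To illustrate item 3 above: a rough sketch of the proposed ExtractionResult dataclass and the merging step. It assumes the per-file results can be concatenated with object ids offset per file and the adjacency matrices combined block-diagonally; the field types here are guesses rather than the actual Stroke_Extraction internals.

from dataclasses import dataclass
from typing import List

import numpy as np
from scipy.linalg import block_diag


@dataclass
class ExtractionResult:
    """Everything get_strokes() needs from one segmentation file."""
    num_objs: int
    reduced_strokes: np.ndarray
    stroke_obj_ids: np.ndarray
    adj: np.ndarray
    colors: np.ndarray


def merge_extraction_results(results: List[ExtractionResult]) -> ExtractionResult:
    """Glue together per-mask extraction results, keeping object ids distinct across masks."""
    # Offset each file's object ids by the number of objects seen in earlier files.
    offsets = np.cumsum([0] + [r.num_objs for r in results[:-1]])
    return ExtractionResult(
        num_objs=sum(r.num_objs for r in results),
        reduced_strokes=np.concatenate([r.reduced_strokes for r in results]),
        stroke_obj_ids=np.concatenate(
            [r.stroke_obj_ids + offset for r, offset in zip(results, offsets)]
        ),
        # Strokes from different masks are never adjacent, so the merged
        # adjacency matrix is block-diagonal.
        adj=block_diag(*[r.adj for r in results]),
        colors=np.concatenate([r.colors for r in results]),
    )

In one-file-per-mask mode, get_strokes() would then call get_strokes_for_file() once per separated mask image and hand the list of results to merge_extraction_results().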
boyleconnor commented 2 years ago

Maybe I'm misunderstanding, but would it make more sense to put the independent masks for each situation in a sub-directory of the situation directory? E.g. the masks from situation_0/semantic_0.png would go in the directory situation_0/independent_masks as situation_0/independent_masks/mask_0.png, situation_0/independent_masks/mask_1.png, etc.

boyleconnor commented 2 years ago

Also @spigo900 in this line:

cv2.imwrite(str(self.independent_masks_dir / f"separated_{os.path.basename(self.path)}_{i}.png"))

I believe you meant to include the mask image, like so:

cv2.imwrite(str(self.independent_masks_dir / f"separated_{os.path.basename(self.path)}_{i}.png"), mask)

correct?

spigo900 commented 2 years ago

@boyleconnor Putting the independent masks in a subdirectory is also fine, yes. I'm inclined not to nest things more than we already do, so I didn't. And yes, the imwrite call should save an image. You'll want to save segmentation_image[mask] rather than mask, because mask is just booleans, so I'm not sure imwrite knows how to save it.
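Continuing the loop from Snippet 1, one way to do that while keeping the 2-D image shape (a sketch; the np.where call here stands in for the segmentation_image[mask] indexing mentioned above, keeping the original segmentation colors inside the mask and zeroing everything else):

    masked_img = np.where(mask[..., None], segmentation_image, 0).astype(np.uint8)
    cv2.imwrite(
        str(self.independent_masks_dir / f"separated_{os.path.basename(self.path)}_{i}.png"),
        masked_img,
    )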

boyleconnor commented 2 years ago

@spigo900 if you don't want to nest directories more, I would not mind accommodating that, but I'm having a hard time visualizing what the directory structure would look like. Would there just be one directory in the curriculum root that would hold all of the independent masks? Or something else?

spigo900 commented 2 years ago

@boyleconnor I'm talking about putting the independent masks in each situation dir. So there's situation_0/separated_semantic_0_0.png, situation_0/separated_semantic_0_1.png, etc.

boyleconnor commented 2 years ago

@spigo900 what is independent_masks_dir for, then?

spigo900 commented 2 years ago

@boyleconnor It's there because we don't currently pass the situation dir to Stroke_Extraction. We could call it situation_dir but currently the stroke extraction class isn't aware of situation dirs so I'm inclined not to reference them.

boyleconnor commented 2 years ago

@spigo900 ah, I see now. How about I get rid of the features_save_path parameter (from Stroke_Extraction.__init__()) and add a new parameter situation_dir?

spigo900 commented 2 years ago

@boyleconnor I'm inclined to just add another parameter for consistency's sake, but the class already has a lot of parameters, so I'm okay with the solution of replacing features_save_path. For the new parameter name I'd slightly prefer output_dir over situation_dir to make it clearer we will save things there.
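
For reference, a rough sketch of how the constructor parameters could end up after this discussion. The existing parameters are reduced to a single placeholder path argument, and whether a separate independent_masks_dir parameter is still needed once output_dir exists is left open above, so it is omitted here.

from pathlib import Path


class Stroke_Extraction:
    def __init__(
        self,
        path: str,                                  # placeholder for the existing parameters (segmentation image path)
        output_dir: Path,                           # replaces features_save_path; outputs are saved here
        process_masks_independently: bool = False,  # new flag: run stroke extraction once per unique mask color
    ):
        self.path = path
        # Per the discussion above, the separated single-mask images could also be
        # written directly into this directory (e.g. separated_semantic_0_0.png).
        self.output_dir = Path(output_dir)
        self.process_masks_independently = process_masks_independently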