isi-vista / adam

Abduction to Demonstrate an Articulate Machine

Define expected features from Perception Modules #1022

Open lichtefeld opened 3 years ago

lichtefeld commented 3 years ago

Below is a list of ideal input features for ADAM's internal representation; the final representation may not include all of these.

In addition, we want some way of translating from our bundle of features back to a location in the scene image, so that visualizations can show the alignment between features and the scene.

Per-Object Segmentation in Scene

  1. Stroke graph
  2. Color (patterns with colors, like stripes, may be interesting but are not needed early)
  3. Texture / Material
  4. Rough Shape (e.g. spherical)
  5. Sub-Object Parts

Object sub-parts can have the same list of features as the top-level object. Additionally, some of the between-object features may be relevant, such as 'distance' for the placement of object parts.
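To make this concrete, here is a minimal sketch (in Python, as a hypothetical illustration rather than ADAM's actual types) of what a per-object feature bundle might look like, including the image-region link back to the scene mentioned above:

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple

@dataclass
class ObjectFeatures:
    """Hypothetical per-object feature bundle; all names are placeholders."""
    object_id: str
    stroke_graph: Any                  # opaque stroke-graph structure
    color: str                         # categorical label, e.g. "red"
    texture: str                       # categorical, e.g. "smooth"
    material: str                      # categorical, e.g. "plastic"
    rough_shape: str                   # categorical, e.g. "spherical"
    # Sub-object parts carry the same feature bundle, recursively.
    sub_parts: List["ObjectFeatures"] = field(default_factory=list)
    # Pixel region tying the features back to a location in the scene
    # image, e.g. a bounding box (x_min, y_min, x_max, y_max).
    image_region: Optional[Tuple[int, int, int, int]] = None
```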

Between Object Features

  1. Distance between two objects relative to the frame of reference of the observer
  2. In Contact
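Along the same lines, a hypothetical pairwise record for the between-object features:

```python
from dataclasses import dataclass

@dataclass
class ObjectPairFeatures:
    """Hypothetical between-object features; names are placeholders."""
    object_a: str        # object_id of the first object
    object_b: str        # object_id of the second object
    distance: float      # distance in the observer's frame of reference
    in_contact: bool     # whether the two objects are touching
```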

Action Features

  1. Temporal pace, e.g. the number of frames or the time between the start and end positions.
  2. Traversal path of objects between the start and end frames
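And a hypothetical shape for the action features, assuming pace is stored as time (see the discussion below) and the path as sampled positions:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class ActionFeatures:
    """Hypothetical action features between start and end frames."""
    duration_s: float    # temporal pace in seconds, camera-independent
    # Per-object traversal path: object_id -> (x, y) positions sampled
    # between the start and end frames.
    paths: Dict[str, List[Tuple[float, float]]]
```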

ISI will:

lichtefeld commented 3 years ago

Some further considerations in looking to implement these:

For color, do we want to work with a categorical space, where the perception component just gives a color label, or with the continuous space of RGB values, dealing with the 'similarity' problem within the graph structure? My preference is categorical, because that matches P1/2, and since attributes are not explicitly a target of learned language, categorical features simplify the matching process to "do the categories match?" rather than leaving us to decide "is this new RGB value close enough to previous ones?".
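To illustrate the trade-off, a sketch of the two matching strategies (the RGB threshold below is an arbitrary placeholder, which is exactly the problem):

```python
import math

def colors_match_categorical(a: str, b: str) -> bool:
    # Categorical: matching reduces to "do the categories match?"
    return a == b

def colors_match_continuous(a, b, threshold: float = 30.0) -> bool:
    # Continuous: we have to pick a similarity measure and a threshold,
    # e.g. Euclidean distance between (r, g, b) tuples.
    return math.dist(a, b) <= threshold
```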

Texture / Material and Rough Shape both also seem like categorical features.

Distances between objects will be continuous; contact is a boolean.

Temporal pace is continuous. This should probably strictly be time, as frame rate is camera-dependent; see the conversion sketch below. The traversal path is probably a mix of some continuous and some boolean descriptions?
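For the frame-rate point, the conversion is trivial but worth pinning down (fps here is whatever frame rate the capturing camera reports):

```python
def pace_seconds(start_frame: int, end_frame: int, fps: float) -> float:
    # Report pace in seconds so the feature is camera-independent:
    # the same motion filmed at 30 fps and 60 fps yields the same value.
    return (end_frame - start_frame) / fps
```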

lichtefeld commented 3 years ago

@blakeharrison-ai discussed optical flow as a potential feature to pass through between two still images in the flip-book view to help capture motion. We'd need to determine how to pass and handle this feature within ADAM's graph representation.
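For reference, dense optical flow between two consecutive flip-book frames can be computed with OpenCV's Farneback method; reducing it to a single mean flow vector, as below, is just one hypothetical way to make it graph-friendly:

```python
import cv2
import numpy as np

def mean_flow(frame_a: np.ndarray, frame_b: np.ndarray):
    """Dense Farneback optical flow between two grayscale uint8 frames,
    summarized as a mean (dx, dy) vector. The summarization step is a
    hypothetical reduction, not a settled design choice."""
    flow = cv2.calcOpticalFlowFarneback(
        frame_a, frame_b, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    return float(np.mean(flow[..., 0])), float(np.mean(flow[..., 1]))
```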

lichtefeld commented 3 years ago

@shengcheng -- As I believe you're the one who's going to be responsible for generating the output of Perception, would you mind proposing an output format you believe is reasonable for the information requested above?

My preference is either YAML or JSON (YAML if we'd prefer the output to be more human-readable), but I think the exact layout of the keys should come from you, as you'll be generating these outputs. If you have any more questions for @denizbeser or myself before proposing a format, feel free to ask here so we can track the discussion!
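Not to preempt the proposal, but as a strawman, something along these lines would cover the features above (every key here is a placeholder):

```python
import json

# Hypothetical per-scene layout; YAML would serialize the same structure.
scene = {
    "objects": [
        {
            "id": "object_0",
            "stroke_graph": {},          # stroke-graph encoding TBD
            "color": "red",
            "texture": "smooth",
            "material": "plastic",
            "rough_shape": "spherical",
            "image_region": [12, 34, 120, 140],  # link back to pixels
            "sub_parts": [],             # same schema, recursively
        },
    ],
    "relations": [
        {"a": "object_0", "b": "object_1", "distance": 0.42, "in_contact": False},
    ],
    "actions": {"duration_s": 1.5, "paths": {"object_0": [[10, 20], [40, 60]]}},
}
print(json.dumps(scene, indent=2))
```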