isi-vista / adam

Abduction to Demonstrate an Articulate Machine

Define expected features from Perception Modules #1022

Open lichtefeld opened 3 years ago

lichtefeld commented 3 years ago

Below is a list of ideal input features for ADAM's internal representation; the final representation may not include all of these.

In addition, we want some way of translating from our bundle of features back to a location in the scene image, so that visualizations can show the alignment between features and the scene.

Per-Object Segmentation in Scene

  1. Stroke graph
  2. Color (patterns with colors, like stripes, may be interesting but are not needed early)
  3. Texture / Material
  4. Rough Shape (e.g. spherical)
  5. Sub-Object Parts

Object sub-parts can have the same list of features as the top-level object. Additionally, some of the between-object features may be relevant, such as 'distance' for the placement of object parts.
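To make this concrete, here is a minimal sketch (in Python, as a hypothetical illustration rather than ADAM's actual types) of what a per-object feature bundle might look like, including the image-region link back to the scene mentioned above:

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple

@dataclass
class ObjectFeatures:
    """Hypothetical per-object feature bundle; all names are placeholders."""
    object_id: str
    stroke_graph: Any                  # opaque stroke-graph structure
    color: str                         # categorical label, e.g. "red"
    texture: str                       # categorical, e.g. "smooth"
    material: str                      # categorical, e.g. "plastic"
    rough_shape: str                   # categorical, e.g. "spherical"
    # Sub-object parts carry the same feature bundle, recursively.
    sub_parts: List["ObjectFeatures"] = field(default_factory=list)
    # Pixel region tying the features back to a location in the scene
    # image, e.g. a bounding box (x_min, y_min, x_max, y_max).
    image_region: Optional[Tuple[int, int, int, int]] = None
```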

Between Object Features

  1. Distance between two objects relative to the frame of reference of the observer
  2. In Contact
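Along the same lines, a hypothetical pairwise record for the between-object features:

```python
from dataclasses import dataclass

@dataclass
class ObjectPairFeatures:
    """Hypothetical between-object features; names are placeholders."""
    object_a: str        # object_id of the first object
    object_b: str        # object_id of the second object
    distance: float      # distance in the observer's frame of reference
    in_contact: bool     # whether the two objects are touching
```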

Action Features

  1. Temporal pace, e.g. the number of frames or the time between the start and end positions.
  2. Traversal path of objects between the start and end frames
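And a hypothetical shape for the action features, assuming pace is stored as time (see the discussion below) and the path as sampled positions:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class ActionFeatures:
    """Hypothetical action features between start and end frames."""
    duration_s: float    # temporal pace in seconds, camera-independent
    # Per-object traversal path: object_id -> (x, y) positions sampled
    # between the start and end frames.
    paths: Dict[str, List[Tuple[float, float]]]
```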

ISI will:

lichtefeld commented 3 years ago

Some further considerations in looking to implement these:

For color, do we want to work with a categorical space, where the perception component just gives a color label, or with the continuous space of RGB values, dealing with the 'similarity' problem within the graph structure? My preference is categorical, because that matches P1/2, and since attributes are not explicitly a target of learned language, categorical features simplify the matching process to "do the categories match?" rather than leaving us to decide "is this new RGB value close enough to previous ones?".
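To illustrate the trade-off, a sketch of the two matching strategies (the RGB threshold below is an arbitrary placeholder, which is exactly the problem):

```python
import math

def colors_match_categorical(a: str, b: str) -> bool:
    # Categorical: matching reduces to "do the categories match?"
    return a == b

def colors_match_continuous(a, b, threshold: float = 30.0) -> bool:
    # Continuous: we have to pick a similarity measure and a threshold,
    # e.g. Euclidean distance between (r, g, b) tuples.
    return math.dist(a, b) <= threshold
```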

Texture / Material and Rough Shape both also seem like categorical features.

Distances between objects will be continuous; contact is a boolean.

Temporal pace is continuous. This should probably strictly be time, as frame rate is camera-dependent; see the conversion sketch below. The traversal path is probably a mix of some continuous and some boolean descriptions?
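For the frame-rate point, the conversion is trivial but worth pinning down (fps here is whatever frame rate the capturing camera reports):

```python
def pace_seconds(start_frame: int, end_frame: int, fps: float) -> float:
    # Report pace in seconds so the feature is camera-independent:
    # the same motion filmed at 30 fps and 60 fps yields the same value.
    return (end_frame - start_frame) / fps
```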

lichtefeld commented 3 years ago

@blakeharrison-ai discussed optical flow as a potential feature to pass through between two still images in the flip-book view to help capture motion. We'd need to determine how to pass and handle this feature within ADAM's graph representation.
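For reference, dense optical flow between two consecutive flip-book frames can be computed with OpenCV's Farneback method; reducing it to a single mean flow vector, as below, is just one hypothetical way to make it graph-friendly:

```python
import cv2
import numpy as np

def mean_flow(frame_a: np.ndarray, frame_b: np.ndarray):
    """Dense Farneback optical flow between two grayscale uint8 frames,
    summarized as a mean (dx, dy) vector. The summarization step is a
    hypothetical reduction, not a settled design choice."""
    flow = cv2.calcOpticalFlowFarneback(
        frame_a, frame_b, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    return float(np.mean(flow[..., 0])), float(np.mean(flow[..., 1]))
```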

lichtefeld commented 3 years ago

@shengcheng -- As I believe you're the one who's going to be responsible for generating the output of Perception, would you mind proposing an output format you believe is reasonable for the information requested above?

My preference is either YAML or JSON (YAML if we'd prefer the output to be more human-readable), but I think the exact layout of the keys should come from you, as you'll be generating these outputs. If you have any more questions for @denizbeser or myself before proposing a format, feel free to ask here so we can track the discussion!
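Not to preempt the proposal, but as a strawman, something along these lines would cover the features above (every key here is a placeholder):

```python
import json

# Hypothetical per-scene layout; YAML would serialize the same structure.
scene = {
    "objects": [
        {
            "id": "object_0",
            "stroke_graph": {},          # stroke-graph encoding TBD
            "color": "red",
            "texture": "smooth",
            "material": "plastic",
            "rough_shape": "spherical",
            "image_region": [12, 34, 120, 140],  # link back to pixels
            "sub_parts": [],             # same schema, recursively
        },
    ],
    "relations": [
        {"a": "object_0", "b": "object_1", "distance": 0.42, "in_contact": False},
    ],
    "actions": {"duration_s": 1.5, "paths": {"object_0": [[10, 20], [40, 60]]}},
}
print(json.dumps(scene, indent=2))
```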