isi-vista / adam

Abduction to Demonstrate an Articulate Machine
MIT License

Support for more than objects in front-end rendering #1089

Closed lichtefeld closed 2 years ago

lichtefeld commented 2 years ago

As we look to move past the M4 deliverable, ADAM will need to generate phrases for actions. The information contained in an action phrase may differ from that in an object phrase. I think the cleanest approach here is to refactor the MainObject type into a new interface which provides optional fields for the information we expect to have. I propose the following (Python) definition for this interface. Note that I've changed the name to LinguisticOutput to make it clearer that this is a generic linguistic output object.

from dataclasses import dataclass
from typing import Mapping, Optional, Sequence

@dataclass(slots=True)  # slots=True requires Python 3.10+
class LinguisticOutput:
    id: int # A unique incremented ID for each scene, used to distinguish multiple instantiations of the same concept (e.g. it's accurate to describe a scene with two blocks as "block", "block" even if this doesn't subscribe to Maximum description)
    text: str # The linguistic output generated
    confidence: float # A value between 0 and 1 inclusive: the confidence of the text output
    type: str # The type of linguistic output. Currently we're targeting 'object' and 'action'
    features: Sequence[str] # A list of properties which describe this output
    sub_objects: Optional[Sequence["LinguisticOutput"]] # An optional sequence of this class. Validation should assert that type='object' in all sub_objects.
    raw_text: Optional[str] # Optionally, the raw text of the concept. Internally, action concepts are stored as 'SLOT1 runs' but instantiated as 'dog runs'. Raw text would provide
    slot_alignment_to_confidence: Optional[Mapping[str, Mapping[str, float]]] # An optional map where dict['slot_1']['dog'] is a value between 0 and 1 giving the confidence behind this object match
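
The field comments above mention two constraints: confidences lie between 0 and 1, and every entry in sub_objects has type='object'. Below is a minimal sketch of how those checks might look. The helper name validation_errors, the None defaults on the optional fields, and the example values are assumptions for illustration, not part of the proposal; slots=True is left out only so the sketch also runs on Python versions before 3.10.

```python
from dataclasses import dataclass
from typing import List, Mapping, Optional, Sequence


@dataclass  # the proposal uses slots=True (Python 3.10+); omitted in this sketch
class LinguisticOutput:
    id: int
    text: str
    confidence: float
    type: str
    features: Sequence[str]
    # Defaulting the optional fields to None is an assumption of this sketch.
    sub_objects: Optional[Sequence["LinguisticOutput"]] = None
    raw_text: Optional[str] = None
    slot_alignment_to_confidence: Optional[Mapping[str, Mapping[str, float]]] = None


def validation_errors(output: LinguisticOutput) -> List[str]:
    """Hypothetical helper: list violations of the constraints in the field comments."""
    errors: List[str] = []
    if not 0.0 <= output.confidence <= 1.0:
        errors.append(f"confidence {output.confidence} is outside [0, 1]")
    # Every sub-object must itself be of type 'object'.
    for sub in output.sub_objects or ():
        if sub.type != "object":
            errors.append(f"sub_object {sub.id} has type {sub.type!r}, expected 'object'")
    # Each slot-to-candidate confidence must also lie in [0, 1].
    for slot, candidates in (output.slot_alignment_to_confidence or {}).items():
        for name, score in candidates.items():
            if not 0.0 <= score <= 1.0:
                errors.append(f"confidence for {slot}/{name} is outside [0, 1]")
    return errors


# Illustrative values only: an object phrase and an action phrase that refers to it.
dog = LinguisticOutput(id=0, text="dog", confidence=0.9, type="object", features=[])
dog_runs = LinguisticOutput(
    id=1,
    text="dog runs",
    confidence=0.8,
    type="action",
    features=[],
    sub_objects=[dog],
    raw_text="SLOT1 runs",
    slot_alignment_to_confidence={"slot_1": {"dog": 0.9}},
)
```

Returning a list of messages rather than raising on the first problem is one possible design; it would let the front end report every issue with a payload at once.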

While this class could be extended in the future, I think it covers all phase 3 needs. This refactor should support: