Pursuit Conceptual Learning vs. Stroke GNN

lichtefeld commented 2 years ago

@marjorief -- I’ve hit a less-than-ideal outcome: the concept learner’s heuristics are less robust than the GNN that Sheng uses. With 10 examples (assuming 1 object per scene), Sheng’s system very quickly gets to 90%+ accuracy, while the concept learner struggles to find 5 examples that generalize to the same concept. Tuning the graph match percentage parameter has helped somewhat (the number of nodes in a sub-object graph is now much smaller), with the following results:

90% Graph Match = Only Sphere learned (simplest object)
75% Graph Match = Sphere & some cubes
50% Graph Match = Sphere, but now we're calling more objects spheres than we should
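
To illustrate why the threshold behaves so coarsely on small sub-object graphs, here is a minimal sketch. It assumes the graph match percentage is the fraction of pattern nodes that must be matched; that definition and the function names are hypothetical, and the learner's actual matching may differ.

```python
import math


def min_nodes_required(pattern_size: int, match_ratio: float) -> int:
    """Smallest number of matched nodes that satisfies the ratio (assumed definition)."""
    return math.ceil(pattern_size * match_ratio)


def passes_graph_match(matched_nodes: int, pattern_size: int, match_ratio: float) -> bool:
    """True if enough pattern nodes were matched to accept the concept."""
    return matched_nodes >= min_nodes_required(pattern_size, match_ratio)


# With a hypothetical 4-node pattern, each threshold maps to a whole number of nodes,
# so moving from 90% to 75% to 50% drops one required node each time.
for ratio in (0.9, 0.75, 0.5):
    print(ratio, min_nodes_required(4, ratio))
```

Under that assumption, a 90% threshold on a 4-node pattern demands an exact match, which would fit with only the simplest object being learned.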

Below are two options: integrating a Stroke GNN into the concept learner, or doing more feature processing to extract additional features to consider. I prefer the Stroke GNN route, as we know this system is highly accurate with good data.

Integrating a Stroke GNN

The basic idea is that, for this ideal curriculum where we only have 1 object in the scene and we guarantee it aligns to the linguistic utterance, we train Sheng’s Stroke GNN and take its output as an additional feature with a higher weight than the other available features (essentially, rely on the stroke recognition when it’s highly accurate). We can still display the strokes as the features that differ between objects, since those are the underlying features being used to make the determination.
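
One way to picture the "higher weight" idea is a weighted average over per-feature match scores. This is only a sketch: the feature names, the weights, and the averaging rule are all assumptions for illustration, not how the concept learner currently combines features.

```python
def weighted_match_score(feature_scores: dict, weights: dict) -> float:
    """Weighted average of per-feature scores, each score expected in [0, 1]."""
    total_weight = sum(weights[name] for name in feature_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(weights[name] * score for name, score in feature_scores.items())
    return weighted_sum / total_weight


# Hypothetical values: the Stroke GNN feature counts three times as much as graph matching.
scores = {"stroke_gnn": 0.9, "graph_match": 0.5}
weights = {"stroke_gnn": 3.0, "graph_match": 1.0}
print(weighted_match_score(scores, weights))
```

With these made-up numbers the confident GNN prediction dominates a mediocre graph match, which is the intended effect when the stroke recognition is highly accurate.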

Non-Ideal Curriculum Considerations

If we consider learning from a non-ideal curriculum (e.g. we have N objects in the scene but only one linguistic label), then we can’t reliably train a single Stroke GNN. Our world-view already asserts that a cognitive learner "knows that objects exist in the world", so a potential extension of this assumption is "objects must have a physical shape". So we could have a Stroke GNN that is trained to recognize a minimal set of shape primitives (a similar set to what we previously used in P1/2). Then we’d take this shape primitive information as an additional feature for the concept learner to consider. This addresses some of the heuristic concerns with stroke matching without the GNN. We’d rely on pursuit to converge on ‘this set of strokes, shape primitive, etc.’ aligning to this object; then we’d either a) keep a memory of the observed strokes or b) after reaching the confidence level, begin training the concept directly into a high-level Stroke GNN that classifies straight to the concept name.

Additional Heuristics

Some additional heuristics to consider:

Stroke classification - Attempt to classify strokes as 'straight' or 'curved'. This provides a better metric to match on, as the normalized values are not robust to rotation (e.g. a diamond vs. a square, even though these are potentially the same shape).
If strokes are 'straight', we could derive the angle between two strokes (it's just some math to determine the angle between two vectors). This could improve the rationale for matching 'is_adjacent' edges.
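
Both heuristics can be sketched in a few lines. The sketch assumes a stroke is an ordered list of 2D points and uses the chord-to-path-length ratio as the straightness test; the 0.95 threshold and the function names are hypothetical.

```python
import math


def straightness(points):
    """Endpoint distance divided by path length; 1.0 for a perfectly straight stroke."""
    path = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    if path == 0:
        return 1.0  # degenerate stroke (single point), treated as straight
    return math.dist(points[0], points[-1]) / path


def classify_stroke(points, threshold=0.95):
    """Label a stroke 'straight' or 'curved' using an assumed threshold."""
    return "straight" if straightness(points) >= threshold else "curved"


def angle_between(stroke_a, stroke_b):
    """Angle in degrees between the start-to-end direction vectors of two strokes."""
    ax, ay = stroke_a[-1][0] - stroke_a[0][0], stroke_a[-1][1] - stroke_a[0][1]
    bx, by = stroke_b[-1][0] - stroke_b[0][0], stroke_b[-1][1] - stroke_b[0][1]
    norm = math.hypot(ax, ay) * math.hypot(bx, by)
    if norm == 0:
        raise ValueError("a stroke has zero length")
    cosine = max(-1.0, min(1.0, (ax * bx + ay * by) / norm))  # clamp rounding error
    return math.degrees(math.acos(cosine))


# Two adjacent sides of a square meet at the same angle however the square is rotated.
side_a = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]
side_b = [(1.0, 0.0), (1.0, 1.0)]
print(classify_stroke(side_a), angle_between(side_a, side_b))
```

Because the angle depends only on the relative direction of the two strokes, a square and a diamond give the same value, which is the rotation robustness the normalized values lack.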

I'd need to give these heuristics more thought, as they also create a more complicated alignment issue: we're no longer matching discrete categories but continuous values, where the 'acceptable' range is probably derived from the learned mean & variance of all positive examples.
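
A minimal sketch of what an 'acceptable' range could look like, assuming a value matches when it falls within k standard deviations of the mean of the positive examples. The value of k and the floor on the standard deviation are hypothetical choices, not something already decided.

```python
import statistics


def learn_range(positive_values):
    """Mean and population standard deviation over all positive examples."""
    mean = statistics.fmean(positive_values)
    std = statistics.pstdev(positive_values)
    return mean, std


def matches(value, mean, std, k=2.0, min_std=1.0):
    """Accept a value within k standard deviations of the mean.

    min_std is an assumed floor so that a handful of near-identical
    examples does not collapse the acceptable range to a single point.
    """
    return abs(value - mean) <= k * max(std, min_std)


# Hypothetical corner angles (degrees) observed across positive examples of one concept.
mean, std = learn_range([88.0, 90.0, 92.0, 91.0, 89.0])
print(matches(91.0, mean, std), matches(60.0, mean, std))
```

The floor matters most early in learning, when only a few positive examples are available and the measured variance is unreliable.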

Thoughts on these proposals?

marjorief commented 2 years ago

one thought is that Joe has seemed happy with a one-object-at-a-time curriculum, so maybe we make a case for the ideal curriculum.

In your initial message you had been worried that DARPA might think this was coming with too much background knowledge. Can you say more about that?

lichtefeld commented 2 years ago

In your initial message you had been worried that DARPA might think this was coming with too much background knowledge. Can you say more about that?

I was concerned about this for the non-ideal case, where we train a Stroke GNN on a small number of rough shape classes (e.g. sphere, cuboid, cylinder, pyramid, ovaloid) and use this additional 'feature' to help distinguish between multiple unknown objects in the scene. Essentially, we'd use really broad shape primitives to get continuous values such as [0.8, 0.05, 0.1, 0.025, 0.025] for the 5 categories listed above, and use these to do some early matching until pursuit is confident enough in a graph match that we train a different Stroke GNN on the exact concept match. In other words, use primitive features to train the concept Stroke GNN in an unsupervised way.
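
For concreteness, continuous values like those could come from a softmax over the primitive classifier's raw scores. The sketch below assumes the primitive Stroke GNN exposes one raw score per class; the GNN itself is not modeled, and the feature names and input scores are made up.

```python
import math

# The five rough shape classes named above.
PRIMITIVES = ("sphere", "cuboid", "cylinder", "pyramid", "ovaloid")


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def primitive_features(logits):
    """Map assumed per-class raw scores to named continuous features."""
    if len(logits) != len(PRIMITIVES):
        raise ValueError("expected one raw score per primitive")
    return {f"primitive_{name}": p for name, p in zip(PRIMITIVES, softmax(logits))}


# Made-up raw scores for a roughly spherical object.
print(primitive_features([3.0, 0.2, 0.9, -0.5, -0.5]))
```

Each entry would then be treated as one more continuous feature for the concept learner during early matching.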

However, if we are able to go with the one-object-at-a-time curriculum, then we can train the concept Stroke GNN directly from the start.

marjorief commented 2 years ago

my tack would be to discuss this honestly with DARPA during our demo call.
