Some features of an object/action may vary within some range. Apples can be red or green, for example. Bananas can be yellow, but apparently they can also be red. ETA: On the other hand many objects don't have a specific associated color, for example a rubber ball could be pretty much any color. We would like to be able to generalize across this kind of variation.

However, our planned approach to colors (#1128) doesn't meet this challenge, no matter how far we advance it. Most likely we will end up learning a distribution that covers many colors an apple can't be and has never been observed to be. We may also end up accidentally excluding colors we have actually observed, treating them as "outliers", when the distribution is stretched thin across, say, red and green. It might be possible to address these issues within continuous matching (see PS below), but I don't see an easy way of doing that.
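To make the "stretched thin" concern concrete, here is a minimal sketch. It assumes a single Gaussian fit over a one-dimensional hue value in degrees, which is an illustrative simplification and not necessarily how #1128 represents color. The observation values are made up.

```python
from statistics import NormalDist

# Hypothetical hue observations (degrees) for "apple": some red, some green.
red_hues = [2.0, 5.0, 8.0]
green_hues = [115.0, 120.0, 125.0]

# Fit ONE Gaussian to all observations (the simplification assumed here).
single = NormalDist.from_samples(red_hues + green_hues)

# Density at an observed red, an observed green, and the never-observed midpoint.
density_red = single.pdf(5.0)
density_green = single.pdf(120.0)
density_between = single.pdf(single.mean)  # a yellowish hue no apple showed us

# The never-observed hue scores higher than either hue we actually saw.
print(density_between > density_red, density_between > density_green)
```

Under these assumptions the single distribution prefers the hue between red and green over both observed clusters, which is the failure mode described above.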

@lichtefeld and I discussed a related topic on 05/17. We discussed backfilling and tangentially I think also this area of improving matching. I don't recall the details however and my notes don't cover them. At a high level we discussed tracking the range or set of things we've seen. For matching purposes, at some point we want to form a disjunction over those things: "We can match this feature value OR that feature value." This may be annoying in terms of the technical details of how to implement this matching. Then, if the disjunction gets too big (how big? this is the hard part 🙂), then we prune it away entirely. ETA: Translation, we relax the graph to remove the too-big disjunction.

For categoricals we can easily track "the set of all values we've ever seen". This is fine, assuming we know how to do the nontrivial high-level things.
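As a rough sketch of the categorical case, the class below tracks the set of seen values, matches against it as a disjunction, and relaxes (drops the constraint entirely) once the set grows past a limit. The class name, `max_size`, and the fixed-threshold pruning rule are all hypothetical. In particular, "how big is too big" is exactly the open question, and nothing here says how this would plug into the actual pattern graph.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class CategoricalDisjunction:
    """Hypothetical tracker: 'match value A OR value B OR ...'."""

    max_size: int  # assumed pruning threshold; choosing it is the hard part
    seen: Set[str] = field(default_factory=set)
    relaxed: bool = False  # True once the disjunction has been pruned away

    def observe(self, value: str) -> None:
        if self.relaxed:
            return
        self.seen.add(value)
        if len(self.seen) > self.max_size:
            # Too many alternatives: stop constraining this feature at all.
            self.relaxed = True
            self.seen.clear()

    def matches(self, value: str) -> bool:
        return self.relaxed or value in self.seen


apple = CategoricalDisjunction(max_size=3)
for color in ["red", "green", "red"]:
    apple.observe(color)

ball = CategoricalDisjunction(max_size=3)
for color in ["red", "blue", "green", "yellow"]:
    ball.observe(color)

print(apple.matches("green"), apple.matches("blue"), ball.matches("purple"))
```

With these made-up observations, the apple keeps a two-value disjunction while the ball's color constraint is relaxed and matches anything.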

For continuous values tracking the set of observed values directly is probably not what we want. I'm not sure what we do want. There are a few issues here. Aside from the "how big" problem, we need to (1) determine the number of subpopulations (say distinct colors), or determine when a new observation warrants a new subpopulation. Then (2) given a new observation, assuming we have formed a disjunction of several different distributions, determine which one to update (maybe update the one with the highest match score?). Also (3) as a technical problem, if we have special disjunction nodes, how the heck do those work, given the matching algorithm needs to match them to something and also needs to match the nodes feeding into it to something? Or do we compact them all into one super-node? The super-node seems tempting, but may be more complicated than I think. Overall the technical issues seem complicated, so I'm going to avoid them for the moment in the interest of actually posting this issue. 🙃 Anyway, I think (1) and (3) are the hard parts here.
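For what it's worth, here is one naive way (1) and (2) could look, setting (3) aside completely. Each subpopulation keeps a running mean and variance (Welford's update), a new observation goes to the subpopulation with the highest match score, and a new subpopulation is started when even the best score is too low. The score function, `new_threshold`, and `std_floor` are all assumptions made up for this sketch, and it treats hue as a plain number rather than a circular one.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    """One hypothetical subpopulation with running (Welford) statistics."""

    mean: float
    count: int = 1
    m2: float = 0.0  # running sum of squared deviations from the mean

    def std(self, floor: float) -> float:
        # With few samples the estimate is unreliable, so never go below floor.
        if self.count < 2:
            return floor
        return max(floor, math.sqrt(self.m2 / (self.count - 1)))

    def score(self, x: float, floor: float) -> float:
        # Unnormalized Gaussian score in (0, 1]; 1.0 means "exactly at the mean".
        z = (x - self.mean) / self.std(floor)
        return math.exp(-0.5 * z * z)

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)


def observe(components: List[Component], x: float,
            new_threshold: float = 0.05, std_floor: float = 5.0) -> None:
    """Update the best-scoring component, or start a new one (assumed rule)."""
    if components:
        best = max(components, key=lambda c: c.score(x, std_floor))
        if best.score(x, std_floor) >= new_threshold:
            best.update(x)
            return
    components.append(Component(mean=x))


components: List[Component] = []
for hue in [2.0, 120.0, 5.0, 115.0, 8.0, 125.0]:  # made-up red/green hues
    observe(components, hue)

print([(round(c.mean, 1), c.count) for c in components])
```

On this toy input the sketch ends up with two subpopulations, one around the red hues and one around the green hues. It says nothing about whether a fixed threshold is a sensible answer to (1) in general.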

@lichtefeld Do you recall any more details that you want to add?

PS: Addressing variation in continuous features through mixture models

It might be possible to handle multiple candidate colors within color/continuous feature matching's own terms by learning to represent those feature distributions as a mixture of Gaussians. However, this seems complicated enough that I think it is out of scope. It would combine several things where I think we're all ignorant of the proofs and practical details (a generalized online algorithm, together with mixture models). As for an algorithm, a quick search turned up Declercq & Piater 2008, "Online Learning of Gaussian Mixture Models - a Two-Level Approach" (VISAPP 2008), which might be relevant, though I haven't even opened it. Overall this seems like a rabbit hole I'm reluctant to get into.

Generalizing and relaxing patterns to handle multiple acceptable values #1136