isi-vista / adam

Abduction to Demonstrate an Articulate Machine

Improved color matching through a change of coordinates #1128

Open spigo900 opened 2 years ago

spigo900 commented 2 years ago

Our current color representation is RGB, which is hard for us to match well. We've considered doing classification, either a more fine-grained manual one (difficult...) or using a model, but neither seems like a good solution for the time we have left. Our current strategy for handling color is simply exact RGB color matching. In theory we could switch to distance-based matching: match if the distance meets an arbitrary or configurable threshold, or apply our new continuous feature matching strategy to learn a distribution of acceptable deviations in color. However, distance in RGB space doesn't obviously correspond well with human perception, which our matching should ideally agree with.

I suspect we can make distance-based matching work well enough by doing a change of coordinates. Color coordinates are a complicated subject, but from a very cursory search, it sounds like the CIELab (or CIELAB, or (CIE) L*a*b, or Lab) color system would be a good coordinate system to try. On its color models page, Wikipedia notes:

Officially, both CIELAB and CIELUV were created for their color difference metrics ∆Eab and ∆Euv, particularly for use defining color tolerances...

There is an easy approximate metric we can use once we've done this change: Just calculate the Euclidean distance between the colors as points in CIELab space (see here). Then we can do distributional, distance-based matching.
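
For concreteness, a minimal sketch of that metric (this is just plain Euclidean distance over (L*, a*, b*) triples; the function name is a placeholder):

```python
import math
from typing import Tuple


def delta_e_euclidean(
    lab1: Tuple[float, float, float], lab2: Tuple[float, float, float]
) -> float:
    """Euclidean distance between two colors given as (L*, a*, b*) triples."""
    return math.dist(lab1, lab2)
```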

Technical details

We want to implement this as a new pattern/graph node pair. The pattern node tracks a reference point in CIELab space; when we create a pattern node from a graph node, we just use the graph node's point in CIELab space as the reference point. The pattern node always matches other pattern nodes, and it matches a graph node only if the graph node's color is "close enough" to the reference point.

When we confirm a CIELab pattern node-pattern node match between two pattern nodes, say X and Y, the pattern node X (i.e. self) should be updated. At minimum we want to update the distance distribution with the distance between X's reference point and Y's. For now, we never update the reference point: it stays fixed in place, and only the distance distribution changes.
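
A rough sketch of the node pair I have in mind, to make the above concrete. The class names, the list-based distance distribution, and the threshold argument are all placeholders, not existing ADAM types:

```python
from dataclasses import dataclass, field
from typing import List, Tuple
import math


@dataclass
class CIELabColorNode:
    """Graph node: an observed color as a point in CIELab space (placeholder class)."""

    lab: Tuple[float, float, float]  # (L*, a*, b*)


@dataclass
class CIELabPatternNode:
    """Pattern node: a fixed reference point plus a distribution of observed distances."""

    reference_point: Tuple[float, float, float]  # fixed at creation time
    observed_distances: List[float] = field(default_factory=list)

    @classmethod
    def from_graph_node(cls, node: CIELabColorNode) -> "CIELabPatternNode":
        # Creating a pattern node from a graph node: the graph node's CIELab
        # point becomes the reference point.
        return cls(reference_point=node.lab)

    def matches_pattern_node(self, other: "CIELabPatternNode") -> bool:
        # Pattern nodes always match other pattern nodes.
        return True

    def matches_graph_node(self, node: CIELabColorNode, threshold: float) -> bool:
        # Match a graph node only if it is "close enough" to the reference point.
        return math.dist(self.reference_point, node.lab) <= threshold

    def confirm_match(self, other: "CIELabPatternNode") -> None:
        # On a confirmed pattern-pattern match, update the distance distribution.
        # The reference point itself stays fixed.
        self.observed_distances.append(
            math.dist(self.reference_point, other.reference_point)
        )
```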

Finally, where we translate YAML into a perception node, we would want to translate the (s)RGB color into a CIELab color and add a CIELab color node. Where we translate perception graphs into patterns, we want to translate the graph node into a pattern node.

Note that on a technical level, we do not want to replace the RGB colors. We want to preserve them so we can display them in the "experiment results viewer" user interface, the Angular web app. The CIELab nodes should be a distinct feature from the current RGB color nodes.

How to translate from (s)RGB into CIELab?

This is the core question that needs more research. I'm not sure of the technical details, although it sounds like it's a two-step process: first translate from sRGB to CIE XYZ, then translate from CIE XYZ to CIE L*a*b. I don't know the specific formulas. From what I saw they look complicated, and they're unfortunately not in the Python standard library (unlike HSV/HSL and a few others -- "YIQ", whatever that is).
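
For reference, here is my understanding of the textbook two-step conversion, hardcoding a D65 reference white. This is a sketch; a third-party library such as colormath or colour-science could likely do the conversion for us instead.

```python
def srgb_to_cielab(r: int, g: int, b: int) -> tuple:
    """Convert an sRGB color (components in 0-255) to CIE L*a*b*, assuming a D65 white."""

    # Step 0: normalize to [0, 1] and undo the sRGB gamma (linearize).
    def linearize(c: float) -> float:
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)

    # Step 1: linear RGB -> CIE XYZ using the standard sRGB (D65) matrix.
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    # Step 2: CIE XYZ -> CIE L*a*b*, relative to the D65 reference white.
    xn, yn, zn = 0.95047, 1.0, 1.08883

    def f(t: float) -> float:
        delta = 6.0 / 29.0
        return t ** (1.0 / 3.0) if t > delta ** 3 else t / (3 * delta ** 2) + 4.0 / 29.0

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return (116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz))
```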

Some parameters I think we can specify now:

There are most likely other "hows" that I don't yet know I need to specify. Ask away.

Further work/out of scope for this issue

Updating the reference point

We may want to update the reference point, but that complicates things since it invalidates our distance distribution. Moving the reference point an arbitrary distance means the distance to each of the previously observed points has also changed. That's a problem: at inference time the reference point stays fixed, so we don't want to use a distribution of distances that were measured against a moving reference point.

One way to deal with this would be to simply throw out the old distribution whenever the distance in one update is too large. But this may cause "boiling the frog" problems with an adversarially designed curriculum: a series of updates moves the color from one side of color space to the other, yet no single update forces us to throw out the distance distribution, so we end up with a distance distribution whose variance is too small to capture all the colors observed. It's too precise.

A better way might be to track a running mean over all points observed and "start over" whenever the mean and the reference point deviate too much. (New problem: How much is too much? Arbitrary threshold? 🙃) By "starting over" I mean we use the mean as the new reference point and start from an "empty" distance distribution. We'd update this running mean similarly to how the existing Gaussian matcher updates the mean, except treating the color coordinates as a vector. So the updated running mean is (L_old + (L_obs - L_old)/n_including_obs, a_old + (a_obs - a_old)/n_including_obs, b_old + (b_obs - b_old)/n_including_obs), i.e. cielab_old + (cielab_obs - cielab_old)/n_including_obs (treating the coordinates as vectors).
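
A minimal sketch of that update, which is just the usual incremental-mean step applied componentwise (the function name is a placeholder):

```python
from typing import Tuple


def updated_running_mean(
    mean: Tuple[float, float, float],
    observation: Tuple[float, float, float],
    n_including_obs: int,
) -> Tuple[float, float, float]:
    """Incremental mean over CIELab points: mean_new = mean_old + (obs - mean_old) / n."""
    return tuple(m + (o - m) / n_including_obs for m, o in zip(mean, observation))
```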

More sophisticated distance metrics

I quoted the Euclidean metric above, but there are other color distance metrics intended to more closely approximate "perceptual uniformity", i.e. the property that equal numeric differences correspond to roughly equal differences in perceived color. There are several more sophisticated CIELab-based distance metrics we could use (CIE94 and CIEDE2000). We could probably develop this into its own line of work if we were so interested, comparing the different ways of matching color. However, this is a deep rabbit hole, so for now those things are out of scope.

Determining an appropriate illumination condition/"reference white"

I'm arbitrarily fixing the reference white we use to a specific one. A "smart" system might try to pick a reference white based on the whole image, and possibly other frames/camera views. This requires actually looking at the image however, and I'm not sure how much work this is -- that would be more ASU's territory. This might be an interesting thing to do, but it is almost definitely out of scope.

spigo900 commented 2 years ago

@sidharth-sundar Following up on some things from our meeting.

How to run locally

Here's an example using the M5 objects experiment. This experiment is defined using a parameters file. See m5_objects_v0_with_mugs_subset.params. To run this experiment, you would do (from the repository root):

python adam/experiments/log_experiment.py parameters/experiments/p3/m5_objects_v0_with_mugs_subset.params

These params files are YAML files with some minor extensions for variable interpolation and including other YAML/params files. This file is where we implement those extensions, plus some related convenience functions.

Color evaluation experiment

Once this is implemented, we'll want to evaluate how well this is working. At that point, I suggest the following experiment:

  1. Create a new curriculum pair (train and test) from our existing object samples, but restrict it to a subset we should be able to distinguish by color -- say apples vs. oranges vs. bananas. ETA: Paper would probably fit, too.
    1. EDIT: I would suggest creating a new script in scripts/ to create this curriculum pair from the preprocessed data. By that I mean the data as preprocessed by this script, which reorganizes the raw data at /nas/gaia/adam/phase3_data/adam_objects_v0_with_mugs.
    2. Replace all the object_concept keys (ETA: in the feature.yaml) with say unknown or apple so that we have no object recognition node (see the sketch after this list).
      1. Goal: Get rid of the info we would normally use to recognize the object so we have to rely purely on color.
    3. You may need to manually filter so that we only have red apples. We want to extend matching to support things like "apples are red OR green" but this is a separate issue (not yet written up).
  2. Run the object learner and measure typewise accuracy.
    1. This will require changes from here, which I hope will be merged by the time we're ready for this experiment.
  3. Check the learned patterns and make sure each object pattern includes color as a node.
    1. These should get saved in data/learner/simulated-integrated-learner-params/experiments/$new_train_curriculum_name/hypotheses/final.
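
A hypothetical sketch of the step-1.2 preprocessing. I don't know the exact feature.yaml schema, so this just rewrites any object_concept key wherever it appears, and the script and function names are placeholders:

```python
"""Blank out object_concept in every feature.yaml under a curriculum directory (sketch)."""
from pathlib import Path
import sys

import yaml


def blank_object_concepts(node, replacement: str = "unknown"):
    """Recursively replace the value of every object_concept key in the YAML structure."""
    if isinstance(node, dict):
        return {
            key: replacement if key == "object_concept" else blank_object_concepts(value, replacement)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [blank_object_concepts(item, replacement) for item in node]
    return node


def main(curriculum_dir: str) -> None:
    for feature_file in Path(curriculum_dir).rglob("feature.yaml"):
        features = yaml.safe_load(feature_file.read_text())
        feature_file.write_text(yaml.safe_dump(blank_object_concepts(features)))


if __name__ == "__main__":
    main(sys.argv[1])
```
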
spigo900 commented 2 years ago

For clarity, writing up here an idea @sidharth-sundar had: We could also approach color matching by matching using a multivariate Gaussian distribution. I think this is worth trying. That would give us a third thing to compare with our baseline in the experiment outlined above. So we would compare results between (1) baseline, using exact RGB match, (2) CIELAB with simple matching, (3) CIELAB with multivariate matching. Note this experiment requires that we can switch ~easily between (2) and (3).

Because it was so messy to add a continuous-value matching threshold to the learners last time, I'd like to avoid creating a second threshold for colors, so I want the continuous and color matchers to use consistent scales for match scores. I'd also like to keep things in the range 0-1 if possible, because a bounded scale seems easier to understand. So: either do something like a multivariate hypothesis test, which keeps things on a 0-1 scale consistent with the 1D case, or (if we have to) use Mahalanobis distance and change the previous code to use absolute z-score as the match score.
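
One concrete way to get such a 0-1 score (a sketch of the hypothesis-test option, not a decision): for a k-dimensional Gaussian the squared Mahalanobis distance is chi-squared distributed with k degrees of freedom, so we can score an observed color by how far inside the learned distribution it falls.

```python
import numpy as np
from scipy.stats import chi2


def color_match_score(observation: np.ndarray, mean: np.ndarray, cov: np.ndarray) -> float:
    """Score in [0, 1]: 1.0 when the observed color sits at the mean, near 0 for outliers.

    d2 is the squared Mahalanobis distance of the observation from the learned
    Gaussian; under that Gaussian it is chi-squared distributed with
    k = len(mean) degrees of freedom, so 1 - chi2.cdf(d2, k) acts like a p-value.
    """
    diff = observation - mean
    d2 = float(diff @ np.linalg.solve(cov, diff))
    return float(1.0 - chi2.cdf(d2, df=len(mean)))
```

This would keep the color matcher on the same 0-1 scale as the existing 1D matching, with no new threshold beyond whatever we already use for match scores.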

One difficulty we'd need to solve is extending/replacing Welford's algorithm to handle the multivariate case. I haven't found a great source from a quick search. This might be useful, but it doesn't discuss Welford's algorithm explicitly, and from a quick skim I can't tell if it's implicitly using Welford's algorithm for the one-dimensional special case. It might be doing something like the naive 1D approach, which has poor numerical properties. The author does seem to be aware of Welford's algorithm, but who knows if that translates to a stable algorithm in the other post. This paper could be useful, though again, who knows. It probably has an answer, because it explicitly discusses extending Welford's algorithm to "weighted covariance", which includes unweighted covariance as a special case. But from a very shallow skim, it looks pretty dense. The first 3 pages seem the most likely to yield a useful answer, if there's one to be found there.
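
For what it's worth, my understanding of the standard multivariate extension of Welford's update keeps a running mean vector plus a running sum of outer products of deviations. I believe this is the numerically stable form, but it's worth double-checking against the sources above:

```python
import numpy as np


class OnlineMeanCovariance:
    """Running mean and covariance via a multivariate Welford-style update (sketch)."""

    def __init__(self, dim: int) -> None:
        self.n = 0
        self.mean = np.zeros(dim)
        self._m2 = np.zeros((dim, dim))  # running sum of outer products of deviations

    def update(self, x: np.ndarray) -> None:
        self.n += 1
        delta_old = x - self.mean  # deviation from the old mean
        self.mean += delta_old / self.n
        delta_new = x - self.mean  # deviation from the updated mean
        self._m2 += np.outer(delta_old, delta_new)

    @property
    def covariance(self) -> np.ndarray:
        # Unbiased sample covariance; needs at least two observations.
        return self._m2 / (self.n - 1)
```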

spigo900 commented 2 years ago

Also, it looks like this gives two generalizations of the CDF to multivariate normal distributions. I think the latter is the one we want, as it's easier to compute with than the "axis-aligned" CDF. That is, this one:

Another way is to define the cdf F(r) as the probability that a sample lies inside the ellipsoid determined by its Mahalanobis distance r from the Gaussian, a direct generalization of the standard deviation.[11] In order to compute the values of this function, closed analytic formulae exist,[11] as follows.
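
For our three-dimensional color case, I believe that closed form works out to the chi-squared CDF with 3 degrees of freedom evaluated at r², with r the Mahalanobis distance:

$$F(r) = P\left(\chi^2_3 \le r^2\right) = \operatorname{erf}\!\left(\frac{r}{\sqrt{2}}\right) - \sqrt{\frac{2}{\pi}}\, r\, e^{-r^2/2}$$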

spigo900 commented 2 years ago

Linking the report Sid put together from here so that we can easily find it again if needed: https://docs.google.com/document/d/1cTtY67oCg_8EX1lfU6srxbjPAPtC1NnpAx5RVXyFRlI/edit