pganssle / cim

Chord Identification Method Trainer
https://pganssle.github.io/cim/
Apache License 2.0

Change color weighting algorithm? #35

Open pganssle opened 1 year ago

pganssle commented 1 year ago

Right now the color weighting algorithm looks like this:

  1. Gather all sessions that were at least 25 identifications long (or the target number, whichever is smaller).
  2. Merge all the confusion matrices into one big confusion matrix.
  3. Calculate unweighted coefficients for each chord like so:

    $$c_i = \left(\Sigma_{k} M_{i,k}\right) \cdot \left(M_{i,i} + w_w\cdot\left(\Sigma_{k\neq i}M_{i,k}\right) + w_m\cdot\left(\Sigma_{k\neq i}M_{k,i}\right)\right)$$

    Where:

    • $c_i$ is the coefficient for chord $i$
    • $M_{j,k}$ are the elements of the merged confusion matrix.
    • $w_w$ is the weight given to color $i$ when the correct answer was $i$ but the user chose a different chord.
    • $w_m$ is the weight given to color $i$ when the correct answer was something other than $i$, but the user chose $i$.

    Right now, $w_w$ is 5 and $w_m$ is 1.5. Take a simplified example with 3 identifications: red, yellow and blue. You get red and yellow right, but choose red for blue. Then Red = 1 + 1.5 = 2.5, Yellow = 1 + 0 = 1, and Blue = 0 + 5 = 5, so your vector is $[2.5, 1, 5]$, which normalizes to $[0.294, 0.118, 0.588]$.

  4. Iteratively re-weight these with the following constraints:

    1. For a number of chords N, no coefficient should ever be less than ${}^1/_{10 + N}$; so for the black level there are 4 chords, and thus you will see each chord a minimum of 1 in every 14 chords. For the green level it's 1/15, etc.
    2. The most recent chord is never allowed to fall below ${}^1/_N$, so on the black level, black never falls below 1/4, on green, the green chord is at least 1/5.

    If a chord falls below its minimum, it is set to the minimum and all the rest of the coefficients are normalized such that $\Sigma_i c_i = 1$.
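
Steps 3 and 4 can be sketched roughly as follows. This is a minimal illustration, not the code in the repository: the function names are hypothetical, the confusion matrix is assumed to be stored with the correct chord as the row and the chosen chord as the column, and the re-weighting loop is just one reading of "iteratively re-weight" (chords that hit their floor stay pinned there while the remaining ones are rescaled).

```python
def unweighted_coefficients(M, w_w=5.0, w_m=1.5):
    """Step 3. M[i][k] counts how often chord i was played and chord k was chosen."""
    n = len(M)
    coeffs = []
    for i in range(n):
        row_total = sum(M[i])            # how often chord i was played
        wrong = row_total - M[i][i]      # chord i was played, something else was chosen
        mistaken = sum(M[k][i] for k in range(n) if k != i)  # i chosen for another chord
        coeffs.append(row_total * (M[i][i] + w_w * wrong + w_m * mistaken))
    return coeffs


def normalize(values):
    """Scale values so they sum to 1 (uniform if there is no data at all)."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def apply_floors(weights, newest_index):
    """Step 4, assuming `weights` already sums to 1."""
    n = len(weights)
    floors = [1.0 / (10 + n)] * n        # general minimum: 1/(10 + N)
    floors[newest_index] = 1.0 / n       # most recent chord: 1/N
    result = list(weights)
    pinned = set()
    while True:
        below = [i for i in range(n) if i not in pinned and result[i] < floors[i]]
        if not below:
            return result
        pinned.update(below)
        for i in pinned:
            result[i] = floors[i]
        # Rescale the chords that are not pinned so the total is 1 again.
        free_mass = 1.0 - sum(floors[i] for i in pinned)
        free_total = sum(result[i] for i in range(n) if i not in pinned)
        for i in range(n):
            if i not in pinned:
                result[i] *= free_mass / free_total


# The red / yellow / blue example: blue was played once and red was chosen.
M = [[1, 0, 0],
     [0, 1, 0],
     [1, 0, 0]]
weights = normalize(unweighted_coefficients(M))
```

Rescaling can push another chord under its floor, which is why the loop repeats until nothing is below its minimum. Because the floors always sum to less than 1, at least one chord stays unpinned and the loop ends after at most N passes.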

Note that in step 3 we give higher weight to chords that were confused for another chord, so if the child has trouble distinguishing between green and orange but gets everything else right, they'll hear mostly green and orange. However, when reading through the book, I recall (and annoyingly I cannot find the reference) seeing that at one point they suggest that when a child consistently confuses chord A for chord B, you should emphasize chord A and de-emphasize chord B, to give the child more time to learn chord A, so maybe $w_m$ should be a number less than 1?
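
To get a feel for what changing $w_m$ would do, here is a small sketch that applies the step 3 formula to a made-up confusion matrix (the counts and the `shares` helper are hypothetical, and rows are assumed to be the correct chord, columns the chosen chord). Chord A is answered as B half the time, while chord C is never involved in any confusion.

```python
def shares(M, w_w=5.0, w_m=1.5):
    """Normalized step 3 coefficients for confusion matrix M (rows: correct chord)."""
    n = len(M)
    coeffs = []
    for i in range(n):
        row_total = sum(M[i])
        wrong = row_total - M[i][i]
        mistaken = sum(M[k][i] for k in range(n) if k != i)
        coeffs.append(row_total * (M[i][i] + w_w * wrong + w_m * mistaken))
    total = sum(coeffs)
    return [c / total for c in coeffs]


# Hypothetical data: A played 10 times (5 answered as B), B and C always correct.
M = [[5, 5, 0],    # A
     [0, 10, 0],   # B
     [0, 0, 10]]   # C

for w_m in (1.5, 0.5, 0.0, -1.0):
    a, b, c = shares(M, w_m=w_m)
    print(f"w_m={w_m:>4}: A={a:.3f} B={b:.3f} C={c:.3f}")
```

If the formula is read as written, any positive $w_m$ still adds to chord B's coefficient, so a value between 0 and 1 only shrinks the boost. In this example B's share only drops below that of the never-confused chord C once $w_m$ goes negative.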

That said, I am a bit worried that the most likely sources of confusion will be the most recent chords, and so the child might "lose" chord B if not shown it enough. Maybe there should be a different decay curve for the downweighting, or maybe it should be a "per-session" thing, where you have a whole session that de-emphasizes B and a whole session that de-emphasizes A.

Would be interested to know if there's any research to back up any of these ideas about how to resolve confusion (either in chords or in just remembering similar things).

pganssle commented 22 hours ago

Other things I've noticed after some months with this:

  1. It seems possible (I want to do some simulations to figure it out) that there is some feedback mechanism going on here, where you see certain chords more often and they are re-weighted higher because of more "shots on goal" rather than because they are objectively harder than other chords.
  2. I find that when you have a particularly tough chord, it can get a lot of the probability density, and as a result you can get long strings of the same chord (which are easy using relative pitch). We should probably dynamically downweight chords that you just heard (with the weight getting lower and lower the longer the streak goes on).
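
One possible shape for the streak downweighting in item 2 is sketched below. It is only an illustration: the geometric `decay` factor, its default of 0.5, and both function names are assumptions, not anything that exists in the trainer today.

```python
import random


def damp_streak(weights, last_index, streak_length, decay=0.5):
    """Shrink the weight of the chord just heard by decay**streak_length, then renormalize."""
    if last_index is None or streak_length <= 0:
        return list(weights)
    damped = list(weights)
    damped[last_index] *= decay ** streak_length
    total = sum(damped)
    return [w / total for w in damped]


def draw_sequence(weights, count, decay=0.5, seed=None):
    """Draw `count` chord indices, damping whichever chord is currently on a streak."""
    rng = random.Random(seed)
    sequence = []
    last_index, streak = None, 0
    for _ in range(count):
        adjusted = damp_streak(weights, last_index, streak, decay)
        choice = rng.choices(range(len(weights)), weights=adjusted)[0]
        # Extend the streak if the same chord came up again, otherwise restart it.
        streak = streak + 1 if choice == last_index else 1
        last_index = choice
        sequence.append(choice)
    return sequence


# Hypothetical weights where one tough chord holds most of the probability.
weights = [0.7, 0.1, 0.1, 0.1]
after_one = damp_streak(weights, last_index=0, streak_length=1)
after_three = damp_streak(weights, last_index=0, streak_length=3)
```

The damping is temporary: it applies only to the next draw and leaves the stored coefficients alone. This sketch also ignores the minimums from step 4, so a long streak can push a chord below its floor for a draw or two; whether that matters would need deciding.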