pganssle / cim

Chord Identification Method Trainer
https://pganssle.github.io/cim/
Apache License 2.0

Change color weighting algorithm? #35

Open pganssle opened 9 months ago

pganssle commented 9 months ago

Right now the color weighting algorithm looks like this:

  1. Gather all sessions that were at least 25 identifications long (or the target number, whichever is smaller).
  2. Merge all the confusion matrices into one big confusion matrix.
  3. Calculate unweighted coefficients for each chord like so:

    $$c_i = \left(\Sigma_{k} M_{i,k}\right) \cdot \left(M_{i,i} + w_w \cdot \left(\Sigma_{k \neq i} M_{i,k}\right) + w_m \cdot \left(\Sigma_{k \neq i} M_{k,i}\right)\right)$$

    Where:

    • $c_i$ is the coefficient for chord $i$
    • $M_{j,k}$ are the elements of the merged confusion matrix, where $j$ is the correct chord and $k$ is the chord the user chose.
    • $w_w$ is the weight given to color $i$ when the correct answer was $i$ but the user chose a different chord.
    • $w_m$ is the weight given to color $i$ when the correct answer was something other than $i$, but the user chose $i$.

    Right now, $w_w$ is 5 and $w_m$ is 1.5. As a simplified example, say you have 3 identifications: red, yellow and blue. You get red and yellow right, but choose red for blue. Then Red = 1 + 1.5 = 2.5, Yellow = 1 + 0 = 1 and Blue = 0 + 5 = 5, so your vector is $[2.5, 1, 5]$, which normalizes to $[0.294, 0.118, 0.588]$.

  4. Iteratively re-weight these with the following constraints:

    1. For a number of chords N, no coefficient should ever be less than ${}^1/_{10 + N}$; so for the black level there are 4 chords, and thus you will see each chord a minimum of 1 in every 14 chords. For the green level it's 1/15, etc.
    2. The most recent chord is never allowed to fall below ${}^1/_N$, so on the black level the black chord never falls below 1/4, and on the green level the green chord is at least 1/5.

    If a chord falls below the minimum, it is set to the minimum and all the rest of the coefficients are normalized such that $\Sigma_i c_i = 1$.
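
For reference, here is a minimal Python sketch of steps 3 and 4 as described above. It is not the code from the repo: the function names (`unweighted_coefficients`, `normalize`, `apply_floors`) and the `newest_index` parameter are hypothetical, and it assumes the merged confusion matrix is a list of rows, with rows indexed by the correct chord and columns by the chord the user chose. The re-weighting loop is one possible reading of "iteratively re-weight": chords that fall below their floor are pinned there, and the remaining chords are rescaled to share what is left.

```python
def unweighted_coefficients(matrix, w_w=5.0, w_m=1.5):
    """Step 3: c_i = (row total) * (correct + w_w * wrong + w_m * mistaken)."""
    n = len(matrix)
    coeffs = []
    for i in range(n):
        row_total = sum(matrix[i])           # times chord i was the correct answer
        correct = matrix[i][i]               # chord i played, chord i chosen
        wrong = row_total - correct          # chord i played, something else chosen
        mistaken = sum(matrix[k][i] for k in range(n) if k != i)  # other played, i chosen
        coeffs.append(row_total * (correct + w_w * wrong + w_m * mistaken))
    return coeffs


def normalize(values):
    """Scale values so they sum to 1 (uniform if there is no data at all)."""
    total = sum(values)
    if total == 0:
        return [1 / len(values)] * len(values)
    return [v / total for v in values]


def apply_floors(coeffs, newest_index):
    """Step 4: enforce 1/(10 + N) for every chord and 1/N for the newest chord."""
    n = len(coeffs)
    floors = [1 / (10 + n)] * n
    floors[newest_index] = 1 / n
    weights = normalize(coeffs)
    pinned = set()  # chords already fixed at their floor
    while True:
        low = [i for i in range(n) if i not in pinned and weights[i] < floors[i]]
        if not low:
            return weights
        pinned.update(low)
        # Share whatever is left after the pinned floors among the free chords,
        # keeping their relative proportions.
        remaining = 1 - sum(floors[i] for i in pinned)
        free_total = sum(weights[i] for i in range(n) if i not in pinned)
        weights = [
            floors[i] if i in pinned else weights[i] * remaining / free_total
            for i in range(n)
        ]


# The red/yellow/blue example: red and yellow right, red chosen for blue.
example = [
    [1, 0, 0],  # red played
    [0, 1, 0],  # yellow played
    [1, 0, 0],  # blue played, red chosen
]
coeffs = unweighted_coefficients(example)        # [2.5, 1.0, 5.0]
weights = apply_floors(coeffs, newest_index=2)   # assumes blue is the newest chord
print([round(w, 3) for w in weights])            # [0.294, 0.118, 0.588]
```

In this example no chord is below its floor, so step 4 leaves the normalized vector unchanged. The floors only kick in when a chord is almost never missed.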

Note that in step 3 we give higher weight to chords that were confused for another chord, so if the child has trouble distinguishing between green and orange but gets everything else right, they'll hear mostly green and orange. However, I recall from reading through the book (annoyingly, I cannot find the reference) that at one point they suggest that when a child consistently confuses chord A for chord B, you should emphasize chord A and de-emphasize chord B, to give the child more time to learn chord A. So maybe $w_m$ should be a number less than 1?
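
As a rough illustration of what that would do, take the same red/yellow/blue example from step 3 and assume $w_m = 0.5$ (a made-up value, just for comparison) while keeping $w_w = 5$:

$$[1 + 0.5,\ 1 + 0,\ 0 + 5] = [1.5,\ 1,\ 5] \;\rightarrow\; [0.200,\ 0.133,\ 0.667]$$

So red (the chord chosen by mistake) drops from 0.294 to 0.200 and blue (the chord that was missed) rises from 0.588 to 0.667. Any positive $w_m$ still adds some weight to the mistakenly chosen chord compared with $w_m = 0$.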

That said, I am a bit worried that the most likely sources of confusion will be the most recent chords, and so the child might "lose" chord B if not shown it enough. Maybe there should be a different decay curve for the downweighting, or maybe it should be a "per-session" thing, where you have a whole session that de-emphasizes B and a whole session that de-emphasizes A.

I would be interested to know if there's any research to back up any of these ideas about how to resolve confusion (either in chords or in just remembering similar things).