jpauwels / MusOOEvaluator

Evaluator for music label sequences based on the MusOO library
BSD 3-Clause "New" or "Revised" License

Chord Evaluation Bug #1

Closed: fdlm closed this issue 6 years ago

fdlm commented 7 years ago

Chord evaluation returns a MirexMajMin score of 0 for this reference file when compared to itself using

MusOOEvaluator --reffile ref.txt --testfile ref.txt --chords MirexMajMin --output out.txt

The chord in the reference file is A:min7(*5,b6). I would expect the chord given in the reference to be ignored, because it cannot be mapped clearly to a major or minor triad.
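For reproduction, the reference file is essentially a single-segment label file along these lines (the time stamps here are placeholders; only the chord label matters):

```
0.000000	3.000000	A:min7(*5,b6)
```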

This affects real data: if I compare the reference annotations of "The Beatles - Drive My Car" to themselves, MusOOEvaluator returns a MajMin score of only 92.27%.

jpauwels commented 7 years ago

Hi Filip. That looks like a proper bug indeed. I'll look into it and keep you posted.

jpauwels commented 7 years ago

Okay, I see where the bug is coming from (for my reference as much as yours). Chords from the reference file and chords from the test file are parsed differently: pitch class spelling is used when parsing the reference (assuming it is hand-made) but not when parsing the test file (assuming it is generated by a non-spelling-aware program).

I disagree that the chord cannot be clearly mapped to a major or minor triad. In jazz it is quite common to drop the 5th if the 3rd and 7th are present, to make room for more tensions (in this case the b6, which I would write as b13 because there already is a 7 present). I would therefore write that chord (in non-Harte syntax) as Amin7add(b13) or Amin7add(b13)drop(5) if I want to be really pedantic. The reduction to a triad is then simply Amin. This is also what the parser does.

The problem now arises when the pitch class spelling is not used. Then the b6 can alternatively be interpreted as a #5, which would make the full chord Amin7(#5) and its triad reduction the less common Amin(#5). This is exactly what the parser does, such that Amin(#5) gets compared to Amin giving a score of 0.
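To make the difference concrete, here is a toy Python sketch of the two reduction modes described above. This is not MusOO code; the degree table and function names are my own simplification.

```python
# Harte degrees of A:min7(*5,b6): the min7 shorthand gives {1, b3, 5, b7},
# "*5" removes the fifth and "b6" adds a flat sixth.
degrees = {"1", "b3", "b6", "b7"}

# Semitone distances from the root for the degrees used here.
SEMITONES = {"1": 0, "b3": 3, "5": 7, "b6": 8, "#5": 8, "b7": 10}


def reduce_spelled(degrees):
    """Spelling-aware reduction: a b6 stays a sixth (read as an added b13),
    so a chord with 1 and b3 and no altered fifth reduces to a minor triad."""
    if {"1", "b3"} <= degrees and "#5" not in degrees:
        return "min"
    return "other"


def reduce_unspelled(degrees):
    """Spelling-agnostic reduction: only pitch classes survive, so 8 semitones
    above the root is read as an altered fifth when no perfect fifth is present."""
    pcs = {SEMITONES[d] for d in degrees}
    if {0, 3} <= pcs and 7 not in pcs and 8 in pcs:
        return "min(#5)"
    if {0, 3} <= pcs:
        return "min"
    return "other"


# The reference (spelled) reduces to "min", the test (unspelled) to "min(#5)",
# so the file compared against itself scores 0 on MirexMajMin.
print(reduce_spelled(degrees), reduce_unspelled(degrees))
```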

The easy fix is to change the parsing tree such that the Amin7add(b13) interpretation of the unspelled version gets precedence over the Amin7(#5) interpretation (which should be done in any case, because the former is arguably the more common reading). The bigger question is how to deal with this difference in reduction between spelled and unspelled chords in a more general way. Should the reference also be considered unspelled (which would avoid other potentially undiscovered bugs), and what drawbacks does this have (e.g. loss of info for the confusion matrix)? And what about the planned functionality of also evaluating the spelling of the algorithmic output (for a future in which algorithms produce spelled chords)?

fdlm commented 7 years ago

Is there a reason not to treat annotations and predictions the same way? From my point of view, a syntax definition clearly specifies how chords should be represented, so imho parsing files written in a specific syntax should work the same regardless of what or who created the file.

We can't really say (can we?) from the chord definition whether the fifth was dropped to make space for the b6. We know the intention of neither the annotator nor the composer. Even if we could guess it with some certainty for this specific chord, I don't see how to do this in general. I've seen my share of weird (to me) annotations out there, so I'm not sure annotators can really be trusted with these nuances.

jpauwels commented 7 years ago

Well, the syntax is parsed unambiguously. It's just that there are two modes in the triad reduction algorithm, one that takes spelling into account and one that doesn't. You want to take spelling into account for the reference file in order to determine which segments to evaluate. For instance, when you want to evaluate only on min chords, you want those that are annotated as min, not those that could theoretically be interpreted as min by an automated parser that ignores spelling. As crap as human annotators can be, I'd trust them more than a simple algorithm.

But in essence you're right of course, the actual comparison should be made between chords that are parsed the same way. Therefore my fix is to use spelling only when determining which segments to evaluate and to ignore spelling when comparing the resulting chord pairs. I've created a fix#1 branch and now your example file gets the correct score. Please try it and let me know if this solution creates any other issues.
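To make the intended flow concrete, here is a rough Python sketch of that fix, reusing the toy reduce_spelled / reduce_unspelled functions from my earlier comment. The Segment type and the overall scoring loop are my own simplification, not the actual C++ evaluator code.

```python
from collections import namedtuple

Segment = namedtuple("Segment", ["degrees", "duration"])


def majmin_score(ref_segments, test_segments, wanted=("maj", "min")):
    total = correct = 0.0
    for ref, test in zip(ref_segments, test_segments):  # assume aligned segments
        # 1) The spelling-aware reduction of the reference decides whether the
        #    segment is evaluated at all.
        if reduce_spelled(ref.degrees) not in wanted:
            continue
        total += ref.duration
        # 2) The comparison itself uses the same (unspelled) reduction on both
        #    sides, so a reference compared to itself can no longer disagree.
        if reduce_unspelled(ref.degrees) == reduce_unspelled(test.degrees):
            correct += ref.duration
    return correct / total if total else 0.0
```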

fdlm commented 7 years ago

Sorry, but I can't see the fix#1 branch. Did you push it?

jpauwels commented 7 years ago

Sorry, pushed now.

fdlm commented 7 years ago

It seems to work, but it took 100 seconds (!) to evaluate the file I attached in the issue.

This is what I did: pulled fix#1, updated libMusOO, tinyxml2 and eigen to the bleeding-edge versions, and rebuilt MusOOEvaluator.

Then I ran the evaluation on the file I attached in the issue:

./MusOOEvaluator --reffile test.lab --testfile test.lab --out /dev/stdout --chords MirexMajMin
jpauwels commented 6 years ago

The performance regression has been fixed too. It was actually caused by an update to the underlying chord label parser, not by the changes in the evaluator code itself. An updated version of fix#1 has now been merged into master. Thanks for reporting!