Open xavriley opened 2 years ago
Started in this direction by moving to transition regions https://github.com/xavriley/crepe_notes/blob/b1fe2120799a490c3d9dd5d93eb36970af9fb7b7/crepe_notes/crepe_notes.py#L152-L154
This seems to improve on the previous method, but more work is needed to handle slides and scoops deliberately.
During the initial segmentation, a segment with a wide variance is likely to be a slide, e.g. at the very start of a note. Currently we take the median of this segment, resulting in a short note with an essentially random pitch.
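
To make the problem concrete, here is a rough sketch of how a slide-like segment could be flagged before its median is taken. It is not the code in `crepe_notes.py`: the function names, the use of pitch range as a stand-in for variance, and the 1 semitone threshold are all assumptions for illustration.

```python
from statistics import median


def pitch_spread(segment):
    """Range of a segment's pitch values, in fractional MIDI semitones."""
    return max(segment) - min(segment)


def summarise_segment(segment, max_spread=1.0):
    """Return (median_pitch, is_slide) for one segment.

    `max_spread` is a hypothetical threshold: a segment whose pitch range
    exceeds it is flagged as a probable slide or scoop.
    """
    is_slide = pitch_spread(segment) > max_spread
    return median(segment), is_slide


# A steady note: the median is a meaningful pitch.
steady = [60.02, 59.98, 60.01, 60.0, 59.97]

# A scoop into a note: the median lands somewhere mid-slide.
scoop = [57.1, 57.9, 58.6, 59.3, 59.8]

for name, seg in (("steady", steady), ("scoop", scoop)):
    pitch, is_slide = summarise_segment(seg)
    print(name, pitch, "slide" if is_slide else "note")
```

A flagged segment could then be merged into the following note, dropped, or handled some other way. Which of those is right is the open question here.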
Anecdotally this doesn't sound too bad but it does harm the accuracy metrics for precision, recall and f-measure.
Other methods treat these slides as note transitions (e.g. https://www.mdpi.com/2076-3417/12/15/7391) which makes sense in the vocal context, but I'm not sure that it helps if the target output is MIDI. Either a note is on or off.
We could also model the pitch contour more accurately by using MIDI pitch bends. This is the approach taken by Basic Pitch. Working with pitch bends in ground truth annotations is cumbersome though. It also becomes more difficult when the recordings are not tuned to the A = 440 Hz standard, e.g. if a recording is a quarter tone out, are you bending to a standard MIDI note from above or from below? This needs more thought.
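
The quarter-tone ambiguity can be shown with a small sketch. It converts a frequency to a fractional MIDI number using the standard 12-tone equal temperament formula, then picks the nearest note and the bend needed to reach the actual pitch. The function names and the `a4_hz` parameter are hypothetical, and this is not how Basic Pitch computes its bends.

```python
import math


def hz_to_midi(freq_hz, a4_hz=440.0):
    """Fractional MIDI note number for a frequency, given a reference A4."""
    return 69.0 + 12.0 * math.log2(freq_hz / a4_hz)


def nearest_note_and_bend(freq_hz, a4_hz=440.0):
    """Return (midi_note, bend_cents).

    bend_cents is the offset from the chosen note to the actual pitch:
    positive means bending up from the note, negative means bending down.
    """
    midi = hz_to_midi(freq_hz, a4_hz)
    note = math.floor(midi + 0.5)  # round half up
    bend_cents = (midi - note) * 100.0
    return note, bend_cents


# Pitches 45 and 55 cents above A4: almost the same sound, but they are
# assigned to different notes with bends in opposite directions.
just_below = 440.0 * 2 ** (0.45 / 12)
just_above = 440.0 * 2 ** (0.55 / 12)

print(nearest_note_and_bend(just_below))  # note 69, bend of about +45 cents
print(nearest_note_and_bend(just_above))  # note 70, bend of about -45 cents
```

Near the quarter-tone point the choice of note flips on a tiny change in pitch. A recording tuned that far from the reference would therefore need either a per-recording `a4_hz` estimate or some other tie-breaking rule.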