marl / jams

A JSON Annotated Music Specification for Reproducible MIR Research
ISC License

Duration-agnostic measurements #14

Closed bmcfee closed 9 years ago

bmcfee commented 9 years ago

From #13 @justinsalamon ...

Is there a thread where this is being discussed?

Okay team. What do we want to do about measurements that are intended to span the entire track? Is that a null-duration event, or do we explicitly include the track duration?

Arguments for null-duration markup:

Arguments for explicit durations:

Arguments for zero-duration:

A third option might combine the first two: use null-duration for "weak supervision" (ie, the tag applies somewhere in the track but I don't know where), and full-duration for "strong supervision" (ie, this entire track is hip-hop).
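To make the combined option concrete, here is a minimal sketch. The `Observation` record and the `supervision` helper are hypothetical names used only for illustration; they are not the JAMS schema or API, and the field layout is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """Hypothetical annotation record (not the JAMS schema)."""
    time: float                # start time in seconds
    duration: Optional[float]  # None encodes a null duration
    value: str


def supervision(obs: Observation, track_duration: float) -> str:
    """Classify an observation under the combined null/full-duration option."""
    if obs.duration is None:
        # Null duration: the label applies somewhere in the track.
        return "weak"
    if obs.time <= 0.0 and obs.time + obs.duration >= track_duration:
        # Explicit full-track duration: the label covers the whole track.
        return "strong-global"
    # Any other explicit interval is a localized annotation.
    return "strong-local"


weak_tag = Observation(time=0.0, duration=None, value="rock")
full_track = Observation(time=0.0, duration=30.0, value="hip-hop")
chord = Observation(time=2.0, duration=1.5, value="C:maj")
```

With a 30 s clip, `supervision(weak_tag, 30.0)` gives `"weak"`, `supervision(full_track, 30.0)` gives `"strong-global"`, and `supervision(chord, 30.0)` gives `"strong-local"`.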

bmcfee commented 9 years ago

Hitting @justinsalamon 's points in order:

"time/duration" agnostic (in this case tempo, other examples could be genre, key, etc.).

I don't think we should require any of these to be time/duration-agnostic. For some datasets, it's explicitly full-track measurement and strong annotation (eg, gtzan genres cover the entire 30s clip), but in others, the annotation is weak (eg, a "rock" tag in cal500 doesn't necessarily imply the entire track is rock).

This actually touches on another point back in #13 : interpretation of confidence values. For tags, I could see some benefit in using the confidence field to encode strong/weak annotation. I think this is generally distinct from the timing question, but both points should be discussed.

Wouldn't it be safer to have some type of NaN duration indicator so that the distinction between things that span time (e.g. a chord label) and things that don't (e.g. a genre label) is explicit?

Genre doesn't have duration? Not a Mr Bungle fan I take it...

justinsalamon commented 9 years ago

:+1: @bmcfee I'd vote for null-duration, for the arguments you've already listed, and also since I think it makes more "semantic sense", i.e. if a track is trimmed, would that change its genre? No (in most cases). Thus it seems sensible that the two elements (label and non-null duration) shouldn't be linked. If you want a complete annotation you can just include the track duration in the file metadata.

Same applies for e.g. key and tempo. That's not to say there aren't tracks where these properties do change during the piece (Mr Bungle and friends), but then they can be annotated like chords as something that has an explicit duration.

Just my 2¢

bmcfee commented 9 years ago

If you want a complete annotation you can just include the track duration in the file metadata.

Yeah, I guess I can buy that. @ejhumphrey care to chime in?

Same applies for e.g. key and tempo

FWIW, the isophonics key files are interval-timed.

ejhumphrey commented 9 years ago

I like explicit. I like validation. I like the idea that if a track gets trimmed (from the end or the start), whoever did it is responsible for changing the timing data.

Does trimming a track change the semantics or significance of an observation? I'd argue it generally should, if not in concept then at least in confidence; if it doesn't, that's more likely a result of an ambiguous or imprecise labeling than every infinitesimal instant in a waveform actually corresponding to some genre.

The reason I would be a stickler on this is the same reason I think "onsets" or "beats" should be annotated as intervals rather than truly instantaneous events. The timespan over which the observation is valid should be encoded explicitly, in the same way one might mark a chord or verse or key. Genre and tags have, for the most part, gotten away with weak global labeling because they could, and I would like to discourage this behavior when possible.

Don't get me wrong, there's value to weakly labeled data, but I'd rather it was not default behavior.
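As an illustration of what "changing the timing data" after a trim could involve, here is a minimal sketch. It assumes observations are plain `(time, duration, value)` tuples in seconds; `trim_intervals` is a hypothetical helper, not a JAMS function.

```python
def trim_intervals(observations, trim_start, trim_end):
    """Re-time (time, duration, value) tuples after cutting the audio
    down to the span [trim_start, trim_end] of the original track."""
    trimmed = []
    for time, duration, value in observations:
        # Clip the interval to the retained region.
        start = max(time, trim_start)
        end = min(time + duration, trim_end)
        if end < start:
            continue  # lies entirely outside the retained region
        if end == start and duration > 0:
            continue  # interval only touches the boundary
        # Shift so that the new track starts at time 0.
        trimmed.append((start - trim_start, end - start, value))
    return trimmed


annotations = [
    (0.0, 30.0, "hip-hop"),  # explicit full-track label
    (4.0, 3.0, "C:maj"),     # partially cut by the trim
    (28.0, 2.0, "G:maj"),    # removed by the trim
]
result = trim_intervals(annotations, trim_start=5.0, trim_end=25.0)
# result == [(0.0, 20.0, "hip-hop"), (0.0, 2.0, "C:maj")]
```

With explicit durations, the full-track label shrinks along with the audio. A null duration would pass through a trim unchanged, which is the behavior the two positions in this thread disagree about.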

bmcfee commented 9 years ago

The timespan over which the observation is valid should be encoded explicitly, in the same way one might mark a chord or verse or key.

I totally agree in principle. But I think there's a real difference in what an interval/duration means for quasi-instantaneous events (onsets) compared to sustained phenomena (chords). In the former, I'd expect the duration to refer to an interval of uncertainty -- the handful of milliseconds it takes for an onset to occur doesn't really translate to any perceptually meaningful duration. As such, I'd think it should be quantified in the "confidence" field, rather than overloading the meaning of duration.

OTOH, if the annotation is coming out of a piece of software, as all ultimately do, duration might be a good place to quantify the resolution of the measurement (hop_length / sr). But that would be a pretty redundant encoding for something I expect to be constant for a particular annotation.
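For a sense of scale, a quick sketch of that resolution. The values `hop_length=512` and `sr=22050` are assumed here as typical analysis settings; they are not taken from any dataset in this thread.

```python
def frame_resolution(hop_length, sr):
    """Time between consecutive analysis frames, in seconds."""
    return hop_length / sr


hop_length = 512  # samples per hop (assumed)
sr = 22050        # sampling rate in Hz (assumed)

resolution = frame_resolution(hop_length, sr)  # about 0.0232 s
# Every frame-level observation would carry this same duration.
frame_times = [i * resolution for i in range(4)]
```

Since the value is identical for every observation in the annotation, storing it once as annotation-level metadata would avoid the redundancy.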

Genre and tags have, for the most part, gotten away with weak global labeling because they could, and I would like to discourage this behavior when possible.

:+1: :heavy_multiplication_x: :100:

Don't get me wrong, there's value to weakly labeled data, but I'd rather it was not default behavior.

I don't see this as necessarily being a question of "default". It's a translation issue, and will probably vary from one dataset to the next.

urinieto commented 9 years ago

Like @ejhumphrey, I also support the explicit-value option. I would rather have duplicated but easy-to-validate data than some encoding (i.e., null) that might further confuse JAMS newcomers.

In terms of the instantaneous data like onsets or beats, I don't think the duration makes sense at all. And in this case, I would simply add a 0 duration value (unless we decide to simply remove the duration field). This is both easy to validate and to read.
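A minimal sketch of how simple validation can be when every duration is an explicit number. `is_valid_observation` is a hypothetical check written for illustration, not part of JAMS, and it assumes times are in seconds.

```python
import math


def is_valid_observation(time, duration, track_duration):
    """Accept only finite, non-negative timing that fits inside the track."""
    for x in (time, duration):
        # Reject null (None), NaN, and infinite values outright.
        if not isinstance(x, (int, float)) or not math.isfinite(x):
            return False
    return time >= 0 and duration >= 0 and time + duration <= track_duration


onset_ok = is_valid_observation(1.5, 0.0, 30.0)        # zero-duration onset
full_track_ok = is_valid_observation(0.0, 30.0, 30.0)  # full-track label
null_ok = is_valid_observation(0.0, None, 30.0)        # null duration
overrun_ok = is_valid_observation(29.0, 2.0, 30.0)     # runs past the end
```

Here the first two checks pass and the last two fail. Supporting null durations would mean adding a separate branch to this check.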

bmcfee commented 9 years ago

If we use explicit timing (ie, full-track duration), I think we can condense this all down into a single principle of interpretation:

This rule is simple, and easy to apply consistently across all tasks.

With optional duration, things are more complex:

I don't particularly like the idea of special-casing something that can be explicitly encoded with a simpler rule.