marl / jams

A JSON Annotated Music Specification for Reproducible MIR Research
ISC License

Split pitch_ namespaces into note_ and contour_ #111

Closed: justinsalamon closed this issue 8 years ago

justinsalamon commented 8 years ago

As discussed in #91 starting here, currently the pitch_hz and pitch_midi namespaces are overloaded in a bad way, supporting both note-style annotations (where a note has a start time, duration, and pitch) and continuous f0 (contour)-style annotations (where contours are specified by timestamped pitch values with 0 duration).

Apart from being confusing, this causes a number of problems e.g. for sonification and visualization.

The proposed solution is to split "pitch" into "note" and "contour", keeping the hz/midi distinction as well:

  1. note_hz
  2. note_midi
  3. contour_hz
  4. contour_midi

Note that the only difference between hz and midi here is how pitch values are represented (on a Hertz scale or as MIDI note numbers); the two are otherwise equivalent: neither implies any sort of pitch quantization, and both can be used to represent sparse (note) annotations and dense (contour) annotations.
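Since hz and midi differ only in scale, the mapping between them is a fixed formula (A4 = 440 Hz = MIDI note 69). A minimal sketch of the conversion, illustrating that neither representation quantizes pitch:

```python
import math

def hz_to_midi(freq_hz):
    """Map a frequency in Hz to a fractional MIDI note number
    (A440 reference, no quantization)."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)

def midi_to_hz(midi_note):
    """Inverse mapping: MIDI note number back to Hz."""
    return 440.0 * 2.0 ** ((midi_note - 69.0) / 12.0)

# A4 maps exactly to MIDI 69, and the mapping round-trips up to
# floating-point error, so no information is lost either way.
print(hz_to_midi(440.0))  # 69.0
print(midi_to_hz(60.0))   # ~261.63 (middle C)
```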

Finally, we need to decide whether to use the aforementioned namespace names as is or add a pitch_ prefix to each of the 4 (the former means shorter/cleaner namespace names, the latter is less ambiguous).

bmcfee commented 8 years ago

I think i'd err on the side of shorter namespaces here; there isn't a whole lot of ambiguity IMO.

[Tagging @rabitt here.]

Should we try to distinguish between monophonic and polyphonic annotations? This isn't a big deal for note_* which are, for all intents and purposes, piano roll encodings. It does seem to be a problem for contour_* though, when it comes to interpreting the annotations (eg through viz or sonify).

Do we need a separate set of namespaces for multi-f0? Or just use a collection of annotations to code multi-f0 annotations? The latter seems a little clumsy to me, but I don't work with this kind of data so it's hard to tell.

A new namespace for multi seems sensible, but I don't know how to structure it. Maybe each observation is a tuple of (contour_id, frequency_value)? Also seems clumsy.

Packing multiple contours into a single contour_hz annotation seems like a total nightmare though. Am I wrong?
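For concreteness, a rough sketch of how the (contour_id, frequency_value) tuple idea above could behave in practice (field layout hypothetical): a consumer such as a visualizer or sonifier would regroup the flat observation list by contour id to recover one monophonic time series per contour.

```python
from collections import defaultdict

# Hypothetical flat encoding of a multi-f0 annotation: each observation
# is (time, contour_id, frequency_hz), i.e. the proposed tuple plus its
# timestamp. Two simultaneous contours are interleaved here.
observations = [
    (0.00, 0, 220.0),
    (0.00, 1, 330.0),
    (0.01, 0, 220.5),
    (0.01, 1, 330.8),
]

# Regrouping by contour id recovers per-contour time series,
# e.g. for visualization or sonification.
contours = defaultdict(list)
for time, contour_id, freq in observations:
    contours[contour_id].append((time, freq))

print(sorted(contours))  # [0, 1]
```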

justinsalamon commented 8 years ago

I think i'd err on the side of shorter namespaces here; there isn't a whole lot of ambiguity IMO.

OK, so let's go with:

  1. note_hz
  2. note_midi
  3. contour_hz
  4. contour_midi

(The contour ones might change depending on how we decide to handle multi-f0.)

Do we need a separate set of namespaces for multi-f0? Or just use a collection of annotations to code multi-f0 annotations? The latter seems a little clumsy to me, but I don't work with this kind of data so it's hard to tell.

I would definitely avoid using a collection of annotations - aside from being clumsy it also breaks the unity of an annotation object and doesn't match the semantics of the JAMS schema. I think a separate namespace for multi-f0 with a structure to encode multiple overlapping contours is the way to go. @rabitt might have some idea about what that structure should actually look like?

justinsalamon commented 8 years ago
  1. note_hz
  2. note_midi
  3. contour_mono_hz
  4. contour_mono_midi
  5. contour_multi_hz
  6. contour_multi_midi

?

bmcfee commented 8 years ago

Depending on what contour_multi* ends up looking like, the mono version might be redundant/unnecessary, since it's a special case with n_contours=1.

justinsalamon commented 8 years ago

Depending on what contour_multi* ends up looking like, the mono version might be redundant/unnecessary, since it's a special case with n_contours=1.

True. There are use cases that would only apply when n_contours=1 (e.g. evaluating the annotation for melody extraction using a jams wrapper), so we just have to make sure that these use cases are handled gracefully.

bmcfee commented 8 years ago

And while we're at it, can we kill the "sign indicates voicing" convention? It's a silly abuse of notation, and makes it far too easy to introduce bugs.

I would much rather provide converter utilities that pack the data into this format as needed, eg, when calling into mir_eval.
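A converter along those lines might look like the following sketch (the signed-frequency convention being the legacy format that mir_eval's melody metrics consume; the function names here are hypothetical):

```python
def pack_signed(freqs, voicings):
    """Pack explicit (frequency, voiced) pairs into the legacy
    sign-indicates-voicing convention: unvoiced frames carry a negative
    frequency whose magnitude is still the pitch estimate."""
    return [f if voiced else -f for f, voiced in zip(freqs, voicings)]

def unpack_signed(signed_freqs):
    """Inverse: recover explicit (frequency, voiced) pairs.
    Note that 0.0 is ambiguous in the signed convention; here it is
    treated as unvoiced with no pitch estimate."""
    return [(abs(f), f > 0) for f in signed_freqs]

signed = pack_signed([220.0, 221.0, 222.0], [True, False, True])
print(signed)  # [220.0, -221.0, 222.0]
```

Keeping voicing explicit in the schema and only packing into this format at the mir_eval boundary confines the sign trick to one well-tested function.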

justinsalamon commented 8 years ago

I'd be open to this, as long as we facilitate import/export from/to this format. We'd still need to indicate voicing somehow though - the confidence values (still overloading)? a separate array?

bmcfee commented 8 years ago

as long as we facilitate import/export from/to this format.

cleaning up the world's garbage formats is our business :grin:

the confidence values (still overloading)? a separate array?

I think the sanest encoding is a point-wise triple of (contour_id, frequency, voiced) = (int, float, bool).

For multi-f0, we'll need a separate field to denote the contour id anyway, so as long as we're doing two, why not three?

If we want to then specialize to a monophonic case, we can back down to (frequency, voiced). However, I suspect there's good reason to want a contour id in the monophonic case as well, since it will make it easier to group multiple disjoint contours from the same track. @rabitt ?
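As a sketch, the proposed triple could map onto a structured observation value like this (the field names are illustrative, not a finalized schema):

```python
def make_contour_observation(time, contour_id, frequency, voiced):
    """Build one contour observation carrying the proposed
    (contour_id, frequency, voiced) = (int, float, bool) triple.
    Field names here are illustrative, not a finalized schema."""
    return {
        "time": float(time),
        "value": {
            "index": int(contour_id),
            "frequency": float(frequency),
            "voiced": bool(voiced),
        },
    }

# In the monophonic case every observation can use index 0, or distinct
# indices to group disjoint contours within the same track.
obs = make_contour_observation(1.25, 0, 440.0, True)
print(obs["value"]["index"])  # 0
```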

bmcfee commented 8 years ago

ping @rabitt

rabitt commented 8 years ago

I think the sanest encoding is a point-wise triple of (contour_id, frequency, voiced) = (int, float, bool).

:+1:

If we want to then specialize to a monophonic case, we can back down to (frequency, voiced). However, I suspect there's good reason to want a contour id in the monophonic case as well, since it will make it easier to group multiple disjoint contours from the same track. @rabitt ?

Yes, I think there are definitely use cases for having contour number in the monophonic case.

justinsalamon commented 8 years ago

Yes, I think there are definitely use cases for having contour number in the monophonic case.

+1

bmcfee commented 8 years ago

So, it sounds like we'll just have the four namespaces then:

Sound good?

justinsalamon commented 8 years ago

πŸ‘ We just need to be clear in the docs about the concept of contours and that algs that don't group by contours can simply represent the whole melody time-series as a single contour.

justinsalamon commented 8 years ago

P.S. This is not urgent, but we might want to prepare for the scenario where contour information includes more "features" in addition to pitch. For instance, the Melodia algorithm computes instantaneous salience in addition to frequency, as well as a whole bunch of contour-level features, and it would be cool to be able to store all of these in JAMS.

bmcfee commented 8 years ago

P.S. This is not urgent, but we might want to prepare for the scenario where contour information includes more "features" in addition to pitch. For instance, the Melodia algorithm computes instantaneous salience in addition to frequency, as well as a whole bunch of contour-level features, and it would be cool to be able to store all of these in JAMS.

For something like that, I'd suggest making a custom namespace definition that ships with melodia and/or data sets. This is why we added dynamic namespace imports.
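A custom namespace definition is just a JSON document that can be registered at runtime. A rough sketch of what a Melodia-style contour-salience namespace file might contain (the key names here are a guess at the general shape, not a verified copy of the JAMS namespace schema):

```python
import json
import tempfile

# Illustrative sketch of a custom namespace definition for contour-level
# features; key names are a guess at the general shape, not the exact
# JAMS namespace schema.
melodia_ns = {
    "contour_salience": {
        "value": {"type": "object"},
        "dense": True,
        "description": "Per-frame contour frequency plus instantaneous salience",
    }
}

# Written to disk, such a file could then be registered through JAMS's
# dynamic namespace imports, so the schema ships with the algorithm or
# dataset rather than with JAMS itself.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as fh:
    json.dump(melodia_ns, fh)
    path = fh.name

with open(path) as fh:
    print(list(json.load(fh)))  # ['contour_salience']
```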

justinsalamon commented 8 years ago

For something like that, I'd suggest making a custom namespace definition that ships with melodia and/or data sets. This is why we added dynamic namespace imports.

:+1:

bmcfee commented 8 years ago

Resolved by merging #121

bmcfee commented 8 years ago

@justinsalamon in retrospect, how was the pitch_midi schema supposed to encode voicing? Negative MIDI numbers are still valid frequencies, so propagating the sign to the voicing bit doesn't work.

This is currently biting me in writing tests for the pitch_midi -> pitch_contour auto converter. Do we just not support unvoiced frequencies for midi namespaces?
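The ambiguity is easy to see numerically: MIDI note 0 already corresponds to roughly 8.18 Hz, so negative MIDI numbers still denote real (very low) frequencies and cannot double as an unvoiced flag. A quick check:

```python
import math

def hz_to_midi(freq_hz):
    """Standard A440 mapping from Hz to a fractional MIDI note number."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)

# MIDI note 0 sits at about 8.18 Hz...
print(440.0 * 2.0 ** (-69.0 / 12.0))  # ~8.176

# ...so any lower (still legitimate) frequency maps to a negative MIDI
# number, colliding with a sign-indicates-voicing convention.
print(hz_to_midi(5.0) < 0)  # True
```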

justinsalamon commented 8 years ago

If I recall correctly, the assumption was that pitch_hz would be used for time series and pitch_midi would be used for notes, where voicing is indicated implicitly by the absence of a note. But in practice either namespace could be used to represent both a frequency time series or a set of notes.

Long story short, yeah, I guess we didn't support unvoiced frequencies for midi namespaces. It won't be a problem with the new namespaces; for conversion, I guess let's just say unvoiced is not supported for midi.

bmcfee commented 8 years ago

If I recall correctly the assumption was that pitch_hz would be used for time series and pitch_midi would be used for notes,

Right -- I'm talking about the old (deprecated) namespaces where that was implicit, not the new note_* namespaces.

I just wanted to make sure that we're agreed that implicit voicing does not exist in midi space. I think we're :+1: though. Thanks for clarification!