Closed justinsalamon closed 8 years ago
I think i'd err on the side of shorter namespaces here; there isn't a whole lot of ambiguity IMO.
[Tagging @rabitt here.]
Should we try to distinguish between monophonic and polyphonic annotations? This isn't a big deal for `note_*`, which are, for all intents and purposes, piano roll encodings. It does seem to be a problem for `contour_*` though, when it comes to interpreting the annotations (e.g. through viz or sonify).
Do we need a separate set of namespaces for multi-f0? Or just use a collection of annotations to code multi-f0 annotations? The latter seems a little clumsy to me, but I don't work with this kind of data so it's hard to tell.
A new namespace for multi seems sensible, but I don't know how to structure it. Maybe each observation is a tuple of (contour_id, frequency_value)? Also seems clumsy.
Packing multiple contours into a single `contour_hz` annotation seems like a total nightmare though. Am I wrong?
> I think i'd err on the side of shorter namespaces here; there isn't a whole lot of ambiguity IMO.
Ok, so let's go with:
> Do we need a separate set of namespaces for multi-f0? Or just use a collection of annotations to code multi-f0 annotations? The latter seems a little clumsy to me, but I don't work with this kind of data so it's hard to tell.
I would definitely avoid using a collection of annotations - aside from being clumsy it also breaks the unity of an annotation object and doesn't match the semantics of the JAMS schema. I think a separate namespace for multi-f0 with a structure to encode multiple overlapping contours is the way to go. @rabitt might have some idea about what that structure should actually look like?
Depending on what `contour_multi*` ends up looking like, the mono version might be redundant/unnecessary, since it's a special case with n_contours=1.
> Depending on what `contour_multi*` ends up looking like, the mono version might be redundant/unnecessary, since it's a special case with n_contours=1.
True. There are use cases that would only apply when n_contours=1 (e.g. evaluating the annotation for melody extraction using a jams wrapper), so we just have to make sure that these use cases are handled gracefully.
And while we're at it, can we kill the "sign indicates voicing" convention? It's a silly abuse of notation, and makes it far too easy to introduce bugs.
I would much rather provide converter utilities that pack the data into this format as needed, e.g. when calling into mir_eval.
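Such a converter utility might look like this (a hedged sketch; `to_mir_eval` is a hypothetical helper name, and the sign convention shown is the one mir_eval's melody metrics expect, where a negative frequency marks an unvoiced frame):

```python
# Hypothetical converter: pack (time, frequency, voiced) observations into
# the "sign indicates voicing" arrays used by mir_eval.melody, without
# ever storing signed frequencies in the annotation itself.

def to_mir_eval(observations):
    """observations: iterable of (time, frequency, voiced) tuples."""
    times, freqs = [], []
    for time, freq, voiced in observations:
        times.append(time)
        freqs.append(freq if voiced else -freq)
    return times, freqs

times, freqs = to_mir_eval([(0.0, 220.0, True), (0.01, 225.0, False)])
# freqs == [220.0, -225.0]
```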
I'd be open to this, as long as we facilitate import/export from/to this format. We'd still need to indicate voicing somehow though - the confidence values (still overloading)? A separate array?
> as long as we facilitate import/export from/to this format.
cleaning up the world's garbage formats is our business :grin:
> the confidence values (still overloading)? a separate array?
I think the sanest encoding is a point-wise triple of (contour_id, frequency, voiced) = (int, float, bool).
For multi-f0, we'll need a separate field to denote the contour id anyway, so as long as we're doing two, why not three?
If we want to then specialize to a monophonic case, we can back down to (frequency, voiced). However, I suspect there's good reason to want a contour id in the monophonic case as well, since it will make it easier to group multiple disjoint contours from the same track. @rabitt ?
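The proposed triple could be sketched in plain Python (times and values here are made up, and this is an illustration of the encoding, not an actual JAMS schema):

```python
from collections import defaultdict

# Each observation is a timestamped (contour_id, frequency, voiced) triple.
# Contour ids make it easy to recover individual contours, including the
# monophonic case where a track contains several disjoint contours.

observations = [
    (0.00, (0, 440.0, True)),
    (0.01, (0, 442.0, True)),
    (0.00, (1, 220.0, True)),   # a second, overlapping contour
    (0.01, (1, 0.0, False)),    # unvoiced frame, no sign abuse needed
]

def group_by_contour(observations):
    """Collect (time, frequency, voiced) samples per contour id."""
    contours = defaultdict(list)
    for time, (contour_id, frequency, voiced) in observations:
        contours[contour_id].append((time, frequency, voiced))
    return dict(contours)

print(group_by_contour(observations)[1])
# [(0.0, 220.0, True), (0.01, 0.0, False)]
```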
ping @rabitt
> I think the sanest encoding is a point-wise triple of (contour_id, frequency, voiced) = (int, float, bool).
:+1:
> If we want to then specialize to a monophonic case, we can back down to (frequency, voiced). However, I suspect there's good reason to want a contour id in the monophonic case as well, since it will make it easier to group multiple disjoint contours from the same track. @rabitt ?
Yes, I think there are definitely use cases for having contour number in the monophonic case.
> Yes, I think there are definitely use cases for having contour number in the monophonic case.
+1
So, it sounds like we'll just have the four namespaces then:

- `note_hz`, `note_midi`
  - value = float
  - dense = False
- `contour_hz`, `contour_midi`
  - value = (int, float, bool) (contour id, frequency, voicing)
  - dense = True

Sound good?
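For illustration, the four proposed namespaces could be summarized as JSON-schema-style definitions in Python (a sketch only; field names like `index`, `frequency`, and `voiced` are assumptions, not the shipped schema files):

```python
# Sketch of the four proposed namespace definitions. Notes carry a single
# float pitch value; contours carry a (contour id, frequency, voiced) record.

NOTE_VALUE = {"type": "number"}  # pitch as a single float

CONTOUR_VALUE = {
    "type": "object",
    "properties": {
        "index": {"type": "integer", "minimum": 0},  # contour id
        "frequency": {"type": "number"},
        "voiced": {"type": "boolean"},
    },
    "required": ["index", "frequency", "voiced"],
}

NAMESPACES = {
    "note_hz": {"value": NOTE_VALUE, "dense": False},
    "note_midi": {"value": NOTE_VALUE, "dense": False},
    "contour_hz": {"value": CONTOUR_VALUE, "dense": True},
    "contour_midi": {"value": CONTOUR_VALUE, "dense": True},
}
```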
:+1: We just need to be clear in the docs about the concept of contours, and that algorithms that don't group by contours can simply represent the whole melody time-series as a single contour.
P.S. This is not urgent, but we might want to prepare for the scenario where contour information includes more "features" in addition to pitch. For instance, the Melodia algorithm computes instantaneous salience in addition to frequency, and a whole bunch of contour-level features too, and it would be cool to be able to store all of these in JAMS.
> p.s - this is not urgent, but we might want to prepare for the scenario where contour information includes more "features" in addition to pitch. For instance the Melodia algorithm computes instantaneous salience in addition to frequency, and a whole bunch of contour-level features too, and it would be cool to be able to store all of these in JAMS.
For something like that, I'd suggest making a custom namespace definition that ships with melodia and/or data sets. This is why we added dynamic namespace imports.
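A custom namespace file along those lines might be sketched as follows (the `contour_melodia` name, the `salience` field, and the registration call are all hypothetical, not part of any shipped package):

```python
import json
import os
import tempfile

# Sketch: shipping a custom namespace definition alongside an algorithm
# such as Melodia. The extra "salience" field is illustrative; the file
# would then be registered via JAMS's dynamic namespace import mechanism.

melodia_ns = {
    "contour_melodia": {
        "value": {
            "type": "object",
            "properties": {
                "index": {"type": "integer"},
                "frequency": {"type": "number"},
                "voiced": {"type": "boolean"},
                "salience": {"type": "number"},  # Melodia-specific feature
            },
        },
        "dense": True,
    }
}

path = os.path.join(tempfile.gettempdir(), "contour_melodia.json")
with open(path, "w") as f:
    json.dump(melodia_ns, f)
# jams.schema.add_namespace(path)  # register at runtime (assumed API)
```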
> For something like that, I'd suggest making a custom namespace definition that ships with melodia and/or data sets. This is why we added dynamic namespace imports.
:+1:
Resolved by merging #121
@justinsalamon in retrospect, how was the `pitch_midi` schema supposed to encode voicing? Negative MIDI numbers are still valid frequencies, so propagating the sign to the voicing bit doesn't work.
This is currently biting me in writing tests for the `pitch_midi` -> `pitch_contour` auto converter. Do we just not support unvoiced frequencies for midi namespaces?
If I recall correctly, the assumption was that `pitch_hz` would be used for time series and `pitch_midi` would be used for notes, where voicing is indicated implicitly by the absence of a note. But in practice either namespace could be used to represent both a frequency time series or a set of notes.
Long story short, yeah, I guess we didn't support unvoiced frequencies for midi namespaces. It won't be a problem with the new namespaces; for conversion I guess let's just say unvoiced is not supported for midi.
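For reference, the standard MIDI-to-Hz mapping (A4 = MIDI note 69 = 440 Hz) shows why the sign trick fails in MIDI space: every MIDI number, including negative ones, maps to a valid positive frequency. A minimal sketch of the conversion:

```python
import math

# Standard MIDI <-> Hz conversion. Since 2**x is positive for any real x,
# even a negative MIDI number yields a positive frequency, so overloading
# the sign to mark unvoiced frames is ambiguous in MIDI space.

def midi_to_hz(midi):
    return 440.0 * 2.0 ** ((midi - 69) / 12.0)

def hz_to_midi(hz):
    return 69 + 12 * math.log2(hz / 440.0)

midi_to_hz(69)   # -> 440.0
midi_to_hz(-12)  # -> still a positive frequency (~4.09 Hz)
```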
> If I recall correctly the assumption was that pitch_hz would be used for time series and pitch_midi would be used for notes,
Right -- I'm talking about the old (deprecated) namespaces where that was implicit, not the new `note_*` namespaces.
I just wanted to make sure that we're agreed that implicit voicing does not exist in midi space. I think we're :+1: though. Thanks for clarification!
As discussed in #91 starting here, currently the `pitch_hz` and `pitch_midi` namespaces are overloaded in a bad way, supporting both note-style annotations (where a note has a start time, duration, and pitch) and continuous f0 (contour)-style annotations (where contours are specified by timestamped pitch values with 0 duration). Apart from being confusing, this causes a number of problems, e.g. for sonification and visualization.
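To illustrate the overloading, here are the two annotation styles side by side (times and values are made up):

```python
# Two very different kinds of data crammed into the same namespace:

note_style = [
    # (time, duration, value): a note with a start, a length, and one pitch
    (0.50, 0.25, 440.0),
    (0.75, 0.50, 494.0),
]

contour_style = [
    # (time, duration, value): timestamped f0 samples with zero duration
    (0.50, 0.0, 441.3),
    (0.51, 0.0, 440.8),
    (0.52, 0.0, 439.9),
]
```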
The proposed solution is to split "pitch" into "note" and "contour", keeping the hz/midi distinction as well:
- `note_hz`
- `note_midi`
- `contour_hz`
- `contour_midi`
Note that the only difference between hz and midi here is how pitch values are represented (on a Hertz scale or as MIDI note numbers), but the two are otherwise equivalent: neither implies any sort of pitch quantization, and both can be used to represent sparse (note) annotations and dense (contour) annotations.

Finally, we need to decide whether to use the aforementioned namespace names as is, or add a `pitch_` prefix to each of the 4 (the former means shorter/cleaner namespace names, the latter is less ambiguous).