Add a listen element for score following and machine listening apps

w3c / musicxml

MusicXML specification

515 stars 57 forks source link

Add a listen element for score following and machine listening apps #294

Closed mdgood closed 3 years ago

mdgood commented 5 years ago

Score following and other machine listening applications could use a new <listen> element for information specific to listening to a performance of a MusicXML score. This could parallel the existing <sound> or <play> elements for information specific to playback.

Examples of these applications include Metronaut, SmartMusic, and Match My Sound. Assessment applications like SmartMusic and Match My Sound could use a way to specify which note these should be listening for when an instrumentalist or singer is practicing a part with a temporary divisi split. Performance applications like Metronaut can use a way to specify a note or chord that they need to wait for. This could be the first note after a fermata or another key part in the score.

In addition to specific types of events that we can specify directly in MusicXML, we can also include an <other-listen> child element. Similar to the <other-play> element, this would handle other types of events beyond those already built in.

SmartMusic is currently using MusicXML processing instructions to notate this information. Since it is useful across several different applications, it seems better to represent this as a standard MusicXML element that could be read by any listening application.

jsawruk commented 5 years ago

Could you provide an example of proposed markup for this element?

What would happen if the event being listened for was wrong, i.e. the audio or timing information didn't match the expected event. For example, a user of assessment software plays the note too early?

I assume the primary use case is for monophonic input, but would it also be possible to use this element in the following cases:

Polyphonic input. Either polyphonic assessment or general score following
Timecode only. For example, move a playback cursor when a certain time occurs, or following score playback with other multimedia elements such as audio or video clips.

I think this is a really good idea and could open up a lot of new applications of MusicXML, or making existing applications easier to work with.

mdgood commented 5 years ago

Thanks, @jsawruk. Here is an example of some possible markup. You generally would not have all these items in a single <listen> element but I wanted a short example.

For at least the assessment part, I think we will need an additional MusicXML element to identify the players, musicians, or sub-parts within a part. This does not necessarily correspond to the MusicXML <instrument> or <score-instrument> element. It is more for a divisi split within a single instrument. We could put those definitions within the <score-part> element. Here I have called the element <player> but there are many possible choices for an element name.

      <score-part id="P1">
        <part-name>Women</part-name>
        <score-instrument id="P1-I1">
          <instrument-sound>voice.female</instrument-sound>
        </score-instrument>
        <player id="P1-M1">
          <player-name>High</player-name>
        </player>
        <player id="P1-M2">
          <player-name>Middle</player-name>
        </player>
        <player id="P1-M3">
          <player-name>Low</player-name>
        </player>
      </score-part>

      <note>
        <pitch>
          <step>A</step>
          <octave>4</octave>
        </pitch>
        <duration>6</duration>
        <voice>1</voice>
        <type>quarter</type>
        <listen>
          <assess player="P1-M1" type="all">no</assess>
          <assess player="P1-M2" type="all">yes</assess>
          <assess player="P1-M3" type="all">no</assess>
          <wait>note</wait>
          <other-listen type="new-type">new value</other-listen>
        </listen>
      </note>

I think we would want some defaults for these elements so the usual case (e.g., assessment on) does not need to be written out explicitly each time.

This is all preliminary, but I hope it gives a better idea of how this might work.

bhamblok commented 5 years ago

Although @mdgood is clarifying that one would not need all these items in a single element, I think this is a lot of data, which is useless for the majority of apps which are not focussing on score-following. I think there is definitely a need for standardising this kind of information, but I'm not sure it should be done within the musicXML-spec. Wouldn't this be a good use-case to create (or discuss again about) an alternate "performance"-file-format, which can be referenced from within the musicXML (or better from a MNX)-file? Just like css is references from an html-file? This has been discussed before, but I do not find a closing answer in our issues-list. Shouldn't we keep a separation of concerns? Music-semantics versus layout versus performance? As a comparison I'm referencing (again) to HTML, css and js. This is a slightly different discussion than the CWMNX vs GMNX discussion in the beginning of MNX.

mdgood commented 5 years ago

Thank you for the added feedback @bhamblok!

One clarification is that the <listen> element will generally not need to be written except in more unusual situations. I do not think the addition of these <player> and <listen> elements will bog down applications that do not use them. I do agree that if applications were writing <listen> elements for most notes, this solution would not be desirable.

Score following applications are already using MusicXML and are probably not interested in a separate performance-only format. The choices would seem to be either the status quo, where each application requires its own app-specific data, or something standardized. With a standardized approach, a music notation editing application can write one set of data that many different listening apps can use.

MNX's separation of concerns is trying to make a clearer distinction between different types of data than MusicXML does, in order to make application development easier. It is not trying to remove MusicXML functionality from MNX-Common. The proposal here follows the way that MusicXML handles this type of separation, using dedicated elements as well as attributes.

arshiacont commented 5 years ago

Score Following applications would obviously need the score. There are cases where the musical context provided by MusicXML is not sufficient enough to address reactions. One scenario is the "end of a phrase" or "beginning of a section/rehearsal mark" where the system should not react continuously.

This is similar to "playback" elements in purpose. It makes much more sense to include them in MusicXML than keeping them separate. At Antescofo, we have been using Miscellaneous elements embedded in MusicXML to address such use-cases.

The case of Assessment is ofcourse very different!

I guess one way to address this is to list current use-cases by each actor.

We will try to contribute to this Issue swiftly.

arshiacont commented 4 years ago

Hello Team! Just a quick follow up on this proposal: In case of Metronaut App (score following + automatic accompaniment), our team can put forward the following items that can be of general interest. They correspond to expected reaction as a result of listening and high-level musical input that are not necessarily found in standard MusicXML that can become subtypes of a listen element:

wait : Indicating that the Artificial Listening system should wait on this element. Example of this can be a <note> element after a Fermata, or marking the beginning of a section/rehearsal mark where you don't want the Playback to continue without sensing the right cue from the musician. In such situations, using the musician's tempo is thus not useful. It can accept a value of yes/no to start with.
syncOption : Syncing an accompaniment to live musician can become a matter of style! For example: on a given piece of music there might be sections where you do not want the accompaniment section to sync to the performer (you want the reverse). This is where the artificial listening is following the musician but not necessarily syncing. This element applies thus on a section with a type that accepts start/stop. As for its value, we can start by limiting a yes/no approach until further common use-cases are found (we have several).

In general: a new Listen element in general MusicXML can open perspectives for indicating Human Performance indications that goes beyond mere playback or similar that we see today in the market and address important issues for Live Interaction scenari such as Score Following and Accompaniment.

There at Antescofo, we are ready to support actions towards the proposal and contribute to use-cases that go beyond our own Metronaut App.

bhamblok commented 4 years ago

Hello all,

I would like to repeat myself... I'm that kind of developer who is always thinking about (CPU and network) performance. As I mentioned here above, I still think a lot of this information should be standardised, but not necessary inside the format(s) of musicXML (and MNX).

In the previous comment, I believe there is an example of (which I think) the "wait" element is totally redundant. Why not collecting this information from the context of the siblings out of the musicXML/MNX-file?

If there is a fermata in the context of this note, you should wait...
If there is a section/rehearsal/... in the context of this note, you should wait...

I really think that if we bloat the musicXML (and/or MNX) file formats, with musical-performance-data, it will have a negative impact on the (CPU/network)-performance of lots of applications which might not need this musical-performance data.

bhamblok commented 4 years ago

In a follow up to my previous post, I propose to add a <link> element, having a "rel"-attribute (relation) which links to an external document. (cfr. https://developer.mozilla.org/en-US/docs/Web/HTML/Element/link) This external document should containing all this musical performance data by referencing to "id"-selectors or (why not) using XPath, or just css-selectors...

arshiacont commented 4 years ago

@bhamblok We are all caring developers! ;)

That's actually what we do when the context is clear (such as Fermata or section/rehearsal). But we can imagine a bunch of scenari where the author/editor wants that kind of interaction cues and is willing to annotate that beyond a mere Text Box. For some Editors, this can also become an option next to existing elements (similar to Playback elements).

I am perplexed by your CPU concerns! I believe many of us who use MusicXML can find a lot of redundant information based on our use-cases. We just ignore them! This is off-topic here but I believe if anyone has CPU issues with XML to the extent that a new element would tear it off, then one should reconsider the software architecture. MusicXML Standard is not there to solve/address this.

Separation of Concern is however an issue here and I hope MNX will address that properly.

bhamblok commented 4 years ago

@arshiacont I can only agree with you that authors/editors would want to make this kind of interaction cues manually. But if you want to encode values having a "yes" or "no" (boolean) value, than an xml-element is totally verbose and definitely not the right choice. Can we agree on creating attributes to store this kind of information?

PS: I'm always thinking of CPU performance during development (eg. when I have to write nested loops, which I try not to...). But in this case I'm more concerned about network performance... and yes, parsing data (especially verbose data) also consumes CPU-power where from the UX point of view: "every millisecond counts" :-)

mdgood commented 4 years ago

@bhamblok I don't think there is going to be a lot of musical performance data. This is data that tends to be needed to help with less common situations, including places where human performers could often use some extra rehearsal time.

Plenty of applications discard the detailed presentation data that MusicXML files can contain, and that tends to be far larger than the added data that a listening application might need. MusicXML was designed for both selective encoding (you only need to encode the data that your app cares about and has accessible) and selective reading (you can ignore the things you don't care about). This has served well in its primary use cases for storage and exchange between applications.

Performance concerns are much more important for MNX's use cases. That is one reason why MNX-Common is more compact than MusicXML. MusicXML was never intended to be a terse format. Having a complete representation of music notation data has been more important.

I think it will be best if we let MusicXML and MNX-Common each have its own internally consistent design.

arshiacont commented 4 years ago

@bhamblok the wait feature can become an attribute since its scope is limited to a note/chord but if we reason like that all all playback elements should become Attributes as well! :)

The synchMode (for lack of a better word) can not be an attribute since its scope is on sections. This one is actually very common in classical music and other styles (think of a voice that swings where others are rhythmically strict etc.).

Re: Network Performance... well, I bet most people do not use a fair % of data that is embedded in MusicXML. These performance related elements won't take that much space but if I had to worry about them, I would suggest two solutions: (1) Don't use them! ;) or (2) Parse them out on the server end before sending to users. I have never come to issues like this on Music Sheet data but I can imagine that it can occur.

mdgood commented 3 years ago

Here are my current design ideas for this issue.

The <listen> element will be available at 3 levels: attached to a <note>, attached to a <direction>, or attached to a <measure> (or <part> in a timewise file). When used at the direction or measure level, it can contain an offset element to position it more precisely if needed.

There will be four child elements to the <listen> element: <assess>, <sync>, <wait>, and <other-listen>. The <assess> element would work as described above, except the “type” attribute would be replaced with a “time-only” attribute. The <other-listen> element is an extension element that allows us to add things we did not think of in this version.

The <sync> draws on the Antescofo synchronization strategies which describes the common use cases that @arshiacont mentioned in his initial <syncOption> proposal. This could have a format of:

<sync type=“x” latency=“y”/>

Here the type would be an enumeration with the values none, tempo, event, mostly-tempo, mostly-event, and always-event. The idea is that these correspond to various Antescofo attributes:

none = sync (sort of - here it's the absence of synchronization rather than another type of control)
tempo = loose
event = tight progressive
mostly-tempo = target progressive
mostly-event = target conservative
always-event = tight conservative

The latency attribute would be in milliseconds, as in Antescofo.

This type attribute incorporates the element value from Arshia's earlier proposal. We could go back to that strategy if people think that would work better. To me this approach seems a little easier by avoiding having to match start-stop pairs.

I think that the <wait> element could be empty. Its presence indicates that you are waiting for a note or conductor beat. I am not sure what a yes/no value would be adding here. Please let me know if I am missing something.

For the <assess> element, we will be defining <player> elements within a <score-part> that can be referenced by an optional player attribute. Do people think it would be helpful to add an optional player attribute to the other <listen> child elements? I think it would be helpful for the <other-listen> element, but I am not sure if it makes sense for the <sync> and <wait> elements.

Please let me know your thoughts and suggestions. I can create a pull request once we have an agreed-upon design.

mdgood commented 3 years ago

Pull request #376 addresses this issue. Some of the differences from the previous discussion are:

There are two distinct elements: <listen> for information specific to notes, and <listening> for information that changes the state of an application from this point onward in the score.
The <assess> and <wait> elements are children of the <listen> element, and the <sync> element is a child of the <listening> element. There are <other-listen> and <other-listening> elements to extend beyond what is supplied in version 4.0.
Each of these child elements can have a player and time-only attribute.

@arshiacont, do you think this pull request captures the ideas you discussed above? Ore are there things that you believe need to be added or changed? @jsawruk, is this matching what you expected? @bhamblok, does this appear to be setup efficiently? There aren't any boolean element values in the pull request.

Everyone's feedback is welcome! Please share your thoughts. This is a new feature direction for MusicXML so we want to have a solid foundation for future extensions in this area. I am expecting to need to make changes before merging.

mdgood commented 3 years ago

Initial implementation has gone well, so I will be merging the pull request and closing the issue to keep things moving. We can always open a new issue later if we need to.