w3c / mnx

Music Notation CG next-generation music markup proposal.

Combined dynamics. fp, pp, fff, etc. #168

Open shoogle opened 4 years ago

shoogle commented 4 years ago

Basic dynamics

MusicXML represents basic dynamic types like this:

Combining basic dynamics

The MusicXML spec allows you to combine dynamics to create more complex types, like this:

Pre-combined dynamics

However, MusicXML also has special elements specifically for some combined types:

This means there is more than one way to represent the same thing in MusicXML.

I believe MNX should drop pre-combined dynamic types in favour of constructing them from the basic types. This means forcing people to use <f/><p/> instead of <fp/>, for example.

Reference: https://usermanuals.musicxml.com/MusicXML/Content/CT-MusicXML-dynamics.htm
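To make the duplication concrete, here is a minimal Python sketch that flattens a MusicXML `<dynamics>` element into a plain letter string. The function name `dynamic_letters` is hypothetical, and the sketch assumes every child is a letter-style element such as `<f/>` or `<fp/>` (it does not special-case `<other-dynamics>`).

```python
import xml.etree.ElementTree as ET


def dynamic_letters(xml_fragment: str) -> str:
    """Concatenate the tag names of the children of a <dynamics> element."""
    root = ET.fromstring(xml_fragment)
    if root.tag != "dynamics":
        raise ValueError("expected a <dynamics> element")
    # Each child tag (f, p, fp, ...) contributes its own letters.
    return "".join(child.tag for child in root)


# Two different encodings that flatten to the same letters.
precombined = dynamic_letters("<dynamics><fp/></dynamics>")
constructed = dynamic_letters("<dynamics><f/><p/></dynamics>")
```

Both calls produce the same string, which is the redundancy this issue proposes to remove.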

hhpmusic commented 4 years ago

separated fields like

is not a good idea, at least from the software's internal side, which treats the dynamic combination as a whole. Then the exporter has to separate every letter and words, and braille transcription software has to combine them again. Musicxml's dynamics and other-dynamics works fine.

shoogle commented 4 years ago

@hhpmusic, you need to enclose XML tags like <p/> inside backticks (ASCII code 96: Grave Accent) for them to make it through GitHub's Markdown / HTML parser.

the software's internal side [...] treats the dynamic combination as a whole

This is not necessarily the case for all software. A program could quite easily consider each dynamic letter to be a separate character, and in fact this is pretty much exactly what MuseScore does.

the exporter has to separate every letter and words, and braille transcription software has to combine them again

This is trivially easy to do.

Musicxml's dynamics and other-dynamics works fine.

MusicXML already allows for dynamic combinations, so software must already be capable of doing the above combining and decombining in order to be compliant with the MusicXML specification.

adrianholovaty commented 4 years ago

I believe MNX should drop pre-combined dynamic types in favour of constructing them from the basic types. This means forcing people to use <f/><p/> instead of <fp/>, for example.

@shoogle: Can you give some thoughts on why we shouldn't force it the other way around? In other words, what if we forced <fp/> and disallowed combining dynamics? Seems to me this would be simpler...?

shoogle commented 4 years ago

@adrianholovaty, sure.

If you disallow all combined dynamics then of course that makes life very simple, but that leaves you with a finite list of dynamics that can never be expanded later.

MusicXML allows you to combine dynamics "to create marks not covered by a single element, such as sfmp". Note that this is completely separate to the other-dynamics feature used for arbitrary text.

If MusicXML allows you to combine basic types to make something more complicated, why not start with the most basic set of all (p, f, m, s, z, r, n) and force all others to be constructed from these?

MusicXML allows combinations, but how many applications actually bother to support this? How many people are even aware of the possibility?

If MNX forces something as basic as pp to be made from a combination then everyone will support combinations.

shoogle commented 4 years ago

It's true that SMuFL has special combined symbols for things like pp, but that doesn't mean applications have to use them. In MuseScore, pp is two separate characters, and you cannot tell the difference visually. MuseScore also manages to export it correctly to MusicXML, so conversion is not a problem.

davemacdo commented 4 years ago

Being able to create combined dynamics is (I think) critical to the flexibility that we need to encode everything that might be needed to represent the score. I'm thinking of things like più f, Ligeti's pppppp to fffff, and even Grainger's louden lots (though maybe this is no longer a dynamic in the same sense as those others). My point is, flexibility—within certain structural guidelines—is crucial here.

shoogle commented 4 years ago

Current spec

The draft spec mentions a dynamics element with a type attribute, which would look like this:

<dynamics type="p"/>

The spec then says:

The following dynamics are supported: p, pp, ppp, pppp, ppppp, pppppp, f, ff, fff, ffff, fffff, ffffff, mp, mf, sf, sfp, sfpp, fp, rf, rfz, sfz, sffz, fz, n, pf, sfzp

This corresponds to all of the "pre-composed" dynamics in SMuFL, but excludes the colon, hyphen and space characters that are sometimes used as separators in combined dynamics, like p-f.

Proposal

I would prefer the spec to say something like this:

Syntax

The type attribute contains a sequence composed of the following ASCII characters:

p, f, m, s, z, r, n, : (colon), - (hyphen-minus), ` ` (space)

Characters may appear more than once in the sequence and in any order.
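As a rough illustration of this syntax rule, the check below accepts any non-empty sequence of the listed characters. The names `ALLOWED` and `is_valid_type` are hypothetical, and rejecting the empty string is an assumption, since the proposal does not say whether an empty sequence is allowed.

```python
import re

# The ten permitted characters: seven letters plus colon, hyphen-minus, space.
ALLOWED = re.compile(r"[pfmszrn:\- ]+")


def is_valid_type(value: str) -> bool:
    """Return True if value uses only the permitted characters."""
    return ALLOWED.fullmatch(value) is not None
```

Note that this only checks the alphabet. A sequence such as rr passes, because the proposal allows characters in any order and any number.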

Semantics

In a visual rendering of the music notation, applications must reproduce the sequence exactly as given, replacing each character with the corresponding SMuFL glyph. Applications may choose to make use of combined glyphs where available, but are not required to do so.

In an audio rendering, applications may choose to implement performance characteristics for sequences that have an agreed musical definition.

shoogle commented 4 years ago

If we want to mandate specific performance characteristics for audio rendering purposes then it could be done as follows:

Basic sequence

The following sequences of dynamic characters have their usual meanings in terms of musical performance:

mp, p, pp, ppp, pppp, ... any number of p (regex: mp, p+)
mf, f, ff, fff, ffff, ... any number of f (regex: mf, f+)
sf, fz, sfz, sffz, ... any number of f (regex: sf, fz, sf+z)
rf, rfz, rffz, ... any number of f (regex: rf+, rf+z)

The presence of additional f or p characters indicates a more extreme version of the basic dynamic.
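The regexes listed above can be folded into a single pattern. This is only a sketch: the names `BASIC` and `is_basic` are hypothetical, and the pattern copies the alternatives exactly as written in this comment.

```python
import re

# One alternative per regex in the list: mp, p+, mf, f+, sf, fz, sf+z, rf+, rf+z.
BASIC = re.compile(r"mp|p+|mf|f+|sf|fz|sf+z|rf+|rf+z")


def is_basic(sequence: str) -> bool:
    """Return True if the whole sequence is one of the basic dynamics."""
    return BASIC.fullmatch(sequence) is not None
```

With this pattern, pppp, sffz and rfz count as basic sequences, while fp does not, since it is a combination of two basic sequences.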

Combinations

Basic sequences may be combined using colon (:), hyphen-minus (-) and space ( ) characters as separators. In terms of audio, the dynamic at the beginning of the sequence takes effect immediately, while dynamics appearing later in the sequence are slightly delayed.

An example of this is the piano-pianissimo sequence p-pp, which is assumed to have the sound of an ordinary piano (p) dynamic followed shortly afterwards by an ordinary pianissimo (pp). Applications may interpret the duration specified by "shortly afterwards" as they see fit (e.g. during or immediately after the note on which the dynamic occurs).

Separator characters may be omitted from combined dynamics where doing so does not create ambiguity. This is possible for forte-piano (fp) but not for the piano-pianissimo example given previously, as that would make it indistinguishable from pianissimo-piano (pp-p) and pianississimo (ppp), the latter being the correct interpretation if no separator characters are present.

Other sequences

When presented with a sequence that cannot be decomposed into one of the basic types, such as rr or mmp, applications are free to define their own rules for audio rendering. Any sequence that the application does not have a specific interpretation for must be gracefully ignored (i.e. the audio rendering must remain the same as if the unknown dynamic were not present).
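The combination and fallback rules above could be sketched roughly as follows. The function names are hypothetical, and the longest-match-first strategy is an assumption of this sketch: it is one way to make ppp resolve to a single pianississimo rather than p followed by pp. Returning `None` stands for a sequence the application has no interpretation for and should ignore in audio.

```python
import re
from typing import List, Optional

BASIC = re.compile(r"mp|p+|mf|f+|sf|fz|sf+z|rf+|rf+z")
SEPARATORS = re.compile(r"[:\- ]")


def decompose(chunk: str) -> Optional[List[str]]:
    """Split a separator-free chunk into basic sequences, longest first."""
    if chunk == "":
        return []
    for end in range(len(chunk), 0, -1):
        head = chunk[:end]
        if BASIC.fullmatch(head):
            rest = decompose(chunk[end:])
            if rest is not None:
                return [head] + rest
    return None  # no decomposition into basic sequences exists


def parse_combined(sequence: str) -> Optional[List[str]]:
    """Return the basic dynamics in playback order, or None if unknown."""
    parts: List[str] = []
    for chunk in SEPARATORS.split(sequence):
        if chunk == "":
            continue  # skip empty pieces between adjacent separators
        pieces = decompose(chunk)
        if pieces is None:
            return None
        parts.extend(pieces)
    return parts
```

Under these assumptions, fp decomposes into f then p without a separator, p-pp and pp-p stay distinct, and mmp is reported as unknown.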

mdgood commented 4 years ago

I like the direction of this latest proposal from @shoogle. It tightens up the definitions of standard dynamics, both in terms of semantics and concision. It keeps the combined dynamics as a single unit semantically, while leaving flexibility for compound dynamics.

Now how could we go about combining text into what is semantically a dynamic? Things like subito p and molto ff can be more problematic to transfer in MusicXML than would be ideal. I'd like any MNX revisions in dynamics to make it easier to treat these as a single semantic entity.

We may also need a bit more flexibility for audio. While p-pp might mean piano followed shortly by pianissimo, in a repeated section it could mean piano the first time and pianissimo the second time.

shoogle commented 4 years ago

@mdgood, thanks!

Now how could we go about combining text into what is semantically a dynamic?

One way would be to drop the type attribute and do it like this instead:

<dynamics><p/></dynamics>
<dynamics>subito <p/></dynamics>

This uses mini tags like <f/> and <p/> to represent the letter-like glyphs, but it wouldn't work for the separators (I doubt that <:/>, <-/> or < /> are valid XML).

Separators could be encoded as:

Alternatively, the symbols could be kept plain and tags used for ordinary text:

<dynamics>p</dynamics>
<dynamics><text>subito </text>p</dynamics>
<dynamics>p-pp</dynamics>

I personally like this last approach best as it keeps the common cases nice and simple.

shoogle commented 4 years ago

We may also need a bit more flexibility for audio.

It is also possible to provide performance characteristics for custom dynamics. The default should be to ignore the custom text and just parse the symbols (so subito p would sound like ordinary p), but it should be possible to override this behaviour.

My preferred way to do this is to have a global way to define dynamics at the beginning of the score:

dynamic-dictionary:
    dynamic:
        sequence: <text>subito </text>p
        sounds-like: p
    dynamic:
        sequence: <text>più </text>f
        level: +1

(I used YAML here for the sake of readability, but it would work equally well in XML.)

This snippet defines two custom dynamics: subito p, which sounds like an ordinary p, and più f, which is given a level of +1.

Once defined, a dynamic can be used anywhere in the score without further clarification. It could even be used as part of a sequence, like this:

<dynamics><text>più </text>f-<text>subito </text>p</dynamics>

This means "più f" closely followed by "subito p".
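A dictionary lookup with the default described above (ignore the custom text and parse only the symbols) might look like the sketch below. Everything here is hypothetical: the `DynamicEntry` class, the `resolve` function, and the choice to key the dictionary on the raw sequence string are assumptions made for illustration, and the meaning of `level` is left uninterpreted.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class DynamicEntry:
    sounds_like: Optional[str] = None  # symbols whose sound is borrowed
    level: Optional[int] = None        # the "level" value from the dictionary


# Mirrors the two YAML entries above.
DICTIONARY: Dict[str, DynamicEntry] = {
    "<text>subito </text>p": DynamicEntry(sounds_like="p"),
    "<text>più </text>f": DynamicEntry(level=1),
}

TEXT_RUN = re.compile(r"<text>.*?</text>")


def resolve(sequence: str, dictionary: Dict[str, DynamicEntry]) -> DynamicEntry:
    """Look up a custom dynamic, falling back to its bare symbols."""
    entry = dictionary.get(sequence)
    if entry is not None:
        return entry
    # Default behaviour: drop the custom text and keep only the symbols.
    return DynamicEntry(sounds_like=TEXT_RUN.sub("", sequence))
```

An undefined dynamic such as molto ff would then fall back to sounding like a plain ff.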

The above syntax could be extended to provide abilities like transition-to-next or return-to-previous, enabling custom versions of cresc./dim. and sf to be created.

shoogle commented 4 years ago

While p-pp might mean piano followed shortly by pianissimo, in a repeated section it could mean piano the first time and pianissimo the second time.

Gould's recommended notation (p. 236 of Behind Bars) is to write an explicit "1st/2nd time only" next to anything that is different for a repeat. She gives an example of:

ff (2nd time: mp cresc.)

Though I appreciate that various musical editions notate things differently.

If necessary, sequences could be broken into multiple elements:

<dynamics>
    <sequence>p</sequence>
    <separator>-</separator><!--optional-->
    <sequence>pp</sequence>
</dynamics>

This enables attributes to be added to each <sequence> element to control which repeat it is played on, or to the <separator> element to control the amount of delay it introduces. You wouldn't want to do this for everything though, fp being the obvious counterexample.

Repeat differences are arguably a separate issue that needs a more general solution:

<repeat-variation plays="1">
    <dynamics>p</dynamics>
</repeat-variation>
<repeat-variation plays="2">
    <dynamics>pp</dynamics>
</repeat-variation>

This kind of syntax would work for more than just dynamics, and it means we can carry on using p-pp for its usual meaning of piano shortly followed by pianissimo. (It also avoids splitting fp.)

shoogle commented 4 years ago

It seems that the latest co-chairs meeting touched on this topic. The minutes show these items were discussed:

Both ideas were rejected (and rightly so, in my opinion).

Neither of those ideas featured in my proposal above. Am I to take it that the idea of composing dynamics from basic types (the topic of this issue) is still on the table as a potential option going forward?

adrianholovaty commented 4 years ago

@shoogle We definitely still need to find a solution for combined dynamics — yes, the idea is still on the table!

joeberkovitz commented 2 years ago

@adrianholovaty asked me to try to progress this issue to an actionable state. It seems quite close to that goal already. Here is a proposal that I believe is in line with @shoogle's suggestions so far. I'll do this in a bullet-point fashion first, then we can see about progressing it to a more formal definition after review.

Allow free combination of dynamic letters: remove the rule in the spec that limits a dynamic's type attribute to one of the pre-cooked SMuFL combinations. Instead, allow anything satisfying the regex [pfmszr]+. These are understood to be rendered by an appropriate SMuFL glyph if available, otherwise by combining the atomic glyphs for the letters. Note that <dynamic> already allows a custom glyph to be specified.

No delimiters: For simplicity in 1.0, let's not permit colons, hyphens, and other decorations that aren't part of conventional dynamic letter-combos.

Only define performance interpretation of basic combinations (p, fp, mf...): There is so much creativity in the literature, and interpretation can vary from one score or genre to another. This applies to unusual letter combinations, and even more so to repeat differences. Let's allow these things to be notated freely, but not try to spec out all the possible semantics in 1.0. At some point, performance styling (using styles for "sounds like" as well as "looks like") can be a useful tool to encode how a specific dynamic or text+dynamic group is to be performed as well as the repeats problem. I think this is best done as a general mechanism in MNX and not something specific to dynamics, since it applies to pitch, rhythm, tempo, articulation... so also not for 1.0 probably.

Reintroduce the <dirgroup> element to combine text and dynamics: The original draft spec proposed a <dirgroup> element whose children were directions grouped into a sub-sequence. This was already envisioned as combining text and dynamics, but was left out of the 1.0 migration. As an example, here's a subito p at the beginning of a bar:

    <dirgroup location="0">
      <expression>subito</expression>
      <dynamics type="p"/>
    </dirgroup>

The <dirgroup> element is not necessary (one can always just put subito and p next to each other), but it is helpful. It means that its child directions are to be visually arranged in reading order, and have a compound meaning which in this case is a dynamic. Note that location is only provided for the group, not for its children, and the children can't be spans. If this approach makes sense then we'd add a separate issue to spec out <dirgroup>.

I anticipate the objection that <dirgroup> is not necessarily a dynamic, or necessarily any particular thing. A host application wouldn't know that subito is part of a dynamic until it encounters the p, but there is a continuum in the literature between expressions and dynamics, with pure expression text sometimes having clear dynamic interpretations (e.g. sempre following some dynamic instructions, or descriptive words like doux). I think allowing freer encoding is the better choice here, and it still allows applications to make sense of straightforward combos.

(Legacy note: we will never get rid of older documents in which things like subito and p are separate directions that are visually juxtaposed. Applications will still need to be able to handle that.)

Do not combine hairpins and dynamics: Just echoing the former decision on this, because I think it's a real can of worms to encode it, and because it's simply not needed. Applications can form their own sequences fairly straightforwardly by looking at the location and end attributes of dynamics, text and hairpins. (Note that the visual positions can be tweaked away from the metrical positions by styles prescribing X and Y offsets; the metrical positions would be used for performance of course.)
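As a rough illustration of the first point above (free combination of dynamic letters), the sketch below validates a type against [pfmszr]+ and picks a combined glyph where one is known, otherwise falling back to one glyph per letter. The function name and the two lookup tables are hypothetical. The glyph names are written from memory of the SMuFL dynamics range, cover only a small subset, and should be checked against the SMuFL glyph tables before use.

```python
import re
from typing import List

TYPE_PATTERN = re.compile(r"[pfmszr]+")

# Atomic letter glyphs (assumed SMuFL names, to be verified).
ATOMIC = {
    "p": "dynamicPiano",
    "m": "dynamicMezzo",
    "f": "dynamicForte",
    "r": "dynamicRinforzando",
    "s": "dynamicSforzando",
    "z": "dynamicZ",
}

# A few pre-combined glyphs (assumed SMuFL names, deliberately incomplete).
COMBINED = {
    "pp": "dynamicPP",
    "mp": "dynamicMP",
    "mf": "dynamicMF",
    "ff": "dynamicFF",
    "fp": "dynamicFortePiano",
}


def glyphs_for(dynamic_type: str) -> List[str]:
    """Return the glyph names used to render a dynamic's type attribute."""
    if TYPE_PATTERN.fullmatch(dynamic_type) is None:
        raise ValueError(f"invalid dynamic type: {dynamic_type!r}")
    if dynamic_type in COMBINED:
        return [COMBINED[dynamic_type]]
    # No combined glyph known: fall back to one atomic glyph per letter.
    return [ATOMIC[letter] for letter in dynamic_type]
```

Because delimiters are excluded in this proposal, a value such as p-pp is rejected here rather than rendered.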