Realizations and Layouts

w3c / mnx

Music Notation CG next-generation music markup proposal.

174 stars 19 forks source link

Realizations and Layouts #138

Closed joeberkovitz closed 4 years ago

joeberkovitz commented 6 years ago

This issue is a proposed direction that's intended to address (partly or wholly) the concerns of multiple existing issues including #4, #34, #57, #121.

At the moment I have not proposed any particular XML schema for these ideas, because I wanted to get the ideas themselves out on the table first. I don't think the problem of making up elements, attributes and relationships is going to be too difficult here, though.

Transposition And Its Discontents

For scores that can be consumed as both full scores and instrumental parts, the question of how to best encode pitch is confounding. Perhaps the two biggest forces driving MNX architecture are the desire to accurately encode the author's intentions, and the desire to accurately encode the material to be read by readers. When it comes to transposition, however, these forces are not necessarily in alignment. Perhaps the question of "which encoding is best?" is itself part of the problem, rather than part of the solution.

On the authorial side, a manuscript (whether autograph or digital) may have been originally notated in either concert or transposed pitch. Thus a decision to encode only concert pitch, or only transposed pitch, can impose an unacceptable distance between the encoding and the original material. Recall that MNX should serve to encode materials that may not have ever passed through a notation editor, with a reasonable degree of directness. Such original materials could differ as greatly as an orchestral piece notated in full at concert pitch, and a clarinet solo notated as a single transposed part. Should it not be possible to encode either one, such that there is a direct correspondence between the original manuscript and its encoded pitches?

If so, it seems hard to escape the conclusion that no single pitch encoding convention serves the goals of MNX well. Some of the many scenarios with pitch that may occur, include the following:

Original full concert score, derived transposed parts
Original full transposed score with identically transposed parts
Original full transposed score with derived full concert-pitch score
Single-instrument transposed part, with no need for a derived full score

It's also true that any algorithmic rule for conversion between pitch levels will sometimes need to be overridden by skilled editorial judgment. This doesn't mean that algorithms play no role, but it does mean that an override mechanism is necessary.

Finally, there is no single choice for pitch encoding that eliminates the need to convert between different pitch schemes. Implementors will have to deal with this complexity in at least one direction, and the conversion is basically symmetric in nature: it is not more complicated to go from A to B, then from B to A.

While it has been argued that concert pitch is a "canonical truth" that transcends transposed music, the only canonical truth we really have is whatever the composer wrote down -- which could be in either pitch scheme.

Score And Part Realizations

Looking beyond transposition, we find that parts and scores can differ in other semantic ways. Some directions or elements (e.g. cue sequences) are only visible in one or the other. Multi-measure rests may be present in a part and absent in the score, or vice versa. A textual direction might be shown in a teacher's guide and omitted from the student edition.

So it seems useful to situate the problem of score/part transposition within a larger landscape of allowing a CWMN document to vary for different roles. We therefore propose the new MNX-Common concept of a realization, which says how the document's contents are to be transformed for consumption by a particular role (e.g. conductor, performer, student, teacher, etc.). There are at least two major types of realization: a full-score realization, and part- specific realizations (one for each part).

Let's start by trying to roughly define a realization, and then look at how this definition works:

A realization has a list of included parts.
In a given realization, each part transposes its pitches a specified interval from concert pitch.
In a given realization, any measure may override the default key signature with a transposed enharmonic.
In a given realization, any note may override the default spelling with a transposed enharmonic.
Directions and sequences may be restricted to only occur in designated realizations.

There are two built-in kinds of realization, reflecting the main needs of producers and consumers: score (including all parts), and part (one for each part in the score).

Note that realizations don't concern style properties or system breaks or system spacing or credit placement or... all that other visual stuff... That realm is where layouts come in (see below). For example, a single part realization might have multiple layouts for different page or display sizes, each of which has different system breaks and credits positioning.

How Do Realizations Affect Encoding?

Each part specifies a source realization. The source realization of a part determines how that part's pitches are encoded. Because different realizations can transpose the same part differently, this means that pitches can be encoded either in concert or transposed pitch.

Let's look at several widely differing scenarios to illustrate:

In a document derived from a concert-full-score original (or exported from an application which uses concert pitch as a reference point), we'll have this scenario:

The score realization will specify concert pitch for each part (possibly with octave transposition for bass, piccolo, etc.)
Each part realization will specify the transposition for its specific part, along with enharmonic overrides.
Each part's source realization will be score, thus all notes will be encoded in concert pitch.

In a solo instrument score with a single transposed part as the original (or exported from an application which uses transposed pitch as a reference point), we'll have this scenario:

The sole part realization specifies transposed pitch for that single part.
The score realization (if it even exists) is identical to the part realization.
The single part's source realization is its part realization, thus all notes will be encoded in transposed pitch.
Consequently there do not need to be any enharmonic overrides.

In a document derived from a set of transposed parts we'll have this scenario:

The score realization will specify concert pitch for each part. (A full-transposed-score realization could exist also!)
Each part realization will specify the transposition for its specific part
Each part's source realization will be part and will be encoded in transposed pitch.
Each part will include enharmonic overrides for the score realization, as needed to support a presentation at concert pitch.

Transposing With Intervals And Semitone Distances

Like MusicXML, MNX must specify transpositions as diatonic intervals, i.e. as a combination of steps and semitones. However, as mentioned above realizations may also supply explicit key signatures and note spellings to override any prevailing transposition interval.

How Do Realizations Affect Presentation?

When rendering a part for consumption by a reader, a target realization is used. The problem of figuring out how to spell notes in a given realization is therefore as follows: how do we transform a note from its source to its target realization? The rough answer, to be refined in the actual spec, runs something like this:

If the two realizations are the same, do nothing
If the transpositions for source and target are the same, do nothing
If a key signature override exists in the target, use it. Otherwise transpose the source key signature by the intervallic difference between the target transposition and the source transposition.
If a note spelling override exists in the target, use it. Otherwise transpose the source note according to the distance in 5ths between the source and target key signatures.

Layouts

In comparison to realizations, layouts are fairly simple. They are ways in which a score may be presented. This is not just about full scores as opposed to parts. For example, a full score realization could itself be customized to be shown on an iPad, printed on A3 paper, or shown in an infinite scrolling strip. Each of these would constitute a distinct layout, based on the score realization of a document.

A layout is characterized by the following:

An underlying realization of the document (typically full-score or part).
Credit/header/footer text with spatial placement relative to display margins.
A stylesheet (i.e. score-wide class and selector definitions). This is useful to control the global appearance of the score (e.g. staff line spacing).
For this specific layout, layout-specific style property overrides can be applied to any element of the score. This capability allows measure style properties for system/page breaks to be scoped to a particular layout, among other things.
An optional display size range, used to automatically select the correct layout based on device characteristics. This would act similarly to CSS Media Queries.

mdgood commented 6 years ago

Thanks for this proposal Joe! I have one clarification question and one comment:

1) Am I understanding correctly that whatever the source realization, what is encoded is the written pitch? That could be transposed for part and transposed score realizations, and concert (with or without octave transpositions) for a concert score realization.

2) The section on Transposing With Intervals And Semitone Distances badly misrepresents MusicXML's transposition. MusicXML of course uses a combination of chromatic and diatonic steps, not just semitones which wouldn't work. Since MNX has no difference here it seems this section could be cut, but at least it needs to represent MusicXML capabilities accurately.

joeberkovitz commented 6 years ago

@mdgood for [1], that's correct: what is encoded is the written pitch for the source realization, which could use any pitch level (typically either written for the instrument or for concert pitch viewing).

For [2], my apologies I was repeating what I mistakenly thought you had said in https://github.com/w3c/mnx/issues/4#issuecomment-400119329. Looking in the schema, I see of course you're right that MusicXML transposition has multiple components. I've corrected the writeup above to fix this mistake.

cecilios commented 6 years ago

@joeberkovitz Great proposal! Thank you. Let's hope that the evil don't come from the details.

shoogle commented 6 years ago

@joeberkovitz, I came to the same conclusion regarding transpositions: while concert pitch makes sense for most new compositions, the "definitive" representation of historical compositions is whatever transposition is used in the available source material.

If MNX forced all scores to be stored in concert pitch it would be problematic for accurately storing historic compositions, and for OMR. However, if MNX forced scores to be in transposed pitch then it would consign itself to only ever being used as an archival and interchange format; I can't see any major score editor choosing to use a transposed representation for their native format. This means both concert pitch and transposed pitch are required.

Might I suggest the following internal representation:

Each staff contains:
- note pitches as written in the source score, or the composers chosen "definitive" representation
- the transposition of the staff
- given as a number of diatonic and chromatic steps away from concert pitch
Each instrument contains:
- the transposition of that instrument
- given as a number of diatonic and chromatic steps away from concert pitch
- any hints necessary for deciding between key signatures used for display

To calculate the sounding pitch, take the stored pitches and "undo" the staff transposition to get concert pitch.

To calculate the pitch to be displayed to the user in a sheet music application, take the stored pitches, "undo" the staff transposition, then apply the instrument transposition (or just apply the difference between the staff and instrument transpositions).

Benefits:

If the notes were written at (for example) a B-flat transposition, and the chosen instrument is also B-flat instrument, then no transposition occurs and notes are simply displayed as written.
If notes were originally written (e.g.) for a B-flat instrument, but the user decides to swap this for an instrument with a different transposition (or no transposition), then all that happens is the new instrument's transposition is recorded; the stored pitches are not changed. Since the stored pitches are not changed, they are still in the "definitive" representation, and are guaranteed to be displayed as originally intended if the user subsequently switches back to a B-flat instrument.

mogenslundholm commented 6 years ago

I did a bigger transposition example showing 17 normal pitches with one-to-one relation. This can be done knowing that that transposition is between C and Bb. However, the MusicXML approach with both chromatic and diatonic steps seems to be better. It can handle not only standard keys.

Adding an offset when playing is no big problem. I have identified four cases in MusicXML concerning pitch: Clef, Transpose, Octava and Harmonic. But I would not be shure there couldn't be another definition affecting the pitch. (Transpose is written pitch, Harmonic may be both written and sounding pitch)

What worries me is, that this is not the end. There are exceptions. How many will appear? We are considering a few instruments used, but there are many instruments in the world. For example alt-flute(recorder), which I played in F-Clef as a child. Now it appears transposed C-Clef in MuseScore. The flute even has barock grip or german grip. An oud has several tunings - sometimes shown transposed.

The only we know for sure is sounding pitch. The same into eternity defined in terms of number of cycles of radiation a cesium-133 atom and the base A4-tone. But notation changes. Some tunes may not be played right in 100 years.

The alter-command is absolutely sounding pitch. Does the alter value make sence when transposed?

With a one-to-one tranposition, there should be no problem. (both the chromatic and diatonic specified). Note the last note in the example. You could say: a Bbb - isn't it just an A? But no, since the diatonic tranposition value is one, it becomes Bbb.

Using sounding pitch in the file does not change any presentation of a transposed tune, it is like a computer using binary numbers internally, but may present the number as decimal number.

With one way to solve some problem, it will be tested all the time. With two ways it will cost more than double effort and double test. The two possible ways to do it will never be tested equally.

A scale may not be equal tempered. Having two books and several papers about makam and folkmusic. Opens the book Makam of Signell and read: "transposition will cause a sligt alteration in the size of an interval". But I think that the MusicXML-way will also work here - even for the quarter tones.

PS: Note that transposition is two things: A melody may be transposed and actually change the pitch. And a melody may be shown transposed in order to make it easier for the player to read the notes.

transposing

joeberkovitz commented 6 years ago

@mogenslundholm I believe you're confusing two different things: transposition (which only affects the way parts are presented for reading by a performer) and sounding pitch (which determines the actual pitch heard in performance).

This proposal only deals with transposition. As such, it will suffice to describe how to transform from one realization to a different one (e.g. show a Bb instead of a C, or a D## instead of a G#). This is not about sounding pitch. Furthermore it is a single procedure based on a diatonic/chromatic interval, not two (or N) different procedures.

Actual sounding pitch, temperament, and microtones such as those in maqam music are not part of this proposal. At the moment I believe that <interpret> is sufficient to describe the actual performed pitch of any note in decimal terms. But please let's not take that question up here -- it is not about transposition, really.

mogenslundholm commented 6 years ago

Thanks for the clarification. I still believe that the makam-music can work "just" by doing it like MusicXML.

joeberkovitz commented 6 years ago

Notes from chair call today:

Realizations can also tweak style properties (e.g. stem direction).
For credits, permit text to reference or embed metadata as data source.
For layouts, we'll eventually need a separate detailed proposal for credits, headers, footers, page-attached text, interpolated text

PR will come next.

jsawruk commented 5 years ago

Regarding pitch representation:

I am just getting caught up on this issue, so apologies if this duplicates someone else's suggestion.

It is my opinion that every note should encode both the written and sounding pitch, with the written pitch being mandatory and the sounding pitch being optional.

For example:

<!-- Note that sounds as written -->
<note pitch="D5" />

<!-- Note that sounds at a different pitch than written -->
<note pitch="C5" sounding-pitch="Bb4" />

If sounding-pitch is omitted, then sounding-pitch is equal to (written) pitch.

Instruments could continue to have transposition properties so that a fallback is available when sounding-pitch is omitted. For example, an MNX document only encodes a C score and has no sounding-pitch properties, but the user wishes to view this document as a transposing score. In that case, the instrument's transposition information would inform a transposition algorithm. This display could be inaccurate in terms of enharmonics, but would work similar to how transposing instruments currently work in MusicXML.

Here is my rationale for including both written and sounding pitch:

It avoids ambiguity regarding transposing instruments, and allows the author to make editorial judgements regarding appropriate enharmonics. In other words, instead of considering sounding pitch as a transformation of written pitch, I would consider sounding pitch to be independent of written pitch.
It allows for accurate realizations without intervening transposition algorithms in some scenarios. For example, to properly display a graphic rendering, use the written pitch (but see above regarding C scores). Creating a MIDI realization of the same content would use the sounding pitch. No transposition algorithm is involved in this case.
Each part could also have its own independent pitch and sounding pitch, so that the part's pitch representation is independent of the score's pitch representation of the same note?

I should note that I have not yet fully considered the implications of ottava lines or transposing clefs. I would defer to however MNX decides to encode these objects, while retaining the semantics that sounding pitch is the literal sounding pitch (i.e. MIDI pitch number or other pitch representation that encodes to hertz).

I realize that using an approach like this could be overly verbose, but I would prefer verbosity to ambiguity.

shoogle commented 5 years ago

It is generally a bad idea for a format to allow multiple ways to say the same thing. Written pitch and sounding pitch are not the same, but they are related by the transposition interval so it shouldn't be necessary to store both where one and an offset would suffice. What happens if we store both and they become out of sync?

I think we really need to see some real-world examples of where a naive transposition would get it wrong, and an explanation of why this can't be solved simply by improving the algorithm, perhaps by taking the current or previous key signature/transposition into account.

clnoel commented 5 years ago

Considering that Joe, the initial proposer of this framework, is not really around to comment any more, I want to know if some one else (a chair or someone who agrees with it) is now championing the original proposal?

I'm of two minds on this. On the one hand, I strongly consider the "ground truth" of a piece of sheet music to be the sounds it is intended to produce. Music is a language and notation is its written form. That's one of the reasons I consider things like multi-measure rests and even whether you use a quarter or two tied eighths to be decisions of writing style that change how easy it is for a performer to create sound from the notation, and therefore many such things belong as part of layout decisions, since they do not alter the ground truth of the resulting sound. From that point of view, it makes sense for sounding notes to be encoded in the scores.

On the other hand, especially for historic documents, the ground truth of the intended sounds can be hard to figure out, since the "writing style" of the music has changed over time, and is sometimes peculiar to individual composers, much like differences in handwriting. Also, it is part of the mission here to be able to faithfully reproduce the original source document without a lot of hassles. From this point of view, it makes sense for written notes to be encoded in the scores.

What we've got to do here is to decide: Which viewpoint is more important?

Or, if they are of equal importance: Do we support both viewpoints evenly, even though it gives us two ways to represent things in MNX?

--Christina (Edited to remove a typo)

jsawruk commented 5 years ago

I think we should support a way that separates the concept of written pitch and sounding pitch because this is the least ambiguous. After considering the problem more thoroughly, I propose the following:

The written pitch is the primary pitch. This is what is physically displayed in the sheet music.
Each written pitch may then have an associated collection of 0 or more alternate representations.

These alternate representations may or may not be the sounding pitch. For example, one alternate representation could be the written pitch for a specific part. The use case for this is condensed scores.

Consider a condensed score written in C with a single staff for "High Woodwinds". This staff might show the conductor the sounding pitch for both the Flute and Bb Clarinet parts, but does not show the written pitch as the performers see them.

In the above scenario, one should be able to create such a condensed score and unambiguously extract the written part.

Example for non-transposing instrument or a transposing instrument in a transposed score:

<pitch name="C4" />

Example for transposing instrument in a C score:

<pitch name="C4">
  <alternate-pitch type="sounding" name="Bb4" transposition="-2" />
</pitch>

Example for condensed score:

<pitch name="C4">
  <alternate-pitch type="part" name="Bb4" transposition="-2" part-ref="#clarinet1" />
</pitch>

shoogle commented 5 years ago

Great points here from @clnoel. I agree that sounding pitch is the "ground truth".

@jsawruk, your examples expose the fundamental weakness of encoding written pitch, which is that it can differ between the score and the parts. Your method requires altenate-pitch to sometimes be sounding and sometimes written, whereas if you made the sounding pitch the primary pitch then the alternate pitches would only ever be written pitches. Why treat the score differently to the parts?

mogenslundholm commented 5 years ago

Still, I think that an interesting aspect of music is also how it sounds. /Mogens

On 2019-02-18 18:48, jsawruk wrote:....

shoogle commented 5 years ago

Another argument in favour of sounding pitch is that this allows you to encode all editions in a single file. For example, you can encode:

Note sounding C4
- Smith 1874 Edition for instrument A used transposition X
- Jones 1874 Edition for instrument B used transposition Y

You can do the same for clefs and ottava lines:

Note sounding C5
- Smith 1874 Edition started an 8va line here
- Jones 1874 Edition inserted a treble clef here

Now applications can offer a dropdown list of editions and users can choose to have the score rendered according to that particular edition. This is not possible if you encode written pitch, at least not without encoding everything multiple times and creating a huge file.

adrianholovaty commented 5 years ago

Considering that Joe, the initial proposer of this framework, is not really around to comment any more, I want to know if some one else (a chair or someone who agrees with it) is now championing the original proposal?

Thanks for picking up this discussion! Speaking as the co-chair who's been asked to continue the MNX work: my gut preference is also to treat the sound as the ground truth.

With that said, I think we'd all benefit from a comprehensive list of pros and cons for the two approaches — so we can make a decision based on complete information. I will be synthesizing the comments here into such a list, so if anybody has further thoughts on the matter — especially perspectives that haven't already been expressed in this thread — please contribute your thoughts and reasoning.

I don't expect every detail of MNX will require such a heavy handed pros-and-cons approach, but this is such a fundamental decision that it's important to get right.

(I'm still getting situated in the co-chair role but hope to have some more concrete things to contribute in the coming days...)

jsawruk commented 5 years ago

@shoogle: One of the issues with using only sounding pitch is that the visual representation can change when displaying the written pitch.

For example, with a Bb instrument, a sounding note of Bb has a written pitch of C. Depending on key signature and/or engraver preferences, this could change the layout of the part since there could no longer be an accidental, potentially leaving an unnatural whitespace in front of the C. Engravers like to alter the parts independent of the score to optimize the display as a part. This is the part independence feature in score editors that allows you to alter properties of the part without altering properties of the score.

@adrianholovaty: I agree that creating a list of pros and cons for each approach is a good idea, and probably the only way to truly come to a consensus. I personally think there are three approaches:

Only written pitch is encoded, and a standardized algorithm is adopted to convert to sounding pitch
Only sounding pitch is encoded, and a standardized algorithm is adopted to convert to written pitch
Both written and sounding pitch are encoded and are completely orthogonal. No algorithm is needed to convert.

Every approach has its advantages and disadvantages, but I still think written pitch and sounding pitch are separate concepts (which is why I recommend an algorithm to convert between the two. For example, see how music21 handles accidentals during transposition). It is my opinion that written pitch is a very engraver/orchestrator way of thinking, while sounding pitch is a very composer/MIDI way of thinking. Since our group encompasses representatives from all of the above groups, I doubt we will be able to just pick one pitch representation that everyone can agree one.

Where should the pros and cons be listed?

adrianholovaty commented 5 years ago

@jsawruk Still working on finding a proper home for the pros and cons (perhaps the project wiki). At the moment, let's keep stuff in this thread. 👍

shoogle commented 5 years ago

@shoogle: One of the issues with using only sounding pitch is that the visual representation can change when displaying the written pitch.

For example, with a Bb instrument, a sounding note of Bb has a written pitch of C. Depending on key signature and/or engraver preferences, this could change the layout of the part since there could no longer be an accidental, potentially leaving an unnatural whitespace in front of the C.

One of the lessons from MusicXML is that you shouldn't try to store the exact positions of all elements in the first place as applications will just ignore it and do their own thing regardless. If you want exact positions then you need to use MNX-Generic (i.e. SVG).

It might be possible to specify positions as offsets from a default position, but the default position would surely take accidentals into account so the particular example you give would not be a problem. (If the default position didn't take accidentals into account then the layout would be ruined as soon as somebody tries to transpose the score to a different key.)

I suppose for OMR purposes it might be useful to have an option to store exact positions of symbols that were transcribed from a scanned score. This information could be used to overlay symbols on top of the scanned image within an OMR application, and for the purposes of training an AI to recognise musical symbols on a page, but it would not be used by sheet music applications for the purposes of storing layout information as they would much rather use their existing (and competing) algorithms.

Engravers like to alter the parts independent of the score to optimize the display as a part. This is the part independence feature in score editors that allows you to alter properties of the part without altering properties of the score.

This is really a separate argument relating to the wider question of how to encode differences between the score and the parts. i.e. Is it necessary to write everything out twice or can we get away with writing it once and storing the differences between them? Sounding pitch vs. written pitch is just a small part of that topic.

shoogle commented 5 years ago

As I said above, I think we really need to see some real-world notation examples where a naive transposition algorithm would get the conversion between sounding pitch and written pitch wrong. Sounding pitch seems vastly superior to written pitch in most other respects, so the case for written pitch really depends on these examples, and on an inability to solve them by any other method, such as:

Storing some kind of hint for the transposition algorithm.
Improving the algorithm (e.g. take the previous key into account at a key changes).

There is of course always the option to store both pitches, but the risk of doing this is that people may start using it as a way to encode differences that are not related to transposition. For example, people might try to encode things like "the score says E flat here but the parts say E natural". We may want to provide a dedicated feature for that very purpose, but we would need to think very carefully before allowing the transposition feature to be used for anything other than transposition.

jsawruk commented 5 years ago

@shoogle: "Sounding pitch seems vastly superior to written pitch". I respectfully disagree with this sentiment. Written pitch and transposing instruments exist for a variety of reasons, and are how music is notated both historically and currently. I do not think it is the responsibility of MNX to eliminate written pitch. If we do not support a direct encoding of written pitch, then MNX will not be used by publishers.

I would prefer an unambiguous representation of both sounding and written pitch, but I understand your concern that this could be abused. I view written pitch and sounding pitch as representing the same item but presented in different terms (similar to how a point can be represented by two different vectors using a change of basis). It is this line of reasoning that made me suggest a way to have multiple "representations" of a given pitch.

If we were to support only one representation, I would prefer written pitch, because it is easier to transform written pitch to sounding pitch. A sounding pitch could represent multiple written pitches (MIDI 60 = C4, but could also be B#3 or Dbb4), whereas a written pitch only represents a single sounding pitch (C4 always maps to MIDI 60). This is my concern with an algorithmic approach. Any transposition algorithm that we propose should have the property that it is an involution: T(T(x)) = x. However, since written to sounding is a 1:1 function, but sounding to written is a 1:n function, I do not think in general these two functions can be composed to create an involution. Now, in the 90%+ of cases, I doubt there will be any issue, but I do worry that there will be ambiguity in some edge cases.

As far as an example, I have created an example (please see attached) of a transposition algorithm failure using Sibelius, though I think this failure would occur in other software as well. I have written a melodic line of C, Cb, Cbb for both Flute (non-transposing instrument) and Clarinet in A (sounds down a minor 3rd). To convert the sounding pitch (the Flute line) to the written pitch for the Clarinet in A, the transposition algorithm applies the following transposition: transpose up a minor 3rd.

C +m3 -> Eb Cb +m3 -> Ebb Cbb +m3 -> ???

Since there is no notation for "E triple flat", the algorithm breaks in an unusual way. A copyist would change this note to a Db, such that:

Cbb +m3 -> Db

transposition-example

Note that this changes the pitch class. Writing this in code would require an exception to handle such a case. While this is a simple melodic example, I am also concerned with more complex use cases involving harmony, as well as music that is atonal/open key.

clnoel commented 5 years ago

@jsawruk First off, written-to-sounding is not one-to-one. Transposing staffs, like the clarinet in A that you show there, mean that a note that looks like an Eb should "sound" as a C when played by a Clarinet, instead of as the Eb a Flute would play. This is definitely one-to-many! And, in fact, is many-to-many, because C## and D both produce the same MIDI note. (Which is, I am now realizing, one of the cons of using sound as primary because you then have to specify which enharmonic you want!)

I consider your other point to not be an issue. My naive transposition algorithm doesn't produce the same probem as yours. I'll use your example which is C, C#, C##; C, Cb, Cbb. This gets changed to MIDI 60, 61, 62; 60, 59, 58. On transpose (of +3 semitones) my engraving program needs to make these look like MIDI 63 64 65; 63, 62, 61. So, naively, I end with Eb, E, F; Eb, D, Db as the default transposed notes. This still does not hit your manual transposition, (which uses E# instead of F and Ebb instead of D), but it is very clean and does not have an issue with representing the desired pitches correctly.

On the other hand if there was a Clarinet in A line in the original that I was trying to encode, and it looked like your manual transposition, I would need to specify the fact that an "unusual" enharmonic was being used for the E and the D. I'm not sure what that would look like off hand.

--Christina

jsawruk commented 5 years ago

@clnoel: I respectfully disagree with your interpretation of what I meant by written to sounding being one-to-one, and I should have provided a clearer argument. My point is that, for a non-transposing instrument, any written pitch maps to one and only one MIDI pitch. However, the opposite is not true: MIDI 60 could be interpreted as B#3, C4, or Dbb4. Because there is already ambiguity here for a non-transposing instrument, I feel that extending to transposing instruments only introduces more complexity and ambiguity. I think we are on the same page now, so sorry for the confusion!

I personally do not feel that a transposition algorithm should convert pitches into MIDI numbers and then back into a pitch. Doing so would require a pitch spelling algorithm (e.g. ps13). This could result in an incorrect pitch spelling. I am only worried about the possibility of an incorrect pitch spelling.

Also to your point, what are the "desired pitches"? Are they what the composer wants, what the performer wants, what the publisher wants, what the musicologist wants, or what the synthesizer/sampler wants? This is what I believe is the root of the pitch representation issue, because these could all (potentially) be different answers. We may choose one representation that does not give the optimal solution for all involved parties, and I think discussions like this thread are very helpful to find the best solution.

As for my position in this discussion, here is my full disclosure:

It is the opinion of my employer that written pitch must be encoded in any sheet music standard such as MNX, as this represents the publisher's intent. Since they interact with a very large number of publishers, they feel that this is an important issue.
As a software developer who has written several MusicXML parsers, I have been dealing with the pitch representation issues in MusicXML for well over a decade. It is my personal opinion (not my employer's) that there should be no ambiguity in pitch, but my experience as a developer and composer has shown that there can be ambiguity when it comes to transposition.

I am not simply arguing my point to be a contrarian, but to encourage discussion about this issue. It is an issue that myself and others feel is extremely important, so I hope we can find an acceptable solution.

shoogle commented 5 years ago

@clnoel, I think @jsawruk meant written to sounding is many-to-one, because both C# and Db produce the same sound. However, that is only true in equal temperament, so it is important that we get the correct enharmonic in case an application wants to enable other tuning systems. Let's avoid talking in MIDI pitches for this reason.

@jsawruk, I'm afraid the particular example you showed does not help your case.

Since there is no notation for "E triple flat", the algorithm breaks in an unusual way.

No, this is entirely predictable. There is indeed no such thing as "E triple flat", so now the algorithm needs to look for something equivalent, and the available choices are C# and Db. C# changes the scale by 2 degrees (E to C), whereas Db is only a change of one degree (E to D), so Db is the only correct choice. MuseScore gets this right, and I'm sure the vast majority of notation program would also get this right.

So in this case there is no disadvantage to using sounding pitch, but what about written pitch?

Let's explore what would happen if we stored your example as written pitches. For the Flute we store C, Cb and Cbb as before, but instead of calculating pitches for the Clarinet we just directly store Eb, Ebb and Db. Now imagine that we want to "undo" the Clarinet transposition and display the score with all instruments in C:

Eb - m3 -> C Ebb - m3 -> Cb Db - m3 -> Bb

So now the Clarinet has a Bb where the Flute had a Cbb.

This is the problem with storing written pitch: it obscures the harmonic relationship between the different parts. If we use written pitch then the harmonic relationship is preserved.

jsawruk commented 5 years ago

@shoogle:

No, this is entirely predictable. There is indeed no such thing as "E triple flat", so now the algorithm needs to look for something equivalent, and the available choices are C# and Db. C# changes the scale by 2 degrees (E to C), whereas Db is only a change of one degree (E to D), so Db is the only correct choice.

This is exactly what this group would need to specify in such a transposition algorithm. If MuseScore gets this "right", that's good, but MNX needs to specify what "right" is. In this case, we could say that MNX would use the same algorithm as MuseScore, but this decision and a description of the algorithm would have to be added to the standard. The reason why we can't just say "use the same algorithm that MuseScore uses" is that MuseScore may change their algorithm or we may not be able to use their algorithm for whatever reason. I personally would like to use the algorithm in music21, but the same caveats apply. If we decide to use an algorithm, then we must decide which algorithm and explicitly describe it in the specifications so that there is no ambiguity, and everyone implements the process the same way.

This is the problem with storing written pitch: it obscures the harmonic relationship between the different parts. If we use written pitch then the harmonic relationship is preserved.

I think you contradicted yourself there. I believe that using written pitch is the only unambiguous way to preserve harmonic relationships. For example, I consider an augmented unison to be a distinct interval from an ascending minor second, even though both intervals only move up one semitone. I feel that this distinction could be obscured if we only used sounding pitch.

Also, your example of how my transposition example is not reversible supports my position that both written and sounding pitch should be encoded separately. By not relying on an algorithm, the pitch is always correct. Relying on an algorithm might cause issues like the one you demonstrated above.

shoogle commented 5 years ago

@jsawruk, it appears that Sibelius gets transposition right too. If you look closely at your image you will see a strange symbol attached to the E for the clarinet. This symbol is indeed supposed to represent a triple flat. There is probably a setting somewhere to disable this and force Sibelius to use the nearest enharmonic equivalent.

If we decide to use an algorithm, then we must decide which algorithm and explicitly describe it in the specifications so that there is no ambiguity, and everyone implements the process the same way.

Yes indeed, but we don't have to specify a single algorithm. We could specify a default algorithm and then provide the following as overridable style settings:

Use double sharps and double flats? True / False (default True)
Use triple sharps and triple flats? True / False (default False)
- Alternatively, combine the two previous options into a single option maxAccidentalLevels with default 2.

We could also provide a setting to control how far around the Circle of Fifths you are allowed to go before you need to switch from using sharps to using flats, and vice versa.

I think you contradicted yourself there. I believe that using written pitch is the only unambiguous way to preserve harmonic relationships.

No, because the full quote was "This is the problem with storing written pitch: it obscures the harmonic relationship between the different parts." Using sounding pitch is the only way to ensure that intervals between notes in different parts are preserved when transposing the entire score. This might, as you say, be at the cost of having a few pitches for a transposing instrument spelled differently in the parts to how they were spelled in the particular edition you were transcribing, but we still haven't been shown any actual examples of this so it remains a purely hypothetical problem.

If examples do exist then I suspect they can be solved by providing the style options I mentioned above. If they are not solvable that way then it is arguably because the author of the original edition made a mistake, so then the question becomes one of whether we want to forever limit the ability of MNX to reliably transpose new material just for the sake of accurately representing a few historic mistakes? Maybe we should allow written pitch to be specified in addition to sounding pitch, but bear in mind that MusicXML already uses written pitch, so people can always use that for encoding problem historic scores if they really need to.

People have been wondering if there is any need for MNX given that MusicXML already exists. Switching to sounding pitch provides a perfect justification for a new format as it fills a void that MusicXML does not cover.

jsawruk commented 5 years ago

@shoogle:

This symbol is indeed supposed to represent a triple flat.

Just because the symbol appears doesn't mean that it is supposed to represent a triple flat. I still think this is an error state, but you would have to ask the Sibelius developers.

Yes indeed, but we don't have to specify a single algorithm.

Technically, what are you proposing here is a single algorithm that is parametric rather than multiple algorithms. Having parameters might be a good idea, but it could also backfire because software A and software B could use different defaults. This isn't necessarily a bad thing, but it is something to keep in mind.

This might, as you say, be at the cost of having a few pitches for a transposing instrument spelled differently in the parts to how they were spelled in the particular edition

For me, this cost is far too great. Every pitch should be correct, and there should never be any ambiguity. If a pitch is considered incorrect by someone, then it is incorrect. We should never infer intent: whatever the author (composer, arranger, publisher, etc) wrote should remain. Some authors and publishers are very particular, and would want complete control over how the pitch is displayed. If what is displayed doesn't match what they specified, then they will be upset and will probably not adopt MNX.

It's like the difference between written text and spoken text: both are text and part of language, but they can be different. For example, consider the English word "color" (US) vs "colour" (UK etc.). They are different spellings but have the same meaning and approximately same pronunciation. If I am writing for a US audience, then the text should say "color", but for others I may want to display "colour". I could choose which to display based on the browser/OS localization settings. Alternatively, I could localize using a translation algorithm, but it is often better to manually translate. In my view, a robust system should be able to support both. If someone visits my website from Germany, but I don't have a German translation of my page, they can send it to a translation algorithm and get a rough approximation. An algorithm can be a good fallback, but I still think that allowing someone to specify a written pitch as they see fit (as opposed to an algorithm) will provide the best experience for both authors and performers.

the question becomes one of whether we want to forever limit the ability of MNX to reliably transpose new material just for the sake of accurately representing a few historic mistakes?

If the mistakes were made by the authors, then those mistakes should persist. It is not the responsibility of MNX to be a spell checker: it only exists to present information, not correct it.

People have been wondering if there is any need for MNX given that MusicXML already exists. Switching to sounding pitch provides a perfect justification for a new format as it fills a void that MusicXML does not cover.

There are several reasons why people would want to switch to MNX beyond pitch representation. However, I disagree that switching to sounding pitch would be a justification to change to MNX. If you want to use sounding pitch instead of written pitch, you can already do that today using MIDI or by exporting a C score to MusicXML.

mogenslundholm commented 5 years ago

I still think that transposing can be unambiguous under the precondition that the transposed key is defined as a key e.g. Bb and that an extra accidental can always be added - e.g. Bbbbb, because transposing may add another accidental.

But could we solve the problem by always defining the pitch as a pair? (sounding pitch,written pitch). I like the idea of "completely orthogonal". The pair must be mandatory, because "non-mandatory"="not there"!

E.g. A complicated pitch with both accidental and pitch-alter could look like:

<note pitch="C4-0.3,C4+half-flat"/> or and for a transposed instrument (with odd tuning): <note pitch="Bb4+0.1,C4"/> or even the simple and most common case: <note pitch="C4,C4"/>

("Real programmers" would save three characters in the pitch and rather write: <note both-sounding-and-written-pitch="C4"/> and add new definitions etc. but ... )

With the double definition it is easy and fast to process the data both for playing and writing by simply using the value you need.

Is the solution always to have the pair of (sounding, written)-pitch?

jsawruk commented 5 years ago

@mogenslundholm:

I still think that transposing can be unambiguous under the precondition that the transposed key is defined as a key

I think this is an interesting idea, but how would we support compositions without key signatures (open key)? Perhaps absence of a key indicates that the composition is in an open key?

But could we solve the problem by always defining the pitch as a pair?

I think we are on the same page, but have slightly different approaches. I would recommend having two separate attributes rather than put two values into a single attribute. This would have the following benefits:

Separation of concerns: Each attribute represents one and only one thing
Having separate attributes is easier to define in the schema, and it would be possible to specify one of the attributes as optional
It would make writing the parsers easier because they wouldn't need to parse out the multiple values from a single attribute.

I really like your examples of non-12-ET pitches. There could be certain accidentals outside of 12-ET that might not be properly transposed using an algorithm that operates on 12-ET. I am not very familiar with non-12-ET music, but I think that providing both the sounding and written pitches in such scenarios would be the most direct. Otherwise, we might have to support multiple transposition algorithms?

clnoel commented 5 years ago

I've been thinking about this overnight, and I went back to the top of this and reread Joe's original proposal for the concept of Realizations. I'm rethinking what should be done here, so this will contradict some of my earlier posts!

I think that given the many-to-many relationship between written and sounding pitch, that @jsawruk is right that we are in fact going to need to allow for the specification of both.

Here's my thoughts, that incorporate Joe's idea of "realizations".

A) If the original source document is written without transposition (or there is no difference), then that is encoded straight: Clef, key, note spelling as written.

Result: A program looking to reproduce the original realization can do so easily by using the pitch-spelling provided. A program looking to produce a transposed-part realization will have to apply its own transposition rules and its own pitch-spelling algorithm (difficult). A program looking to produce midi can take the pitch spellings and turn them into midi numbers (easy).

B) If the original source document is a transposed part, then that is encoded as a staff with a "transposed clef", and the pitches are put in as if the clef is not transposed. (Written pitch)

Result: A program looking to reproduce the original can do so easily by using the spelling provided. A program looking to produce a sounding-pitch representation will have to apply the transposition and then whatever its own pitch-spelling algorithm is (difficult). A program looking to produce midi can take the pitch spellings, turn them into midi numbers, and then apply the transposition as math (easy).

C) If the original source contains both a sounding-pitch and transposed part, then both need to be encoded, if only to make sure the original source can be faithfully reproduced. I can see this happening two ways, both with ups and downs. C1) We can encode the sounding-pitch part and the transposed part as if they were two separate parts, and pick which one is shown in the realizations we make. If we take this option, I also recommend we find a way to flag them as the same part. C2) We can encode each note with alternate spellings for sounding-pitch and transposed-pitch. This also involves specifying two alternate clefs for the same staff and specifying in the realizations which one we are using. Note: I favor option C2, because C1 involves decoupling a part from itself, making editing a note much more difficult. However, I can see potential for abuse in C2 as mentioned above, so if we choose that method, we'll have to be very careful in the specification we use to enact it.

Result: A program wishing to reproduce either of the originals can easily do so using the appropriate realization. A program wishing to produce midi can use the sounding-pitch spellings and turn them into midi notes (easy).

Also, I'm not sure we want to specify a "pitch spelling" or "transposition" algorithm that belongs to MNX. I think that the ability to transpose away from the original (whether transposed or sounding pitch) should be left up to the consumers, assuming they want to produce some version of the score that is different from the original source document. This allows companies to share the original source that their composers or engraving software has produced without giving away any of their own magic sauce.

Finally, I strongly believe that the "transpose" property belongs to the Clef, and that it should either replace or work alongside the current "octave" property. I consider the "octave" property to be a specific transposition case that should not necessarily be treated differently. I also think the clef needs a smufl-symbol hint on it, but that's a separate issue. Edit: Or maybe to the Key signature. That would work too.

clnoel commented 5 years ago

@shoogle @jsawruk I want to explicitly state that there are two different cases here (at least). In one case, you are trying to make sure you preserve the original, in pitch-spelling and transposed-part, etc. In another case, you are trying to produce a new transposition (either turning the transposed part into sounding-pitch, or transposing the entire score).

In the case of a new transposition, there is no original document anymore, so there is no "right" or "wrong" pitch-spelling, just individual preferences and biases on behalf of the algorithm or composer. For instance, it looks like @jsawruk prefers to preserve note-letter distances between the original and the transposition, or possibly the harmonic values from the paper he linked about pitch-spelling, and I prefer to go the minimal-accidental route. Neither of those biases is wrong just different. And is why I don't think we should specify a transposition algorithm in MNX at all.

--Christina

jsawruk commented 5 years ago

@clnoel: Thank you. I think you have gotten to the core of the issue in identifying that there are two different transposition operations here. I completely agree that they are different use cases, and this is why I am opposed to the use of a transposition algorithm in MNX.

shoogle commented 5 years ago

A possible solution would be to store sounding pitch as a number, like a MIDI pitch, and then store two separate spellings, one for use in Concert Pitch and one for Transposed Pitch, or just store one spelling and say whether it is for Concert Pitch or Transposed Pitch.

For example, if storing Db4, you would have:

Pitch: 61 (could be C#, Db, Bx...)
Transposition steps: ...
Concert spelling: sharp (selects C#)
Transposed spelling: ...

Perhaps we could replace MIDI pitches with an octave number and a pitch within the octave (number between 1 and 12). Maybe we could enable other tuning systems by allowing files to specify a different number of notes per octave, and also the tunings of those notes.

clnoel commented 5 years ago

@mogenslundholm It looks like we kind of agree on what I call option "C2" in my post above: finding a way to represent both sounding-pitch and written-pitch (including specific spellings) in each note element.

I would like to add that I would like to do this in a way that gives easy short-forms to both the A case (no transposition) and the B case (only transposition).

@shoogle Do we want to have to specify midi pitch? If we do, that might make the odd tunings that @mogenslundholm is mentioning not work right, and it doesn't necessarily support the B case.

My proposed scheme. Each of these cases is encoding a sounding pitch of Middle-C on a Bb instrument: Case A: <note pitch="C4"/> Case A with "odd" spelling: <note pitch="Dbb4/>

Case B:

   <clef sign='G' transpose='-2'/>
   <note pitch="D4" />

Case C2:

   <clef sign='G' transpose='0,-2'/>
   <note pitch="C4,D4"/>

Case C2, where clef has an octave shift in the "concert pitch" version:

   <clef sign='G' transpose='-8,-2'/>
   <note pitch="Dbb5,D4"/>

Something like that, anyway. I also have no objections to adding +.5 or something to our pitch specifications to indicate quarter tones, but I feel that needs to be a separate discussion.

Edit: Note that in these cases the "transpose" value uses semitones as units, since I tend to think in MIDI notes. If some other system would work better, I'm open.

mogenslundholm commented 5 years ago

@clnoel: Sure, but isn't <note pitch="C4,C4"/> very short?

shoogle commented 5 years ago

As I mentioned before, the problem with storing two pitches is that they might not get used how you expect. The pitches may get out of sync, or they may be used to encode notes that are different between the parts and the score for reasons other than transposition, such as engraving errors. This is why it is better to store one pitch and two spellings, one spelling for use in the concert pitch score and one for use in the transposed parts, as I suggested in my previous comment.

@mogenslundholm, your example is short but it is not convenient. XML already has ways of storing multiple attributes so there is no need to combine two things in one attribute, and doing so just means you need to define a way to separate the two pieces of information on top of the XML standard. The proper way to save space is through compression. Anyway, let's not get too hung up on the syntax at this stage since we still haven't decided what it is we are trying to store.

@clnoel, the advantage of using a number to represent pitch is that it is abstract and not tied to any particular tuning or notation system. If you store "Db4+half-flat" that makes sense in a 24 tone system, but not in a 19 tone system where the pitch of Db itself is different to concert Db. It would be much safer to store the sounding pitch unambiguously, such as "MIDI pitch 60.5", "MIDI pitch 61 - 50 cents" or "octave 4, pitch 1 (out of 24)", which also has the advantage of taking enharmonics out of the equation. The enharmonic are specified separately by the spelling for concert pitch and transposed pitch.

jsawruk commented 5 years ago

@shoogle: If you want to store a numeric representation of a pitch, I might start with how music21 handles this through a pitch space number.

The central idea is this: the numeric representation is a floating point number where:

The integer part is the MIDI pitch number
The non-integer part is a fractional representation where 0.01 indicates one cent

One problem with this though is that even though 60 +50 cents should produce the same frequency as 61 -50 cents (?), but this can only be encoded in music21 as 60.5 as far as I am aware.

Perhaps a numeric pitch representation should also be broken into separate parts. For example: <pitch midiNumber="61" cents="-0.5" />. (This is just an example; I'm not sure what the best name for the cents property is yet. We can decide on a name later).

shoogle commented 5 years ago

@jsawruk, thanks for the suggestion, but I was only using MIDI pitches as an example for the sake of familiarity. In reality, MIDI pitches are unsuitable because they are non-semanitic outside of a 12 tone system:

As you showed, MIDI Pitch 60.5 is ambigious even in a 12 tone system because it could be 60 plus 50 cents or 61 minus 50 cents.
60 plus 50 cents is semantic only in a 12 tone system where it means raise the tuning of pitch 60.
- If you start using cents to identify pitches, as you would be forced to do in a 19 or 24 tone system, then the distinction between pitch and individual adjustments to tuning is lost.
- You could use something like pitch 60.5 plus 0 cents, but while that works in a 24 tone system it would get ugly fast in a 19 tone system.
- You could redefine the space between MIDI pitches so that 60 is still middle C, but now 61 is C plus a quarter tone instead of C plus a half tone, but this would be very confusing and would also lead to sub-optimum rendering in pitches far away from middle C.

So in reality octave=X, pitch=Y/N is the only suitable method if you want to support arbitrary tuning systems. The mapping to MIDI pitches would be specified elsewhere in the file as part of a <tuning> element.

jsawruk commented 5 years ago

@shoogle: I still think having MIDI pitch as a reference is preferred because it will be easier to migrate existing 12-ET content from MusicXML and MIDI files into MNX. I think we should support this backwards compatibility.

As far as non-12-ET systems, you could use cents, or you could use a smaller resolution. Dorico allows you to specify intervals using 12000 divisions per octave, which is 10x the resolution of cents.. This should provide sufficient resolution for any tuning system. It also uses rational numbers instead of floats, so you don't have floating point errors.

I think having a separate <tuning> section is probably the best idea to specify the tuning system in use, but I think this should be moved to a separate issue. I think the issue at hand is about pitch spelling, not tuning systems.

As far as a numeric pitch representation, I think we are getting very far afield. My initial proposal made no reference to numeric pitch representations; only to the pitch spellings. Adding a MIDI-based or MIDI-extended number would relatively simple, but I am concerned that "60" could take on multiple meanings depending on the tuning system. I'm not sure what the best resolution to this is, but now the numeric representation issue is just as ambiguous as the pitch spelling issue was.

This might be a good compromise:

<pitch name="C4">
  <midi-pitch number="60" />
  <accidental type="quarter-flat" />

  <tuning amount="-0.25" />
  <!-- OR -->
  <tuning ref="#tuning-24-ET" entry="C4 -quarter-flat" /> <!-- refers to a <tuning /> table elsewhere in the document -->

  <sounding-pitch transposition-amount="-2" name="Bb3">
    <midi-pitch number="58" />
  </sounding-pitch>
</pitch>

In this schema, <sounding-pitch> is an element that contains some of the same information that <pitch> does, but is independent. The elements <midi-pitch>, <accidental>, <tuning>, and <sounding-pitch> would all be optional.

shoogle commented 5 years ago

@jsawruk, the mapping for existing 12-ET content into octave + pitch number is entirely deterministic:

octave = (midiPitch / 12) - 1
pitchNumber = midiPitch % 12

This is assuming you want middle C (i.e. C4) to be note 0 in octave 4, but we could define things differently if we really wanted to. The point is that the conversion is simple and there is no issue with compatibility.

Let's not get hung up over whether to use cents or something else to represent tuning. The point is that we agree tuning needs to be represented somehow.

You say that tuning is a separate issue, but then you say:

I am concerned that "60" could take on multiple meanings depending on the tuning system.

so in reality they both must be dealt with here. Unless of course we want to consider this issue as "solved" based on the idea of storing one pitch + two spellings and open a new issue to discuss syntax?

Your example is far too verbose to be used for each note, and it suffers from the same problem as before in that you encode the same information in multiple ways, therefore opening up the possibility for abuse. However, something like your example could be used to define a key (perhaps within the global <tuning> element) that would allow you to simply use "C4" when encoding individual notes later in the file.

jsawruk commented 5 years ago

@shoogle:

Your example is far too verbose to be used for each note, and it suffers from the same problem as before in that you encode the same information in multiple ways, therefore opening up the possibility for abuse.

It is not too verbose because, as I stated before, most of that content is optional. It is syntactically valid to write <pitch name="C4" /> in the schema I proposed. The additional elements would allow someone to specify additional information if they so chose.
It is not the same information: written pitch and sounding pitch are different pieces of information. See this previous discussion in this thread.

clnoel commented 5 years ago

@jsawruk @shoogle @adrianholovaty OK, I might be showing my lack of Music Theory here, again, but I've got a basic question: Are tonal systems outside the 12 tone system meant to be included in CWMN? If they aren't, then we don't have to worry about them when encoding pitches. If they are, can someone give me an example of one of those systems (a link to a page describing it, or whatever) so that I can improve my understanding of the issue without bogging down this conversation?

That asked, I'm going to go back and review the last several posts again. Some of it seemed very complicated and I don't want to respond wrongly due to misunderstanding.

jsawruk commented 5 years ago

@clnoel That is a great question, and I don't know the answer. I agree with you that there should be some clarification on this. What I can say though is the following:

SMuFL encodes a lot of accidentals for alternate tuning systems, and it is my opinion that SMuFL and MNX should be well aligned. Does that mean that everything in SMuFL is absolutely CWMN? Of course not, but it is something to keep in mind.
A lot of musicians are familiar with quarter tone accidentals (24-ET), and these accidentals are supported directly in Sibelius and Dorico (and probably other notation software as well).

clnoel commented 5 years ago

@shoogle My proposal was not meant to store two pitches... it was meant to store two pitch spellings that represent the same sounding pitch. Sounding pitch (for midi/audio generation) would then determined by the consumer as a function of the pitch spelling and the clef/staff transpose (or transpose+octave if we don't want to combine those).

You are correct that people might decide to use this list-format to mean other things... and I'm not sure that that bothers me as long as the original intent is also preserved. I don't care if someone wants to use this to represent an original "erroneous" score along with a corrected score, as long as the differences can be represented by pitch spelling (not duration, for example). It just put extra items in the list. As long as all notes in the part have the same number of pitch-spellings in their list, we're good. Because realizations will specify which pitch-spelling "number" they use and they better be able to find it for all of them.

Maybe there's a better XML way of representing this list to enforce list-length across all notes? Hmm... looking into that.

I like the idea of creating a "tuning" element or elements that could create new pitch-spelling shorthands. Something to the effect of <tuning id="C-" midi-pitch="60-.5" spelling="C4<Smufl U+E280/>"/> to support a piece that has quarter-tones (if we decide not to support them explicitly). Then you write "C-" in the pitch of any note instead of the more complicated spelling, whatever it is.

--Christina

jsawruk commented 5 years ago

@clnoel:

You are correct that people might decide to use this list-format to mean other things... and I'm not sure that that bothers me as long as the original intent is also preserved. I don't care if someone wants to use this to represent an original "erroneous" score along with a corrected score, as long as the differences can be represented by pitch spelling (not duration, for example). It just put extra items in the list. As long as all notes in the part have the same number of pitch-spellings in their list, we're good. Because realizations will specify which pitch-spelling "number" they use and they better be able to find it for all of them.

I love this approach. I'm just not sure every note must have the same number of pitch spellings though. Perhaps it might be best to think of an alternate pitch spelling as an override or edit? Like "show version 2 of this score" would show the default pitches for notes which did not have a version 2 spelling, but would show the version 2 spelling for those notes that did. Does that make sense?

mdgood commented 5 years ago

Thanks for all the great discussion! I wanted to reply to a few of the points raised here.

MusicXML handles various microtonal systems by combining a wide range of accidentals with the 7-tone diatonic scale. This includes quarter-tones, makam music, and other microtonal systems supported by SMuFL. MNX-Common does the same and this is already included in the draft spec. Could we please continue that part of the discussion in issues #19 or #8? Figuring out written and sounding pitch is complicated enough without this separate issue mixed in.
There seems to sometimes be a conflation between sounding pitch and MIDI note numbers. I would advise that we steer clear of MIDI note numbers in any proposed solution to this issue, though they may be useful in solutions for different issues. MIDI note numbers already do dual duty as both a representation of notes on a keyboard for pitched instruments, and an index for a sound generator for unpitched instruments. In MIDI 2.0 these roles may well be expanded further. Neither of these roles seem appropriate for a pitch representation in MNX-Common, and tracking an evolving concept in the MIDI 2.0 standard seems to add undue difficulty to our task.
I disagree with Peter's assertion that MusicXML applications ignore exact positioning of elements. MusicXML is supported by over 240 applications, so there is a wide range of what applications do and don't do. Generalizations across all applications are usually pretty dangerous. In this case I know of several applications that rely on this exact positioning but need the semantics that MusicXML offers, so MNX-Generic is probably not good enough for them. A larger number of applications pick and choose which exact positioning information they use. We want MNX to modernize and improve the way this information is represented, but the distinction between "semantic" and "presentation" levels in music notation is not really clear cut.
I do feel confident in saying that one lesson from MusicXML is that forcing applications to encode two different things for what music looks like and how music sounds has usually caused problems. MusicXML does this for instance with element pairs like <type> / <duration> and <alter> / <accidental>. These have proven to be some of the most confusing and error-prone parts of MusicXML. MNX-Common's design has tried to eliminate all of these required multiple encodings. You only need to encode the second part of the data when it cannot be derived automatically from the first part of the data.

I would very much like to see that single required encoding approach kept for this issue of written and sounding pitch, no matter which we decide is primary. Representing transposition similarly to MusicXML's pair of chromatic and diatonic steps, combined with one or more settings for simplifying note spelling, should handle most cases. However choosing an enharmonic in a transposed part is often an aesthetic decision that is not strictly rule based. So the ability to override the standard setting seems necessary no matter if written or sounding pitch is primary.

clnoel commented 5 years ago

@mdgood 1) Thank you for the reference to where our current pitch syntax is. I had completely missed it. That clarifies what we support, and the syntax we should be using, and if it needs expanding or altering, that discussion belongs elsewhere. 2) I agree at this point that using MIDI pitch-numbers for our spec is not the way to go. I might continue to use it as a shorthand during discussion, but will stop using it in my proposed specs. However, is it therefore inappropriate to express a transposition as +3 semitones? 4) Agreed. I think I am reversing my original opinion, and now think that we need to express the written note to the exclusion of the sounding pitch. The sounding pitch can be determined from the written pitch-spelling, possibly with help from a transposition indicator on the staff or clef. If that is still insufficient for generating the correct sound, I think the <interpret> element should be used.

What you have not addressed is your opinion how to deal with source material that contains the same part in two different forms, such as both a conductor's full-score in concert pitch, and a player's clarinet part in A. Do you write the part in twice? Do you write it once, but double-list the pitches? (I have yet to hear an objection to the second except that a list's syntax can get tough.)

--Christina

shoogle commented 5 years ago

Thanks for the clarifications @mdgood.

Agreed.
Agreed, however I note that your comment was specifically about MIDI pitch numbers and that you did not discourage the idea of using numbers to represent pitch in general.
Thanks for enlightening us here. There are many applications that support MusicXML and I know that I personally am only familiar with the most common ones. As for what is "semantic" and what is "presentational", I think this depends mainly on whether MNX is being used as a native format for creating new compositions, or as an encoding/interchange format for storing fixed representations of historic editions. This topic seems to be a frequent source of conflict and probably worthy of its own issue.
I strongly agree that the format should not provide multiple ways to say the same thing.

@clnoel.

MusicXML expresses transposition as a number of diatonic steps (note letters) plus a number of chromatic steps (semitones). See this article for details of how this works. This may be the answer to our problem as it provides a way to specify spelling that is independent of key (see below).
The correct sounding pitch can indeed be determined this way, but it is insufficient to get the spelling for the concert score.

I do not object to your list proposal, and in fact I already suggested something similar as a way to encode differences between editions as well as differences between parts. However, it needs to be a list of spellings rather than a list of pitches, otherwise it will contain the same information twice. You need to pick one pitch (written or sounding) to be the primary pitch that gets both a name and a spelling, and the other pitches in your list would just get spellings. You just need a way to specify those spellings that is independent of key.

Diatonic Offset

Given a primary pitch that is either written or sounding, the other one can usually be recovered by applying the instrument's transposition in terms of diatonic and chromatic steps. In cases where this would give the incorrect spelling we can apply a diatonicOffset to increase or decrease the number of diatonic steps as necessary to recover the correct spelling for this particular note. diatonicOffset would be a property of each individual note, so this is the thing that would be stored in the list.

The system works regardless of whether written or sounding pitch is used as the primary pitch, but clearly we need to pick one of them. I think the sounding pitch is the "ground truth" that should be used as the primary pitch for the following reasons:

It preserves harmonies between parts even if spellings are later changed.
It means instruments can be changed (swap Clarinet in A for Clarinet in Bb) without having to rewrite all the pitches. If the files are compared this will produce the smallest diff.
If we store multiple editions (as well as multiple parts) in a single file then each edition can use its own spelling, but all must agree on the sounding pitch, hence it is the "ground truth".

mdgood commented 5 years ago

I wanted to hold off on offering my opinions on this issue until we had more discussion from the group, but now seems like a good time to share them.

I strongly favor MNX-Common representing pitch as it is written in the document being encoded. I like this proposal for score and part realizations because I believe it achieves this goal even better than MusicXML does. It allows for clear encoding of concert scores, transposed parts, transposed scores, and parts of unknown transposition.

MNX-Common is first and foremost a music notation format. It represents music notation that the musicians are reading. In that situation, the written pitch - whether concert or transposed, whatever is being read by that musician - is the ground truth. Not all pieces of written music have transpositions clearly indicated. Even some that do have had the meaning of those transpositions change over time (e.g. horn notation).

This ground truth differs between a concert score and transposed parts. Since MNX supports use cases where a single file can represent either one (e.g. MC9, RLP4, etc.), we have some choices to make. The realization proposal allows us to specify which is the primary source and which is the derived format on a case-by-case basis. But in any event, what is represented is a pitch that a musician sees, whether the conductor or an individual performer.

One issue that has not been discussed too much here is the readability of the MNX-Common files. I think a powerful ingredient of MusicXML's success is that the files are human readable. If you want to check what is going on, it is easy to compare what you see in a MusicXML file with what you see in a notation program. This is a great benefit for debugging, both when building an importer or exporter, or in large repertoire conversion projects.

MNX-Common should make this situation even better by making the syntax more concise. The use of microformats lets us represent the music more readably, with less XML syntax overhead getting in the way of the musical meaning.

I feel we will squander this advantage if we change from written to sounding pitch. It is not just readability but comprehensibility. If the pitches are representing sounding pitch, what about other items like stem direction and slur orientation? Are these for written pitch or sounding pitch? Either way is a problem: one way the data is inconsistently represented with each other, and the other way the data is entirely inconsistent with what a musician sees when dealing with a transposed score or a transposed part.

These are the types of issues that I believe @jsawruk was particularly concerned about. I agree with his concerns and feel that choosing sounding pitch would make the use case of maintaining a completely faithful visual appearance (RLP20) harder to achieve.

I agree with @shoogle that a major score editor would be unlikely to choose a transposed format as a native format. However, a native format for a major score editor is not an MNX use case. The MNX development use cases are for simpler web applications such as music display, playback, and education. For these use cases I do not see any hardship in choosing written pitch as the native format.

I also agree with @shoogle that choosing written pitch would make harmonic analysis applications more difficult. But such applications are not an MNX use case either. The analysis application that is an MNX use case is thematic search (ARL2). This can be handled equally well with either representation, as many if not most applications will want to search across all transpositions.

On the other hand, there are a few additional MNX use cases where choosing sounding pitch could be problematic. These include differencing (RLP19), distinguishing editorial inferences (ARL6), and representing original notation along with conventional understanding (ARL7).

In summary, I think that choosing sounding pitch over written pitch optimizes for use cases that MNX is not trying to solve. At the same time it causes problems both for several actual MNX use cases, for format readability, and for overall developer and publisher usability.

I think the original realizations and layouts proposal is a great way to maintain MusicXML strengths while fixing MusicXML weaknesses for a new generation of music notation applications.

jsawruk commented 5 years ago

I really appreciate your links to the MNX use cases. You have provided some compelling insight into the minutiae regarding this issue, and I think your comments can help us get closer to a group consensus.