Realizations and Layouts

w3c / mnx

Music Notation CG next-generation music markup proposal.

176 stars 19 forks source link

Realizations and Layouts #138

Closed joeberkovitz closed 4 years ago

joeberkovitz commented 6 years ago

This issue is a proposed direction that's intended to address (partly or wholly) the concerns of multiple existing issues including #4, #34, #57, #121.

At the moment I have not proposed any particular XML schema for these ideas, because I wanted to get the ideas themselves out on the table first. I don't think the problem of making up elements, attributes and relationships is going to be too difficult here, though.

Transposition And Its Discontents

For scores that can be consumed as both full scores and instrumental parts, the question of how to best encode pitch is confounding. Perhaps the two biggest forces driving MNX architecture are the desire to accurately encode the author's intentions, and the desire to accurately encode the material to be read by readers. When it comes to transposition, however, these forces are not necessarily in alignment. Perhaps the question of "which encoding is best?" is itself part of the problem, rather than part of the solution.

On the authorial side, a manuscript (whether autograph or digital) may have been originally notated in either concert or transposed pitch. Thus a decision to encode only concert pitch, or only transposed pitch, can impose an unacceptable distance between the encoding and the original material. Recall that MNX should serve to encode materials that may not have ever passed through a notation editor, with a reasonable degree of directness. Such original materials could differ as greatly as an orchestral piece notated in full at concert pitch, and a clarinet solo notated as a single transposed part. Should it not be possible to encode either one, such that there is a direct correspondence between the original manuscript and its encoded pitches?

If so, it seems hard to escape the conclusion that no single pitch encoding convention serves the goals of MNX well. Some of the many scenarios with pitch that may occur, include the following:

Original full concert score, derived transposed parts
Original full transposed score with identically transposed parts
Original full transposed score with derived full concert-pitch score
Single-instrument transposed part, with no need for a derived full score

It's also true that any algorithmic rule for conversion between pitch levels will sometimes need to be overridden by skilled editorial judgment. This doesn't mean that algorithms play no role, but it does mean that an override mechanism is necessary.

Finally, there is no single choice for pitch encoding that eliminates the need to convert between different pitch schemes. Implementors will have to deal with this complexity in at least one direction, and the conversion is basically symmetric in nature: it is not more complicated to go from A to B, then from B to A.

While it has been argued that concert pitch is a "canonical truth" that transcends transposed music, the only canonical truth we really have is whatever the composer wrote down -- which could be in either pitch scheme.

Score And Part Realizations

Looking beyond transposition, we find that parts and scores can differ in other semantic ways. Some directions or elements (e.g. cue sequences) are only visible in one or the other. Multi-measure rests may be present in a part and absent in the score, or vice versa. A textual direction might be shown in a teacher's guide and omitted from the student edition.

So it seems useful to situate the problem of score/part transposition within a larger landscape of allowing a CWMN document to vary for different roles. We therefore propose the new MNX-Common concept of a realization, which says how the document's contents are to be transformed for consumption by a particular role (e.g. conductor, performer, student, teacher, etc.). There are at least two major types of realization: a full-score realization, and part- specific realizations (one for each part).

Let's start by trying to roughly define a realization, and then look at how this definition works:

A realization has a list of included parts.
In a given realization, each part transposes its pitches a specified interval from concert pitch.
In a given realization, any measure may override the default key signature with a transposed enharmonic.
In a given realization, any note may override the default spelling with a transposed enharmonic.
Directions and sequences may be restricted to only occur in designated realizations.

There are two built-in kinds of realization, reflecting the main needs of producers and consumers: score (including all parts), and part (one for each part in the score).

Note that realizations don't concern style properties or system breaks or system spacing or credit placement or... all that other visual stuff... That realm is where layouts come in (see below). For example, a single part realization might have multiple layouts for different page or display sizes, each of which has different system breaks and credits positioning.

How Do Realizations Affect Encoding?

Each part specifies a source realization. The source realization of a part determines how that part's pitches are encoded. Because different realizations can transpose the same part differently, this means that pitches can be encoded either in concert or transposed pitch.

Let's look at several widely differing scenarios to illustrate:

In a document derived from a concert-full-score original (or exported from an application which uses concert pitch as a reference point), we'll have this scenario:

The score realization will specify concert pitch for each part (possibly with octave transposition for bass, piccolo, etc.)
Each part realization will specify the transposition for its specific part, along with enharmonic overrides.
Each part's source realization will be score, thus all notes will be encoded in concert pitch.

In a solo instrument score with a single transposed part as the original (or exported from an application which uses transposed pitch as a reference point), we'll have this scenario:

The sole part realization specifies transposed pitch for that single part.
The score realization (if it even exists) is identical to the part realization.
The single part's source realization is its part realization, thus all notes will be encoded in transposed pitch.
Consequently there do not need to be any enharmonic overrides.

In a document derived from a set of transposed parts we'll have this scenario:

The score realization will specify concert pitch for each part. (A full-transposed-score realization could exist also!)
Each part realization will specify the transposition for its specific part
Each part's source realization will be part and will be encoded in transposed pitch.
Each part will include enharmonic overrides for the score realization, as needed to support a presentation at concert pitch.

Transposing With Intervals And Semitone Distances

Like MusicXML, MNX must specify transpositions as diatonic intervals, i.e. as a combination of steps and semitones. However, as mentioned above realizations may also supply explicit key signatures and note spellings to override any prevailing transposition interval.

How Do Realizations Affect Presentation?

When rendering a part for consumption by a reader, a target realization is used. The problem of figuring out how to spell notes in a given realization is therefore as follows: how do we transform a note from its source to its target realization? The rough answer, to be refined in the actual spec, runs something like this:

If the two realizations are the same, do nothing
If the transpositions for source and target are the same, do nothing
If a key signature override exists in the target, use it. Otherwise transpose the source key signature by the intervallic difference between the target transposition and the source transposition.
If a note spelling override exists in the target, use it. Otherwise transpose the source note according to the distance in 5ths between the source and target key signatures.

Layouts

In comparison to realizations, layouts are fairly simple. They are ways in which a score may be presented. This is not just about full scores as opposed to parts. For example, a full score realization could itself be customized to be shown on an iPad, printed on A3 paper, or shown in an infinite scrolling strip. Each of these would constitute a distinct layout, based on the score realization of a document.

A layout is characterized by the following:

An underlying realization of the document (typically full-score or part).
Credit/header/footer text with spatial placement relative to display margins.
A stylesheet (i.e. score-wide class and selector definitions). This is useful to control the global appearance of the score (e.g. staff line spacing).
For this specific layout, layout-specific style property overrides can be applied to any element of the score. This capability allows measure style properties for system/page breaks to be scoped to a particular layout, among other things.
An optional display size range, used to automatically select the correct layout based on device characteristics. This would act similarly to CSS Media Queries.

notator commented 5 years ago

@mdgood Consensus is approaching! :-) Here's why: I think its important for MNX-Common to be easily convertible to SVG instantiations for use on the web. That means that a very strict distinction has to be made between spatial (=notation) and temporal (=performance) information. Spatial information can be instantiated as ordinary SVG graphics, but any temporal information is going to have to be instantiated in a special SVG namespace designed for that purpose. Thinking in these terms, the written pitch is graphical information, but any transposition information would be temporal. (Pitch is frequency.) It makes good sense to be able to leave transposition information out of MNX-Common (and use a default transposition in the SVG temporal namespace), but leaving out the graphic information makes no sense.

bhamblok commented 5 years ago

I strongly agree with @mdgood on "MNX-Common representing pitch as it is written in the document being encoded". On a side note: as implied here above, I would like to point out that MNX use cases should not be immutable. For instance, I would like to disagree on the statement: "a native format for a major score editor is not an MNX use case". That might be the case today, but hopefully, some day, that would not be true. For example: in early days of HTML, nobody could imagine the rich implementations it can do today. Shall I create a PR on the use cases of MNX for this example?

adrianholovaty commented 5 years ago

I believe there's an underlying philosophical question here, which we're not agreeing on. Do we want to...

Encode the ideas of a music composition — e.g., "First play C for a half note, then play D for a quarter note" — or
Encode pieces of paper in a semantic way — e.g., "Play these notes, and make sure they're displayed in this way."

Obviously it's not strictly one or the other. Music notation spans both — and, personally, that's a reason I find it very technically interesting.

But we ought to have a guiding light, a clear philosophical preference. My feeling, from watching Joe's evolution of the MNX ideas over the last few years, is that the philosophy is much closer to 1 than 2. The MNX-Common spec is deeply concerned with semantics.

It strikes me that the reasons people are giving for storing sounded pitch are in line with philosophy 1. The reasons people are giving for storing written pitch are in line with philosophy 2.

In my view, storing sounded pitch, not written pitch, is more in line with the MNX ethos. The main arguments, as I see it:

It's how we hear it. Regardless of instrument transposition, the concert pitch represents how the music is heard and experienced by the listener. I can understand the argument that "what the composer wrote on the page" is a ground truth — but "what the music actually sounds like" is the ultimate ground truth.
Transposition is an implementation detail. Here's where I see a parallel with Python 3's treatment of Unicode. In Python 3, a Unicode object is a semantically pure thing, representing characters conceptually — and it isn't until you read or write it than you need to specify which encoding it has (utf8 vs. window-1252, etc.) in order to get the raw bytes. Concert pitch is like Unicode; a pitch for a transposed instrument is like the Unicode object converted to a particular encoding.
It's the lowest common denominator. All instruments can be notated in concert pitch, whereas only some instruments are transposed. In a format whose primary function is interchange, we should have One Obvious Way To Do It whenever possible.
It forces the issue. If MNX used sounded pitch, document creators would essentially be forced to consider transposition when generating MNX. This means there's more upfront effort/thought, but I believe that's worth it in the long run — because it would result in unambiguous documents (at least in the area of pitch/transposition). Again, there's a parallel to Python 3 Unicode, in which developers at first had to swallow the bitter pill of needing to explicitly state which encoding their data was in; but in the long run, by forcing the issue, this prevents encoding bugs from happening down the road.

Some responses to specific comments:

@mdgood wrote:

Not all pieces of written music have transpositions clearly indicated. Even some that do have had the meaning of those transpositions change over time (e.g. horn notation).

I might be misunderstanding this, but... Doesn't this horn example serve as a solid argument for storing sounding pitch? If the music had been written in sounding pitch all those years, there would be absolutely no problems with misinterpreted transpositions.

@mdgood wrote:

If the pitches are representing sounding pitch, what about other items like stem direction and slur orientation? Are these for written pitch or sounding pitch?

This is a great question. It might be worth compiling a list of all notations whose positions depend on the pitch — stem direction, slur orientation, tuplet orientation, etc.

I do think, though, that this is a solvable problem. For each of these notations, we could let them have separate values depending on the transposition. Frankly, we might need a solution for that even if we use written pitch (e.g., providing the stem direction for the concert score).

@mdgood wrote:

One issue that has not been discussed too much here is the readability of the MNX-Common files. I think a powerful ingredient of MusicXML's success is that the files are human readable.

That is indeed an advantage of storing written pitch — but with free, high-quality MNX tools/viewers, needing to read the raw markup becomes less important. I would wager that, if there were a high-quality reference implementation available for developer testing, most developers would rather use that than eyeball XML.

dspreadbury commented 5 years ago

To declare myself, I very much agree with @adrianholovaty's views on this issue, and believe that sounding pitch is by far the preferable way of defining pitch in MNX-Common documents.

notator commented 5 years ago

@adrianholovaty wrote:

Obviously it's not strictly one or the other. Music notation spans both — and, personally, that's a reason I find it very technically interesting.

@dspreadbury wrote:

I [...] believe that sounding pitch is by far the preferable way of defining pitch in MNX-Common documents.

Maybe the solution is for neither solution to have preference. A note's graphics are actually independent of the value for its sounding pitch, so either or both can be defined independently. There are three cases:

Only the note's graphic pitch is defined: the sounding pitch would, by default, be the one normally associated with that notated pitch ("sounds as written"). In this case, further details about the graphics can be supplied in the MNX-Common code (stem directions, slur directions etc.).
Only the note's sounding pitch is defined ("written as sounding"): The graphics would be inferred by default from the sounding pitch. The instantiating application would then have to supply the details for the graphics.
Both the note's graphic pitch and sounding pitch are defined. Neither parameter needs to be related to the other. In particular, sounding pitch does not have to be defined as a transposition.

All things being equal, I'd tend towards keeping the MusicXML practice as far as possible so as to make the transition to MNX-Common easier for programmers, but coping with all three of the above options is probably not as difficult as it might look...

@mdgood wrote:

One issue that has not been discussed too much here is the readability of the MNX-Common files. I think a powerful ingredient of MusicXML's success is that the files are human readable.

Indeed! Human readability is the reason that XML is everywhere nowadays. I think developers will always want to be able to see that level of detail, even if tools are developed to provide shortcuts.

jsawruk commented 5 years ago

@adrianholovaty: I'm trying to find a way to bridge philosophies 1 and 2. As you said:

it's not strictly one or the other

From my perspective, I understand the advantages for both sounding pitch and written pitch. However, I have a very strong visual/display bias in my perception of MNX. Since I work with music notation on the web, I would like a format that makes it easier to display this content in a browser, which is an area where I personally find MusicXML to be deficient. I realize that this is only one use case, but I think it's an important one.

In MusicXML, specifying the layout of a page and how music is written is very important. This makes it a theoretically good interchange format between notation software, but not an ideal solution for web display. On the web, there is no concept of page, and given that there are various screen sizes and orientations, I personally would like to see a notation format be responsive in the way that HTML is. Some content authors will not want a responsive layout, but I think responsive layout is a valid use case.

I also think MNX should support stylesheets, which while not directly related to pitch display, is another reason for my visual/display bias. In my mind, this suggests that MNX is a visual format, much in the way that HTML is. HTML is not an interchange format: exporting a Word document to HTML then reloading it into Pages or Google Docs would probably not work too well. Instead, an interchange formation like RTF might work better. With that in mind, I personally see MusicXML as the interchange format, and MNX as the display format. I believe the two can co-exist, and we might want to take a step back and ask if we are trying to do too much with MNX.

@adrianholovaty : I really like your example of Unicode in Python, as this is exactly how I perceive an MNX realization. For example, I should be able to realize the MNX content in a different key, If I had a lead sheet that was written in C, but the user wanted to view it in Eb, the user/client would somehow create the Eb realization (I'm not sure whether this transposition is done via code or just an alternate representation in the same document, or a linked document). I think both of the philosophies you described above are just inverse transformations of each other: philosophy 1 being sounding -> written, and philosophy 2 being written -> sounding. If we decide to choose just one of these philosophies, then I think we also need to define how to perform these transformations within the specification so that all software uses the same approach, eliminating any potential ambiguity.

My personal thoughts are what I am calling philosophy 1.5: sounding and written pitch should have equal footing. I realize that this may lead to redundancy, but I believe it to be the least ambiguous way to convey pitch information. One possible solution to put sounding and written together might be to have both pitches on the same note; another might be to have multiple documents within a container file or a way to link related realizations together. This way, instead of taking the pitch representation and transforming it using an algorithm, a switch from written to sounding or vice versa is just a display of different data. This does put more effort on the content producers, but should simplify the implementation of MNX clients.

I understand that my opinion may not be popular, and that the group may end up choosing either philosophy 1 or 2. If those were my only choices, I would side with @mdgood, particularly because of the visual bias that I have expressed above and his comments regarding how objects like stems could change when transposed.

clnoel commented 5 years ago

@jsawruk wrote

With that in mind, I personally see MusicXML as the interchange format, and MNX as the display format. I believe the two can co-exist, and we might want to take a step back and ask if we are trying to do too much with MNX.

Musicnotes has an in-house music editor that supports our proprietary file format. We currently import music documents from other editors via MusicXML.... and as the writer and maintainer of our import process I can tell you that MusicXML is not ideal as an interchange format. Each major implementor of MusicXML exports does things slightly differently. One of the things we're hoping to get out of MNX is a more consistent result when importing a file.

The impression I have is that one of the ways we're going to get buy-in from the management of major editors for the implementation of MNX instead of (or in addition to) MusicXML is the assurance that when a file is exported from their program, another program can easily make it look just like it was when it was written in their editor. As such, although it pains me from a music-conceptual standpoint (where I would rather support sounding pitch and allow algorithms/realizations to determine the written pitch from there), I think we need to support written pitch primarily, but with the caveat that sounding pitch should either also be encoded or should be very easily determined from a set of parameters (like a semitone offset transposition specification).

jsawruk commented 5 years ago

@clnoel: I certainly agree that MusicXML is far from ideal as an interchange format, but my concern is that viewing MNX as both an interchange format and as a display format might be assigning too much responsibility to MNX. Should MNX replace MusicXML, or should it complement it? Can MNX be both an interchange format and a display format? I don't know. If MNX can also function as an interchange format, that would be great.

As I've said before, I would prefer a method where sounding and written pitch are both encoded. However, if the group does not decide that doing so is a viable solution, then I have also said that I would agree with you about using written pitch.

One of the things we're hoping to get out of MNX is accurate display of sheet music across a wide range of platforms. This includes notation software, and so MNX as an interchange format would be great, as it would align with our primary goal. However, since most notation software is proprietary, it may be very difficult to have all of the programs use the same code. One way to do this might be to create an open source library that MNX-compatible software could use. There will always be differences, and no interchange format is perfect, but this could be a start so that we don't end up with the varied implementations we have today.

notator commented 5 years ago

In my Update README pull request, I said:

I think fixed SVG score instantiations are an important use-case, and the easiest scenario to implement, but in the long term it might be possible to create scores that would, like HTML, re-flow in browsers. That's a good argument for developing MNX-Common as an abstract format, distinct from an SVG instantiation that would contain the same semantic information.

So I agree with @jsawruk that it would be wonderful if browsers would support re-flowable scores. But: That not only means having an HTML-like format for scores, it also means convincing the browser vendors to support it. I think that prospect is a long way off. I also agree with @jsawruk that MNX should support stylesheets. (They can either be ignored by a consuming application or copied into an SVG instantiation.)

I'd like to correct something in my previous comment:

Only the note's graphic pitch is defined: the sounding pitch would, by default, be the one normally associated with that notated pitch ("sounds as written"). In this case, further details about the graphics can be supplied in the MNX-Common code (stem directions, slur directions etc.).

Only the note's sounding pitch is defined ("written as sounding"): The graphics would be inferred by default from the sounding pitch. The instantiating application would then have to supply the details for the graphics.

Both the note's graphic pitch and sounding pitch are defined. Neither parameter needs to be related to the other. In particular, sounding pitch does not have to be defined as a transposition.

All these cases allow the consuming application to infer both the CWMN graphics pitch and the pitch frequency. So the instantiating application can read further details about the graphics (stem direction etc.) regardless of how the pitch's graphics were originally defined in the file.

@clnoel said

I think we need to support written pitch primarily, but with the caveat that sounding pitch should either also be encoded or should be very easily determined from a set of parameters (like a semitone offset transposition specification).

The advantage of not determining the sounding pitch as a transposition is that the written pitch can then be inferred, as in case 2 above.

mdgood commented 5 years ago

Thanks for the continuing discussion.

@bhamblok Yes, use cases can evolve over time. However we created these use cases to help guide us in making decisions such as this. This becomes difficult if the use cases are a moving target. I don't think anything has significantly changed since they were created to warrant updates at this point. From my analysis, choosing sounding pitch as primary causes more severe problems for more MNX use cases than choosing written pitch as primary.

@adrianholovaty I think the horn example demonstrates that the ground truth for notation is what is written. Transposition choices can be an interpretation of what is written. For a more contemporary example, educational content can have one written pitch that corresponds to many different sounding pitches. It depends on who is reading the music, playing which instrument. Forcing the issue seems a drawback, not an advantage.

I think it might be difficult to compile a list of notations whose position depends on pitch because, roughly speaking, it is all of them. Pitch changes can effect most everything in vertical layout. Enharmonic choices can effect horizontal layout as well.

For the first three 3 bullet points, I would flip the arguments so that written pitch is closer to the musical ideas from a musician's perspective. Perhaps one key distinction is that notation represents instructions for performance, not the performance itself.

In the first example about musical ideas, "Play C" seems meaningful from the performer's perspective only in written pitch. You could write horn or sax parts in sounding pitch, but the performer would likely not be too happy, and the results would probably not be what you expect to hear.

Written pitch seems the lowest common denominator for notation. All instruments can be represented at written pitch and that is the pitch the musician sees. Only some instruments have a sounding pitch that differs from the written pitch. With the Unicode analogy, written pitch seems to me more equivalent to the code point the musician interprets, with the musical instrument providing the conversion to the encoding / sounding pitch.

@clnoel I agree that no matter which we choose, we need to allow for enharmonic overrides in cases where the normal algorithms are not sufficient.

mogenslundholm commented 5 years ago

Fighting for sounding pitch but ...

Noting the disagreement in the group about this issue, I think we should reconsider the idea of having two pitches, sounding pitch and written pitch (mandatory).

Everyone wanting written pitch can be glad and only worry about the written pitch. Everyone wanting sounding pitch can be glad using sounding pitch. Totally independent - this puts the max pressure on notations programs - the program may not "help" you or "correct" you. We should specify that the pitch-pair must be preserved by the notation program.

Microtonalists can do whatever they want and the Turkish music and other folk music will have no problems specifying the accidentals for written pitch and the pitch for sounding pitch. (@mdgood sorry but ... isn't this problem solved in MusicXML?)

About being robust: Sure, MNX should be more robust, but isn't MusicXML robust? It seems to me that the problem is the notation programs: That whatever we write we cannot force the notation programmer to do what we want.

I like readability, however I think we should not fill the mnx-file with the words "written-pitch" and "sounding-pitch". Write it short - e.g. '<note pitch="B4-0.41,B4.half-flat"/>'. It is still readable.

The pitch-pair will always be right - because the note-writer with his notation program is responsible for the result and therefore it is right per definition.

jsawruk commented 5 years ago

Noting the disagreement in the group about this issue, I think we should reconsider the idea of having two pitches, sounding pitch and written pitch (mandatory).

@mogenslundholm: How does this differ from similar proposal I made earlier in this thread?

shoogle commented 5 years ago

I'm not sure about having two pitches. I think we all agree that written pitch is generally best for transcriptions while sounding pitch is generally best for new compositions, so why not simply allow either and have the user specify which one it is that is being used at the moment? This could be done through an explicit <pitch-type>written|sounding</pitch-type> tag, or implicitly via a second transposition interval as I originally proposed. This would allow us to get the best of both worlds.

I suppose we could allow two pitches as long as we allow the user to specify which one takes precedence (again, this would usually be based on whether the piece is transcription or a new composition). Two pitches could even be quite handy is in allowing the user to transpose a piece to a different key, while still preserving information about the original pitch and spelling. This could be part of a wider idea of allowing temporary edits to be stored in the same file as the original document, a bit like saving annotations to a PDF or filling in PDF form. You can choose to drop the edits and any time and reset the document to its original form.

jsawruk commented 5 years ago

@shoogle

I think we all agree that written pitch is generally best for transcriptions while sounding pitch is generally best for new compositions

I definitely do not agree with that statement. Please avoid making generalizations like that since there are many different viewpoints on this issue, and the group has not yet reached consensus. I personally think that the issue of pitch representation is independent of whether the composition is new or not.

we could allow two pitches as long as we allow the user to specify which one takes precedence

I personally think this would increase ambiguity. I believe the specification should dictate which pitch takes precedence (if the specification allows more than one way to represent a pitch).

notator commented 5 years ago

@jsawruk said

...(if the specification allows more than one way to represent a pitch).

I think the specification should allow more than one way to represent a pitch (see above).

... the specification should dictate which pitch takes precedence.

I don't think its really a question of precedence. Its a question of which defaults are assumed when one or other of the pitch definitions is missing.

mogenslundholm commented 5 years ago

I don't think it differs. I like your description "completely orthogonal". Maybe a minor difference, that I think the pitch pair should be mandatory. If sounding pitch may be omitted, then it will not be there. (and it is only a few characters).

On 2019-04-03 01:26, jsawruk wrote:

Noting the disagreement in the group about this issue, I think we
should reconsider the idea of having two pitches, sounding pitch
and written pitch (mandatory).
@mogenslundholm https://github.com/mogenslundholm: How does this differ from similar proposal I made earlier in this thread https://github.com/w3c/mnx/issues/138#issuecomment-457887244?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/w3c/mnx/issues/138#issuecomment-479252416, or mute the thread https://github.com/notifications/unsubscribe-auth/ADd7T3YKg9ArdY88cY7Zs9G-biH_fZ5-ks5vc-cogaJpZM4U9W7i.

mogenslundholm commented 5 years ago

Only one way to do it means that this way is tested all of the time. Any choice will cause future errors and more testing..

jsawruk commented 5 years ago

@notator: I was only providing a counterargument to this one posted by @shoogle. He proposed a method where the end user could decide whether written or sound pitch would take precedence. I personally do not think that's a good idea, so my comments about pitch precedence only refer to this proposal.

As I have said before, I think the specification should specify both written and sounding pitch, with either a) both mandatory or b) written mandatory and sounding optional. If the specification is written in such a way that only one representation is supported, then I would be in favor of written pitch.

shoogle commented 5 years ago

Let's say I'm am writing a passage where Bb trumpet and C flute are to play in unison. In this case my priority would be to preserve the harmonic relationship between the two, so I would choose sounding pitch to take preference. I might want to specify the written spelling too, but that would only be valid in the initial transpositions (Bb and C respectively), whereas choosing the sounding pitch to take precedence allows the the harmonic relationship to be preserved when transposing the entire score to other keys, or when swapping instruments. Imagine if swapping the Bb trumpet for a C trumpet gave a different set of spellings to those in the C flute part; this is a real possibility if written pitch takes precedence, but it could not happen if sounding pitch takes precedence.

Equally, when transcribing an existing piece for Bb trumpet and C flute, I would (probably) want to record the spellings exactly as given in the existing score. In this case written pitch would take precedence, since that is the only information that was given to me by the composer.

So there are situations where it makes sense for either to take precedence, so it should be possible for the user to specify which one is important to them in the current situation.

jsawruk commented 5 years ago

@shoogle:

Imagine if swapping the Bb trumpet for a C trumpet gave a different set of spellings to those in the C flute part; this is a real possibility if written pitch takes precedence

I'm sorry, but I don't understand this process. Under what circumstances would change the Trumpet from Bb to C change notes in the Flute part? It is my opinion that this should never happen, as the Trumpet and Flute parts should be completely independent. If you change from a Bb Trumpet to a C Trumpet, what would cause the Flute part notes to change? Could you please provide an example?

shoogle commented 5 years ago

@jsawruk

Under what circumstances would change the Trumpet from Bb to C change notes in the Flute part?

Never! I would hope that much was obvious!

The point is that the two instruments are in unison, so if you swap the Bb trumpet for C trumpet, the newly transposed notes and spellings in the C trumpet part should match the (unchanged) notes and spellings in the C flute part. However, if ~sounding~ written pitch takes precedence and is used as the basis for transposition then there is no guarantee that the two lines will match.

jsawruk commented 5 years ago

@shoogle: I'm sorry, but I'm having a really difficult time understanding your position. It seems like you are arguing against yourself.

You support encoding notes as sounding pitch, and have said so multiple times.
The example you recently provided presents a use case where written and sounding pitch differ (a common use case)
You then say:

However, if sounding pitch takes precedence and is used as the basis for transposition then there is no guarantee that the two lines will match.

To me, this appears to support written pitch as opposed to sounding pitching, as you yourself stated that sounding pitch would provide no guarantee. How would such an example support your position of using sounding pitch as a basis for representation?

shoogle commented 5 years ago

@jsawruk, sorry, that one was a typo. I meant to say written rather than sounding pitch and I have now amended to post to that effect. However, the previous one that you took issue with was correct and I think you simply misread it, or misunderstood what I was trying to say.

jsawruk commented 5 years ago

@shoogle: Yeah, sorry, but I still don't understand how using written pitch as a representation would cause the Flute notes to change when transposing the Trumpet part from Bb to C. Could you please provide a musical example, code fragment, or other more detailed description of this process?

I am assuming the following:

All parts are independent.
Transposing one part does not cause any other part to transpose. If I transpose a Trumpet part, then the Flute part should not change at all, because they are independent. Similarly, if I switch from "Transposed Score" to "Score in C" or vice versa, then only the parts that correspond to transposing instruments should change, and they should change independent of each other.
Pitch representation does not change the independence of parts, regardless of what pitch representation is used (written, sounding, or combined).

Which of my assumptions is incorrect or differ from your model? I really want to understand your position because I am proposing the position of supporting both sounding and written pitch. I want to make sure I understand the consequences of each. It seems that most people are either very firmly in the sounding pitch camp (such as yourself), or in the written pitch camp. I strongly believe that there is a compromise solution, but so far my idea has not taken off.

shoogle commented 5 years ago

@jsawruk, I will borrow the example you gave before for Flute and Clarinet in A.

Step 1: The user composes a phrase where Flute and Clarinet are to play in unison (same sounding pitch).

Step 2: However, the Clarinet part is to be written in A, so the user transposes the phrase up a minor third and specifies the correct spelling for this transposition.

Step 3: The user now wishes* the Clarinet part to be written in C, so the part written in A must be transposed back down a minor third. However, doing so does not necessarily yield the same spelling as was used for the Flute part.

As you can see, the notes for Flute never changed; it is the notes for the transposing instrument (in this case the Clarinet) that have changed. The point is that the Clarinet notes are different to the Flute notes even though the instruments are in unison.

The solution to this problem is for the user to be able to specify (in step 2) that, in this case, sounding pitch should take precedence over written pitch. This means that when any part, or even the entire piece, is transposed to other keys (as in step 3), it is the sounding pitch that will be used as the basis for calculating the new transposition.

* Why would the user want to switch the instruments?

The composer has changed his/her mind
An orchestra lacks the required instrument

Alternatively, the same problem could be caused by any of the following:

The user copies and pastes notes between instruments of different transpositions
The user copies the phrase to use later in the piece but in a different key
A conductor wishes to see the score written in C

If we are to allow both a written and a sounding pitch to be specified, then we need to allow the user to specify which one takes precedence when calculating a new transposition. Either that, or we simply assume that sounding pitch takes precedence unless only a written pitch is provided.

jsawruk commented 5 years ago

First of all, I think introducing user choice into this process either:

Is out of scope (at least for this issue), since we are trying to develop a pitch representation, not define user interaction. User interaction is important in making a decision regarding pitch representation (the examples of why the user would want to change instruments are appropriate use cases to consider), but I don't think having a user decide precedence between sounding and written pitch is an appropriate use case for the current issue. It seems more like a design decision in an MNX authoring tool OR
Introduces unnecessary complexity

Secondly, I still don't understand why sounding pitch should ever be used as the basis of transposition. I assume the following two axioms:

Any written pitch maps to one and only one sounding pitch
Multiple written pitches may map to the same sounding pitch

Given that, I can create a function that maps any written pitch to a sounding pitch. However, that function does not have an inverse. For example, C#4 maps to MIDI 61. Db4 also maps to the same value, MIDI 61. Now, if the only piece of information I have is "MIDI 61", how can I tell which written pitch produced that frequency? I can't, unless I have more information, since either note could have mapped to MIDI 61.

The inability to invert the written -> sounding function is the basis for my argument against using only sounding pitch. If I only have sounding pitch information available, I am not guaranteed that I can recover the correct written pitch. I may be able to recover a "preferred" written pitch by using a pitch spelling algorithm, but doing so would be post hoc and still not guarantee complete accuracy. You could argue that additional information could be passed to the transposition function, such as musical interval (encoded not as an integer, but as an interval like P5; see below), and/or additional context. Again, this could improve results, but accuracy cannot be guaranteed.

The music example above shows exactly how this process fails. The process converts the sounding pitch (Cbb) into the written pitch by applying the transposition "+m3". This note should then technically be Ebbb, but since there are no triple flats, the software must respell this pitch. The algorithm chooses Db because 1) it's enharmonically equivalent and 2) because it has a flat, so the algorithm choses Db over C# since the original pitch was flat. However, when this process is inverted, the transposition "-m3" is applied. Db + (-m3) = Bb. Though Bb and Cbb are enharmonically equivalent, they are not written the same way. Bb is easier to read and might actually be preferred, but the "C" quality of the original sounding pitch is lost.

The more I think about this problem, the more I realize that the issue might not be just about pitch representation at all. I think we should also be discussing interval representation. In MusicXML, transposition intervals are stored as numeric quantities, either integers or decimals. I think the problems with converting between written and sounding pitch might actually be that the representation of the transposition interval is insufficient. It is my personal opinion that a different interval representation might be more useful. As a contrived example, the intervals A4 and d5 are the same interval, 6 semitones. However, I think that A4 and d5 are different intervals. C +A4 = F#, but C +d5 = Gb. This is basically the way music21 handles transposition. I think we might want to consider something similar. Should discussions about interval representations continue here, or broken off into a separate Github issue?

shoogle commented 5 years ago

@jsawruk

I still don't understand why sounding pitch should ever be used as the basis of transposition. I assume the following two axioms:

Any written pitch maps to one and only one sounding pitch

Multiple written pitches may map to the same sounding pitch

The first axiom is only true in the sense of pitch as frequency (i.e. sound waves), but that is not what we are talking about here. What we're actually talking about is concert pitch (i.e. the pitch given in a C score).

If you wanted to be picky, you could think of concert pitch as like a special kind of written pitch where the transposition interval is zero (i.e. a perfect unison). However, for all intents and purposes it is equivalent to sounding pitch.

Since concert pitch has to be written on a staff, this means you do need to make decisions about spelling (e.g. A# vs. Bb vs. Cbb, as my previous example showed). Therefore, some MNX users will want to be able to specify the spelling separately for concert pitch and transposed pitch, to ensure that their preferred spelling is used in each case.

However, some users may prefer to only give one kind of pitch:

Composers tend to think in concert pitch. They would be happy to leave transposition up to the computer, at least while they work on the piece (they might want to specify transposed spellings in a final step once everything else is done).
Transcribers only have whichever pitch is given in the edition they are copying, which tends to be a written pitch. They would prefer to enter only the pitch they have in front of them, rather than guessing which concert spelling the composer may have intended (though transcribers of an editorial edition may indeed want to provide a suggested concert spelling).

These users may wish to provide only one kind of pitch, and leave the other kind deliberately unspecified. However, if the format mandates that both kinds of pitch must be given, then these users would want to specify which is to take precedence.

Composer: "The concert/sounding pitches are mine and should take precedence. Transposing pitches are given for Bb Trumpet for the sake of convenience, but the line could be played equally well on any kind of trumpet. I'm not a trumpeter, and the transposed pitches were calculated for me by Program X, so I give no guarantee of their suitability. If in doubt, and when changing the transposition, refer to the concert pitches."
Transcriber: "The transposed pitches are Beethoven's and should take precedence. Program Y helpfully inserted concert pitches too, but unfortunately Beethoven isn't around to tell us whether these are correct. If in doubt, and when changing the transposition, refer to the transposed pitches."

Your suggestion to use intervals rather than absolute pitches is interesting. I agree that intervals are a superior way to represent music:

It better represents how we perceive music. Most people cannot tell when a melody is played in a different key to the one it is traditionally played in.
It means that transposition is a simple as changing the starting note. This essentially removes the whole issue of sounding vs. written pitch (very nearly anyway).

However, I fear it may go beyond the scope of CWMN, which seems to be very much wedded to the idea of an absolute scale, and it would make the MNX syntax much harder to read for those of us who are used to pitches rather than intervals.

Nevertheless, you can count me in favour of this idea since the vast majority of users will never look at MNX outside of an application that is capable of doing the conversion automatically.

jsawruk commented 5 years ago

@shoogle: I know what concert pitch is; I am a composer. I still think the first axiom holds for concert pitch. Here's why:

Define a transposition function T(pitch, interval), which transposes a given pitch by the given interval
Assume that the function is deterministic with respect to its inputs: It always produces the same output for the given pitch, interval input pair
Assume the function only outputs standard accidentals (e.g. not triple flats)

Such a function is then not invertible, as my example above shows. While T(T(pitch, interval), -interval) = pitch in a lot of cases, it doesn't in every case. I can prove this by the counterexample I provided above.

Does it matter that this function is always invertible? Does it matter if this function isn't always correct? Are there better ways to define a transposition function? (almost certainly). But most importantly, is it within the scope of MNX to define such a transposition function?

It's my personal opinion that if only written or sounding pitch is decided as the primary representation, then the MNX standard should also specify a transposition algorithm to convert to the other form. I am concerned that trying to standardize such an algorithm might also fail to produce consensus.

Also, I am not suggesting replacing pitches with intervals. Instead, I am proposing an alternative way to indicate how an instrument transposes. In my mind, saying that Bb Trumpet has a transposition interval of "sounding down a major second (-M2)" is more informative than the numeric representations used in MusicXML.

shoogle commented 5 years ago

I don't see how any of the maths disproves anything I said above. You have proven that you can create an algorithm that obeys the axiom, but you have not proven that the axiom holds true for all possible algorithms, so you cannot guarantee that the pitches it returns will agree with those chosen by the composer, as my example demonstrated.

I think we're going around in circles now. You say we must use transposed pitch because the algorithm cannot recover it from a concert pitch. I say that the argument is equally true in reverse. It all depends on perspective: whether you consider concert pitch or transposed pitch the "ground truth". As I showed above, this depends on each user's individual requirements.

It's my personal opinion that if only written or sounding pitch is decided as the primary representation, then the MNX standard should also specify a transposition algorithm to convert to the other form.

It depends whether by "primary representation" you mean the exclusive representation (only one kind of pitch is ever used, and it is always the same) or you mean that both are specified and the primary one (whichever it may be) takes precedence. If both are specified then there is no need to specify an algorithm.

I think the argument basically boils down to this:

If MNX wants to be an interchange/encoding format then it will use transposed pitch.
If MNX wants to be viable as a native/working format it its own right then it will use concert pitch.
If it wants to be used as both then it will allow both kinds of pitch.

Also, I am not suggesting replacing pitches with intervals.

That's a pity. I think it is worth consideration, though probably not in this thread.

Instead, I am proposing an alternative way to indicate how an instrument transposes. In my mind, saying that Bb Trumpet has a transposition interval of "sounding down a major second (-M2)" is more informative than the numeric representations used in MusicXML.

I think (others will correct me if I'm wrong) that intervals are completely expressed by the current representation used in MusicXML, so while your new approach may be easier to read, it doesn't actually add any additional information, and therefore doesn't help with the issue of sounding vs. concert pitch.

notator commented 5 years ago

I'd like to make a proposal that solves the above dilemma, and some more, but it involves digging a bit deeper into what MNX is supposed to be aiming at. I want to get to the point as quickly as possible, so am going to present this from first principles. Please bear with me. The connection to the above discussion will emerge.

First principles (applicable to all music notations, not just MNX-Common):

Space and time are, at least in human perception, hermetically separate domains. We use different units of measurement to describe them. One learns in primary school not to mix up apples and oranges...
All MNX (music notation) files should be able to contain linked spatial and temporal information. If that information is to be preserved in an instantiation, then the instantiation must also be able to store linked spatial (=graphical) and temporal information. On the web, the obvious candidate for such a format is SVG that has a specialised namespace for containing the temporal info. (I'm currently working on a proposal for a revision of §1-§4 of the MNX Draft Specification in which I call this special format MusicSVG.)
Pitch frequencies are temporal. Many of the world's music notations need precise microtonal pitch descriptions, so MNX is going to need a simple notation for microtonal frequency. I'd like to propose the use of MIDI-cent notation. This is a MIDI pitch number (an integer) followed by two decimal places representing the number of cents above the base MIDI pitch (e.g. midiPitch 60.5 would be the pitch a quarter-tone higher than MIDI 60; midiPitch 60.25 would be an eighth-tone higher than midiPitch 60; etc.). The smallest discernible interval seems to be around 5-6 cents, so cent accuracy seems good enough. ( Wikipedia describes the equivalence between the basic MIDI note numbers and frequencies. But see how easy it is to start getting confused: MIDI, which was defined in the 1980s, at a time of great confusion in the world of notation, makes no distinction between "note number" and "pitch number". Notoriously, it also makes no distinction between "quarter-note" and "beat"... )

Common Western Music Notation uses the concept of transposition. The unit of transposition is always a whole number of semitones. We don't need microtonal transposition values. (Edit: It turns out that microtonal transposition values need to be discussed. See below, and maybe elsewhere.)

The key to simplifying the MNX-Common syntax is to provide a clear set of defaults. For example:

The default frequency for a written middle-C is MIDI 60
The default transposition is 0 semitones
The default symbol for MIDI 60 is a written middle-C.

The transposition value can change at any time during a part (<sequence>), so it should be a (single-ended) <direction>, not a <note> attribute:

<measure index="1">
  <sequence>
    <directions>
      <transposition value="3" location="1/2"/>
    </directions>
    <event value="/2">
      <note pitch="C4"/>
    </event>
    <event value="/2">
      <note pitch="C4"/>
    </event>
  </sequence>
</measure>

Here, an application that instantiated the frequencies would interpret the second half-note C4 as sounding 3 semitones higher than the first one (which has the default frequency MIDI 60). And, since the transposition would be a "single-ended" <direction>, it would apply until further notice.

The pitch values in the above example always denote the written pitch. Using properly defined defaults, it would be equally possible to define the MIDI frequency instead:

<measure index="1">
  <sequence>
    <directions>
      <transposition value="3" location="1/2"/>
    </directions>
    <event value="/2">
      <note midiPitch="60"/>
    </event>
    <event value="/2">
      <note midiPitch="63"/>
    </event>
  </sequence>
</measure>

In this case, the first half-note would be written as a C4 by default, while the second would be written as a C4 because 63 minus the transposition is 60. It would also be legal to supply a cent value for the midiPitch (e.g. midiPitch="63.33"). Applications that can't deal with microtones would simply display the symbol for the nearest semitone. Applications that care about precise visuals, would write the pitch value.

It would also be legal to provide both written pitch and midiPitch for a particular note, overriding any transposition:

<measure index="1">
  <sequence>
    <event value="/2">
      <note pitch="C4" midiPitch="60"/>
    </event>
    <event value="/2">
      <note pitch="Dbb4" midiPitch="60.1"/>
    </event>
  </sequence>
</measure>

In all these cases, the pitch attribute always describes the way the note looks (is written, in space), while the midiPitch attribute always describes the way it should sound (in time) when the score is played. That ought to make it easier to read, write and debug MNX-Common. Hope that helps.

mogenslundholm commented 5 years ago

Transposition can be made unambiguous, assuming that you know the keys and you are able to set another accidental. /Mogens

jsawruk commented 5 years ago

@shoogle:

but you have not proven that the axiom holds true for all possible algorithms

First of all, that's not what axiom means. Axiom means assumption. I don't need to prove anything about an axiom, rather I can prove things given a set of axioms. If you disagree with the axioms that I am proposing, then that's fine. I only am using them to show my thought process and how I reach my conclusions.

As far as proving something for the set of all transposition algorithms, I don't think that's possible. I am assuming that there is a set transposition algorithms that are monoids (associative, identity, non-invertible), and a set of transposition algorithms that are groups (associative, identity, invertible). For example, transposition using MIDI pitch numbers and a numeric transposition amount forms a group: T(60, 3) = 63, and T(63, -3) = 60: -3 is the inverse of the 3 transposition in this case. However, when dealing with pitch strings and interval strings, this might not always work: T(Cbb, +m3) = Db, but T(Db, -m3) = Bb. In this case, -m3 is not the inverse of +m3. Since these are different algebraic structures, I don't know how to prove a result for all cases. I could maybe produce two separate proofs, one for each type of algorithm, or perhaps I could prove a result using category theory. However, I don't think doing so would be helpful to anyone, so I won't pursue such proofs at this time.

It depends whether by "primary representation" you mean the exclusive representation (only one kind of pitch is ever used, and it is always the same) or you mean that both are specified and the primary one (whichever it may be) takes precedence. If both are specified then there is no need to specify an algorithm.

By primary representation, I meant written pitch OR sounding pitch OR written and sounding pitch together. Specifying both would mean we don't need to specify an algorithm, and that is the primary thesis of my argument, so I am glad you are understanding my position.

Also, I am not suggesting replacing pitches with intervals.

That's a pity. I think it is worth consideration, though probably not in this thread.

Intervals and pitches are different concepts. We should probably discuss intervals in a different thread, but I cannot support a position of eliminating pitch.

so while your new approach may be easier to read, it doesn't actually add any additional information

It does convey more information because "+6" is ambiguous. C up 6 semitones = ? It could be F#, or it could be Gb. C up an augmented fourth, however, is always F#, and C up a diminished fifth is always Gb.

@notator: I think you make some good points, and I agree with virtually all of them, but I think microtonal transpositions should be included. I can't think of any use cases off hand, but specifying transpositions using a decimal is something MusicXML already supports, so I think it makes sense to also support it in MNX.

notator commented 5 years ago

@jsawruk: Yes, you're right. Microtonal transposition values should be allowed. A use case would be the global adjustment of playback to a base frequency that is not A=440Hz.

clnoel commented 5 years ago

The one thing I want to make sure of is that there aren't two mutually-exclusive ways to do this. I don't want a thing where we can say both concert-pitch and written-pitch are optional, but you have to have one. (Or pitch and midiPitch, or whatever we end up calling the two attributes!) One has to be required, and the other can be optional to provide additional info. I'm pretty sure that this means that at least one use case is going to be made harder to deal with, but I find that to be an acceptable downside. The fact that there are so many optional ways to do things in MusicXML is one of the reasons I really hate dealing with it.

@notator I support the idea of a transposition as a direction. That makes a LOT of sense. This direction can cover a lot of cases, such as a clef change, or an 8va notation or several other visible marks on the page, and I think that the relationship between a transposition direction and those other visible marks on the page should be discussed in a separate thread.

shoogle commented 5 years ago

@notator, please do not discuss MusicSVG in this issue. I can see it may be useful in some situations, but not in others and the group has made it clear that they don't want to go down this route. I suggest you propose it to an open source project such as MuseScore, Audiveris, or Lilypond and see if they are willing to take it forward, or to accept code contributions in that area. MuseScore already has a somewhat customized SVG exporter, so I think the idea of adding semantic attributes as an optional feature will not prove overly controversial. Please refrain from bringing it up here again except in a dedicated issue.

Back to the issue at hand, if you like MIDI then you could store pitch as three separate quantities:

a MIDI pitch number (which is spelling agnostic)
a concert spelling (tonal pitch class)
a transposed spelling (tonal pitch class)

This is how MuseScore stores pitch (the transposition interval is stored separately as a property of the staff). This method gives equal prominence to written and sounding pitch, thereby avoiding any controversy. However, MuseScore currently does not support microtonal scales and I'm not sure how easily this method could be extended to support them. (Presumably it could be done by adding new pitch classes, though it may assume equivalence of certain classes, like C# and Db, that may not be true outside of 12-TET.)

@jsawruk

It does convey more information because "+6" is ambiguous. C up 6 semitones = ? It could be F#, or it could be Gb.

That is not how transposition works in MusicXML. Transposition in MusicXML is not just given as a number of chromatic steps (semitones), it is also given as a number of diatonic steps. These two numbers together allow you to calculate the interval and recover both pitch and spelling unambiguously. If you want to specify the interval explicitly then it might make things clearer (perhaps somebody would like to say why that was not done originally in MusicXML) but it would not add any new information.

I think microtonal transpositions should be included. I can't think of any use cases off hand, but specifying transpositions using a decimal is something MusicXML already supports, so I think it makes sense to also support it in MNX.

I support microtonal tuning. I have never heard of microtonal transposition (except in the MusicXML spec). I think we should keep tuning and transposition separate, unless somebody can provide a real life example of when microtonal adjustments should definitely apply to transposition rather than tuning.

@notator

Microtonal transposition values should be allowed. A use case would be the global adjustment of playback to a base frequency that is not A=440Hz.

Your example refers to tuning, not transposition.

notator commented 5 years ago

@shoogle:

Your example refers to tuning, not transposition.

Yes, I was a bit hasty in replying to @jsawruk. I have no objection to calling "the global adjustment of playback to a base frequency that is not A=440Hz" tuning, but I can't find this setting in the current draft spec, and am not sure where it should go in the MNX-Common file. It could simply be an attribute of <mnx-common> that redefines the frequency (Hz) of a written A4 (default 440).

<mnx-common A4="431">
...
</mnx-common>

I can't think of any other use cases for microtonal transposition. Even if such use cases exist, they must be rare, and there is an alternative way to achieve the same result: simply define both pitch and (mictrotonal) midiPitch (or whatever we call these things) on every note in the part. I agree with @clnoel that its a good idea to avoid having two ways to do the same thing (unless there's an exceptionally good reason), so the bottom line is that I now think my original instinct was right: the transposition direction should be limited to whole numbers of semitones.

@clnoel I think any music notation standard has to be able to describe both a symbol's appearance and what it means, so an apparent redundancy is inevitable. But that's not to say that we necessarily have to allow midiPitch to be defined without defining pitch. My current feeling is that all the alternatives I described above should be allowed, but I fully agree that this should be discussed thoroughly in a separate issue. The attribute names pitch and midiPitch also need discussing...

And here's another issue (Maybe its time this issue was split up?): We need to discuss the representation of arbitrary microtone symbols in MNX-Common. §5.2.2.4 of the draft spec defines four ways to name symbols for quarter-tones. I think these definitions

violate @clnoel's principle of non-duplication.
confuse space with time (they imply precise tunings where none exist)
are unnecessarily restricting

Applications should be allowed to use symbols that are as differentiated as they like. Maybe some applications will want, in some situations, to use enharmonic spellings for the same non-ET tuning. In other words, I think applications should be enabled to be as "precise" as they like when creating symbols for microtonal tunings. Some (many?) applications will just support the standard CWMN accidentals (bb, b, n, #, ##). Others will implement basic quarter-tone symbols. Yet others may want to use specialised symbols for other tunings.

A possible solution would be to have a "wildcard" addition to the symbols defined in §5.2.2.4. This would be of the form diatonicnotehead-height, "a" =any (or some other character) for the wildcard, and a number for the octave. Examples: Aa4, Da2 etc. The code

<note pitch="Ca4" midiPitch="60.4"/>

would tell the client application to draw the notehead at C4 together with an accidental (or no accidental) that is the best match for the given frequency. If the app does not support microtone accidentals, this would result in an ordinary C4 symbol because 60.4 is closer to 60 than to 61). If the app supports quarter-tone notation, it can use the nearest quarter-tone accidental (whose valid range might be from 60.33 to 60.66). If the app supports other accidental types, it can use those. Another example might help:

<note pitch="Da4" midiPitch="60.4"/>

An app that does not support microtone accidentals would interpret this as a double-flat (because 60.4 is less than 61 (=default Db frequency). An app that does support microtone accidentals might still interpret this as a double-flat, if it had no microtone accidental corresponding to a D4 at that frequency.

As I said, this needs discussing fully in a separate issue.

jsawruk commented 5 years ago

@notator: I agree that there appear to be multiple issues in this thread, though I'm not sure how best to split them. As far as global tuning (like A4=431), I think that should be separate from this issue if it isn't already.

mogenslundholm commented 5 years ago

@notator : About "An app ... would interpret this as a double-flat". Consider also totally independant sounding and written pitch: The app does not interpret at all.

mogenslundholm commented 5 years ago

Not fond of the Midi pitch numbers. Charles Delusse (1720-1774) wrote "Air a la Grecque" with some quarter notes . The MIDI writers should know that classical European music have 24 notes per octave, though 12 of them are seldom used. Also I think that MNX should be "demiditized". I wonder if the new MIDI standard will remove the limitations on Pitch, Channel and Instrument. (The are more instruments, e.g. I prefer an oud rather than a gunshot).

notator commented 5 years ago

@mogenslundholm

About "An app ... would interpret this as a double-flat". Consider also totally independant sounding and written pitch: The app does not interpret at all.

I'm not sure what you mean there. My proposal does indeed treat written pitch as being completely independent of sounding pitch. But the written and sounding pitches can always be inferred from defaults where the information is otherwise missing. The app always has enough information to do some kind of interpretation.

I think that MNX should be "demiditized".

Agreed. Using "MIDI.cent" syntax does not mean that the interpreter has to use MIDI to implement the sounding output. The syntax just provides a convenient way to describe a frequency. Maybe the name needs changing. "MIDI.cent" notation would work regardless of whether or not MIDI 2.0 provides a simpler way to describe more accurate tunings (maybe by supporting Hertz or cent-accurate tunings directly). §5.2.1.4 of the current draft spec provides a link to Scientific Pitch Notation. At the bottom of that article, there is a table which provides a direct correspondence between "MIDI note numbers" and equal temperament frequencies. So its possible to describe cent-accurate frequencies using MIDI.cent (or SPN.cent) notation. That provides sufficient accuracy, and is much more expressive/convenient, in a music notation context, than using Hertz.

The score of Air a la Grecque seems only to be available on-line through various libraries, but there's a performance on YouTube. It would be up to the interpreting application to decide how to notate it, but the piece's notes (graphics and sound) could very well be described by the syntax I'm proposing. The original notation, in particular, could only be reconstructed by an app that knew how the original notation looked. Did Delusse use special accidentals, provide fingerings, or just write some explanatory text above the notes?

I wonder if the new MIDI standard will remove the limitations on Pitch, Channel and Instrument. (The are more instruments, e.g. I prefer an oud rather than a gunshot).

In spite of "demiditizing" MNX, I think there should be a way to interface with MIDI if that is specifically required. MIDI banks solve the channels problem, but if you want a specialised instrument, such as an oud, I think you are always going to have to provide a soundfont (or something similar) containing the patch defining it. (There are various initiatives around, trying to simplify that...) MNX (not just MNX-Common) needs a way to link to a particular MIDI patch. In MNX-Common at least, that ought to be done in a <direction>. Maybe the patch would be in the MNX container, or in the cloud somewhere...

<midi-patches>
  <midi-patch name="oud" url="some url" />
</midi-patches>
...
<measure index="1">
  <sequence>
    <directions>
      <midi-patch="oud" location="1/2" />
    </directions>
    <event value="/2">
      <note midiPitch="60"/>
    </event>
    <event value="/2">
      <note midiPitch="61"/>
    </event>
  </sequence>
</measure>

The default patch would, as usual be a grand piano.

clnoel commented 5 years ago

I'm going to tweak my previous proposal on the sounding/written pitch debate. I'd appreciate some comments. The underlying idea here is to make it easy to replicate the written document. As much as it violates my natural inclinations that the sound is the thing we need to treat as the ground truth, it is much harder to reproduce the original written document from the sound than it is to produce the sound from the written document. It's the same problem that any transcriber ends up with when trying to turn an audio track into a score, and a good portion of the point of this format is to remove ambiguities.

So, if you have a concert-pitch part, it's easy to represent a Middle-c note. I've written it in here with two different spellings, to make it clear.

<global>
   <measure>
       <key fifths='0'/>
   </measure>
</global>
<part>
   <measure>
      <directions>
         <clef sign='G'/>
      </directions>
      <sequence>
         <event value='/2'>
            <note pitch='C4'/>
         </event>
         <event value='/2>
            <note pitch='Dbb4'/>
         </event>
      </sequence>
   </measure
</part>

If the 'Dbb4' spelling is meant to represent a microtone, instead of the same pitch as a 'C4', which some comments above indicate might be the case, the following can be used:

<global>
   <measure>
       <key fifths='0'/>
   </measure>
</global>
<part>
   <measure>
      <directions>
         <clef sign='G'/>
      </directions>
      <sequence>
         <event value='/2'>
            <note pitch='C4'/>
         </event>
         <event value='/2>
            <note pitch='Dbb4' sounding='C4+.25'/>
         </event>
      </sequence>
   </measure
</part>

In this case, the pitch attribute can have arbitrary numbers of sharps and flats plus a microtone adjustment, but the sounding attribute should limit itself to 1 sharp or 1 flat plus a microtone adjustment. The sounding attribute should also ignore the active key signature, and always directly specify the sounding value (so would specify "F#" even if the key was "G" and the pitch-spelling "F"). Also, I'm open to making the sounding pitch be a 'Midi.cent' or similar value instead.

If you have a transposed-pitch part (in this case, Bb), you can have the following:

<global>
   <measure>
      <directions>
         <key fifths='0'/>
      </directions>
   </measure>
</global>
<part>
   <measure>
      <directions>
         <clef sign='G'/>
         <transposition semitone='-2'/>
         <key fifths='+2'/>
      </directions>
      <measure>
      <sequence>
         <event value='/2'>
            <note pitch='D4'/>
         </event>
         <event value='/2>
            <note pitch='Ebb4' sounding='C4+.25'/>
         </event>
      </sequence>
   </measure
</part>

In this case, the open-ended transposition direction occurs between the clef and the visible key. Just as for the concert-pitch case, the creator of the file does not need to add a sounding attribute unless there is something specific he wishes to specify. If the consumer wishes to calculate the sounding pitch, he applies the alterations specified by the pitch, and then applies the alterations specified by the key, and then applies an additional alteration specified by the currently active transposition directive. As before, if you wish to specify a sounding attribute to a note you ignore any active transposition and key directives and directly specify the sounding pitch.

If you have both a concert-pitch version and a transposed-pitch version that are part of the same document, you have to specify both spellings. You can decide when creating the MNX-Common document to make these entirely separate, or you can allow for two different realizations of the same part in the same document by specifying by specifying both.

<global>
   <realization name="Concert score">
      ... 
      specify parts are in this realization/layout, including using the first pitch spelling
      ...  
   </realization>
   <realization name="Clarinet part">
      ...
     specify that this is only the clarinet part, and that it uses the second pitch spelling
      ...  
   </realization>
   <measure>
      <directions>
         <key fifths='0'/>
      </directions>
   </measure>
</global>
<part>
   <measure>
      <directions>
         <clef sign='G;G'/>
         <transposition semitone='0;-2'/>
         <key fifths='0;+2'/>
      </directions>
      <measure>
      <sequence>
         <event value='/2'>
            <note pitch='C4;D4'/>
         </event>
         <event value='/2>
            <note pitch='Dbb4;Ebb4' sounding='C4+.25'/>
         </event>
      </sequence>
   </measure
</part>

Note that there is still only one sounding pitch for both spellings. Also, the "concert pitch" spelling does not have to be the first one in the list. It all depends on how your realizations want to use them.

As an aside, I have no objection, and in fact would like, this "alternative spelling" system to be used for other purposes, like specifying the TAB notation right along with the conventional notation...

Note: I presented something like this system way up above, but I think I have addressed several issues and added several refinements to it since then. Does anyone feel like this doesn't address an issue that they have?

Edit: A Bb transposition is -2 semitones, not -1. I've fixed it.

jsawruk commented 5 years ago

@clnoel:

As much as it violates my natural inclinations that the sound is the thing we need to treat as the ground truth, it is much harder to reproduce the original written document from the sound than it is to produce the sound from the written document.

Thank you for saying that clearly. It's the cornerstone of my position, but I have not been able to say it in such simple terms!

and in fact would like, this "alternative spelling" system to be used for other purposes, like specifying the TAB notation right along with the conventional notation

Couldn't agree more!

mogenslundholm commented 5 years ago

I converted files from mu2-format to MusicXML. Sound I added small arrows to show the pitch, but was recommended to remove them: The players won't have them. I was told that the player knows the style and knows what pitches are used in this style. Easy to produce the sounding pitches? Couldn't disagree more!

mogenslundholm commented 5 years ago

@notator: I just mean that with both sounding and written pitch a program does not need to "correct" one from the other.

notator commented 5 years ago

@clnoel and @jsawruk

it is much harder to reproduce the original written document from the sound than it is to produce the sound from the written document.

and

a good portion of the point of this format is to remove ambiguities

The word "ambiguities" is a bit weak there! I'd say it was actually impossible to reproduce an original, written document from the sound alone. Transcribing sounds requires knowledge of a whole notation tradition, instrumental conventions etc. Things like clefs, accidental spellings and fingerings don't appear at all in the sound. All that information is in the transcriber's mind. On the other hand, it should be possible for a CWMN app to provide a first transcription attempt, that could be tweaked by its user and then saved, possibly together with the original, transcribed sounds. So yes, I agree with you both: The only way to avoid "ambiguities" in a graphic is to save it! :-)

@clnoel

the pitch attribute can have arbitrary numbers of sharps and flats plus a microtone adjustment,

Do you really mean we should be allowed to have pitch="D####4"? You don't actually give an example, so probably not! :-) I'm not sure if I've ever seen a triple-flat or triple-sharp, but I'm sure MNX-Common doesn't need more than three flats or sharps on the same notehead. What is MusicXML's opinion on that? Maybe unlimited numbers of flats or sharps should be allowed in some advanced MNX-Common profile... :-)
You don't give an example of a pitch with a microtone adjustment. Have you got any suggestions for doing that? Maybe we do need to define a syntax for quarter-tone symbols (in addition to the wildcard I described above for the "best fit"). One could, for example, prescribe a quarter-tone flat using qb and a quarter-tone sharp using q# (as in pitch="Dqb3", pitch="Cq#6" etc.). Apps that don't support quarter-tone symbols would simply choose some other symbol, for example the one for the semitone above or below, leaving it to their users to tweak the result in any way the app allows.

Aside: Stockhausen used both accidentals that meant precise (ET) quarter-tones, and accidentals that meant "slightly sharp" or "slightly flat". These accidentals are all in the SmuFl standard.

§5.2.2.4 of the current draft spec says that only U+0023 HASH (=#) and U+0061 b can be used as alterations in the pitch value. I think forced naturals also need to be defined, and that the 'n' character (U+006E n) should be used for that (e.g. pitch="Dn5", pitch="Gn2" etc.). BTW: according to Wikipedia, 'b' is U+0062. That seems to be an error in our draft spec.

Sounding pitch: (@clnoel again)

the sounding attribute should limit itself to 1 sharp or 1 flat plus a microtone adjustment

and

<note pitch='Dbb4' sounding='C4+.25'/>

The name of the sounding attribute is up for discussion. I called it midiPitch above, but that may be a bit confusing since it doesn't necessarily have anything to do with MIDI (see my answer to @mogenslundholm above). Other possible names for the sounding attribute might be sound, frequency, freq etc. Its extremely important to distinguish between written and sounding pitch here, so I'd prefer not to use pitch names (C4 etc.) in the frequency description.

If we used a symbol name in the sounding attribute, there would be two ways to describe some frequencies. There would be no difference between using a Db or C#. That violates the principle of non-duplication.
If we keep symbol names out of the frequency definitions, then its clearer that the pitch attribute (which does use symbol names and accidentals) refers to the written object. So its obvious that the pitch attribute is referring to a graphic.
All notations can use abstract frequency values, but not all notations use CWMN note names. Some notations only write a fingering, the name of a string to be plucked, or a direction in which a melody moves. I'd prefer to have a unified way of describing frequency in all notations, so I'd prefer to avoid using CWMN note names in frequency definitions.
An MNX-Common file will mostly be written and read by software. The sounding attribute is only used by software that is generating (code for) an audible output. In many cases that will involve MIDI, so it would be convenient to have a simple way to convert the sounding attribute's value to one or more MIDI instructions. Using a "MIDI note number" as part of that information is just simpler than going via a CWMN note name.

I think we agree that, however it is defined, the sounding value (if it exists) should always override any transposition, key signature etc.

...

the open-ended transposition direction occurs between the clef and the visible key.

To be really picky, I think the <clef>, <transposition> and <key> directions could be written in any order inside the <directions> element.

If the consumer wishes to calculate the sounding pitch, he applies the alterations specified by the pitch, and then applies the alterations specified by the key, and then applies an additional alteration specified by the currently active transposition directive.

Nearly. An accidental in the pitch attribute should actually override the default for that diatonic pitch stipulated by the <key> directive. Here's some pseudo code (comments and corrections welcome!):

Let there be a table (Table A) in which the default (ET) frequencies of the unaltered seven diatonic
pitches are going to be stored.
Table A takes no account of <key> or <transposition> directions.

Let there be a table (Table B) that will contain the running state of the default (ET) frequencies
of the seven diatonic pitches (notated on a staff that may have a <key>).
Table B will take both <key> and <transposition> directions into account.

if global tuning info exists
{
  Use the global tuning info (e.g. A4="431") to populate Table A.
}
else
{
  Use A4="440" to populate Table A.
}

For each <note> 
{
  Use Table A and the <key> and <transposition> states to update Table B.
  If the <note> has a "sounding" attribute
  {
    the <note>'s frequency is given by the "sounding" attribute. The "pitch" attribute is ignored.
  }
  else // the <note> must have a "pitch" attribute if it has no "sounding" attribute
  {
    Find the diatonic pitch name, accidental and octave in the <note>'s "pitch" attribute.
    If the "pitch" attribute contains an accidental
    {
      the <note>'s (ET) frequency is found using Table A, the diatonic pitch name, the
      accidental and the octave value. 
    }
    else
    {
      the <note>'s (ET) frequency is found using Table B, the diatonic pitch name and the
      octave value.
    }
  }
}

If the notation in @mogenslundholm's posting above is to be classed as MNX-Common, then the above algorithm has to allow for arbitrary (frequency) modes that use the seven diatonic pitch levels on a CWMN staff. Maybe there should be a <mode> direction that allowed the base frequencies of the seven diatonic pitch names to be defined? For example, E4 and B4 could be "detuned" to be slightly above ET Eb and Bb as follows (I don't know what the precise values should be here):

<mode C4="60" D4="62" E4="63.3" F4="65" G4="67" A4="69" B4="70.3" />

Something similar could also be done as an extension of the <key> direction. Any other ideas on how to create special modes?

...

I have no fundamental objection to superimposing score and part definitions as in the final example in @clnoel's last posting. Its probably a good idea, that helps file maintenance when editing. Better not to have to edit similar things in different places...

@clnoel: Could you provide a simple example of how you imagine TAB being defined in MNX-Common? I'm new to MusicXML, and there's nothing about TAB in the current draft spec. I'm especially interested to see if there are implications for other non-CWMN-staff notations. Thanks.

clnoel commented 5 years ago

@notator In no particular order:

1) About TAB, #63 is the right issue to discuss TAB representation, and I'll make a proposal there when I can get into it a little more, although I do want it to be a string-attribute, not a set of elements. I'm not a TAB expert, so I don't know all the edge cases. But I do know that notation+TAB pieces are an important percentage of Musicnotes' imports and exports, so I need to be able to deal with it in MNX. I'd like it to be easy!

2) About the sounding attribute. I'm open to changing the name, and using a number-value instead of a pitch-string. I think we should stay away from the term "MIDI" though, since that seems to be a hot-button for others. We'll have to think about that, but if we can come to a general agreement about written-and-sounding, we can work out those details as another issue.

3) About the values in the pitch attribute, I was following the logical conclusion from the parsing instructions in §5.2.2.4 of the current draft spec. It says, in effect, that while the next character is # (or b), keep increasing (or decreasing) the alteration. Which does mean that D#####4 is allowed! If we want to change that, it should be a separate issue.

4) About adding an 'n' to represent a natural to the pitch-syntax. I just double-checked the current spec for pitch, ant it seems to say that you specify (e.g.) pitch='F#' regardless of what key you are in (C-major or G-major, for example), the difference being that in C-major you would add the accidental attribute to represent the visual display of the accidental, and in G-major you wouldn't, unless there was a preceding F-natural. I am changing that in my proposal for pitch and sounding, as it is not, I think, how most people in this thread seem to be treating pitch-spellings. I do still think we need the accidental attribute for some cases to specify suggested SmuFL characters for the pitch-spellings, or to specify an accidental with parentheses.

5) You state in your pseudo-code:

// the <note> must have a "pitch" attribute if it has no "sounding" attribute

No! This is not an either-or situation (where you can have sounding or pitch or both). The pitch attribute is the required one. It specifies the pitch-spelling from the original document. The sounding attribute is optional!

6) You are correct that the accidentals on a note override the key signature accidentals. I messed that up in my proposal. The worded description on how to get the sounding pitch becomes:

If the consumer wishes to calculate the sounding pitch, he first checks to see if there is a sounding attribute, and uses it if there is. If not, he gets the diatonic step and octave from the pitch attribute, applies the alterations specified by pitch (or, if there are none, the alterations specified by the key signature), and then applies an additional alteration specified by the currently active transposition directive, and then applies any microtonal adjustment specified by pitch.

Importantly, I just realized this doesn't cover "retained accidentals" (notes that are altered by the accidental on a preceding note). Do we just count those into the sounding pitch algorithm, or do we specify them in some way?

@mogenslundholm Well, I am unfamiliar with that style, and would probably appreciate having the arrows! But the question isn't "Is it easy to reproduce the sound from the written pitch?" but rather "Is it easier to produce the sound from the written pitch?"

Given that sheet music is designed to be a set of instructions for producing sound from writing, I still have to feel that decoding the sound from the written pitch is easier.

--Christina

mogenslundholm commented 5 years ago

@clnoel: You wrote "I'd appreciate some comments". It really looks good. But I still think that non-mandatory sounding-pitch = not there.
Playing when having sounding pitch is just playing it. Not having sounding pitch is asking: "Is the sounding pitch there? No, then: Is written pitch there? Is it transposed? it it a ottava-8 up? ottava-8 down? ottava-15-up? down ottava-22 up? down? Is it a harmonic? Which type? natural? artificial? base-pitch, touching-pitch or sounding-pitch? (MusicXML possibilities)". PS: <note pitch="C4,C4">... is short. And so is "C4,C4;D4". Someone who can't figure out what this should mean? (answer: Sounding,Written;Written ....)

@notator: You wrote: Do you really mean we should be allowed to have pitch="D####4". Well - I do. It is just easier to have no limit. Will this occur? No. (But I can make a silly example, if you want). Do you have the mentioned work of Stockhausen as MusicXML?

I have started making a transposing algorithm. Current version transposes to flat-keys and back. With a little luck it will work also when I add sharp-keys. I will prove it by transposing all notes of a key to any other key and from that to any other key, and show the result is the same as transposing in one step.

adrianholovaty commented 5 years ago

After a lengthy discussion at Musikmesse 2018 in Frankfurt, we've decided on storing written pitch. (Meeting minutes and video here, with discussion starting around 1:05:20 in the video.)

I've added a pull request to issue #4, which is specifically about the sounding-vs-written pitch issue.

As for this particular issue (Realizations and Layouts), the discussion here has become quite wide-ranging — to the point where it's difficult to digest all the ideas and they've veered off track in places. :-) With that in mind, I'm going to copy the relevant ideas into issues #34 (differences between scores/parts) and #57 (page/system flow), for more atomic discussions. After that's done, I'll close this issue (but of course it'll be archived and all comments will still be available).

shoogle commented 5 years ago

@adrianholovaty, thanks for the update! I'm glad a decision has been made. Written pitch is better for transcribing existing works, and will certainly make the transition from MusicXML much easier.

Now, I would have like to use sounding pitch for new compositions, and I think that would have made more sense for a native format. However, the "workaround" for those of us that prefer sounding pitch is simply to write your scores in concert pitch, because, as @mdgood says in the video:

in concert pitch, written pitch is sounding pitch.

So if you truely don't care about transposition, or don't feel qualified to specify a spelling, you can simply write your scores in concert pitch and leave the transposition to an editor / end users / the application.

notator commented 5 years ago

@adrianholovaty: Great that a decision has been made! (Sorry about the delay, but I've been away...) This thread continues in #4 but to tie things up here, and for the record, I'd like to reply properly to @clnoel's and @mogenslundholm's last comments.

@clnoel: Thanks for the link to #63. I'll take an independent look. Interesting that @snakebyte69 is calling for opinions/participation from the main software vendors! As I said somewhere, they also have a special role to play in deciding what is and is not CWMN... :-)

The sounding attribute: Yes, I'm also open to discussing different names, but sounding is fine by me. You're probably right that we should stay away from using "midi" in the name (But I still think that "MIDI.cent" notation would be a convenient way to notate frequencies. Much more convenient, for example, than using Hertz.) Note that the current spec quotes this Wikipedia article on "Scientific Pitch Notation" to justify the use of octave numbers in pitch attributes. The same article uses MIDI note numbers lower down.

D#####4 and adding n to the pitch syntax: Yes. These and the representation of accidentals in general, need discussing properly. I think the best place for that is currently #4. Basically, I think the current spec is confusing and out of date following our adoption of the transposition direction and sounding attributes.

Apropos the pitch and sounding attributes, you said:

This is not an either-or situation (where you can have sounding or pitch or both). The pitch attribute is the required one. It specifies the pitch-spelling from the original document. The sounding attribute is optional!".

Okay. I agree with you. The pitch attribute, which specifies the graphical pitch-spelling, should be compulsory. The point of my pseudo code was to get at issues like that, so as to clear them up. :-) Reading that code, I also thought about "repeating accidentals". I think these can simply be included in the cascading defaults hierarchy that is read while parsing the file. We'll probably solve that issue in #4.

@mogenslundholm The Stockhausen accidentals can be found in the SMuFL docs here. As I said, I think we should continue the discussion about how MNX-Common should treat accidentals in #4.

Previous Next