w3c / mnx

Music Notation CG next-generation music markup proposal.

Are notes encoded in sounding or written pitch? #4

Closed joeberkovitz closed 5 years ago

joeberkovitz commented 7 years ago

In MusicXML, all pitches are framed in terms of the instrument's transposed pitch. In MNX, we can re-examine this question.

Encoding in transposed pitches makes performance interpretation into more work since the instrument's transposition must be applied, and this transposition could even vary during the course of a piece.

It also makes dynamic display of concert pitch trickier, since assumptions must be made about enharmonic spelling and key signatures.

On the other hand, encoding transposed pitches will more accurately reflect a manuscript that has been composed in transposed form, and may ultimately be more robust in terms of delivering a definitive final instrumental part.

Other pros and cons need to be brought out also.

mogenslundholm commented 7 years ago

I am not sure about the meaning of the words. Let's say we have a pitch that we measure with an oscilloscope or tuner when the note is played on the instrument, the real pitch: is this "in concert"? Anyway, I think the real pitch value should appear in MNX. As I remember it, MusicXML is different, but there is also the symbol "Ottava" or "8va", and in this case MusicXML uses the real pitch values. Also, different clef symbols will have the same pitch values, but the notes will look different on the paper.

siennamw commented 7 years ago

Perhaps this topic should be renamed, "Are pitches encoded in sounding or written pitch?" for clarity.

I can see advantages and disadvantages with both approaches. Perhaps both should be supported. Each part could be flagged as being encoded with either sounding or written pitches, with <part pitch-encoding='sounding'> or something similar.
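A minimal sketch of that idea, using the pitch-encoding attribute suggested above (the rest of the markup is purely illustrative):

<part id="flute" pitch-encoding="sounding">
   <!-- pitches in this part are what the listener hears -->
</part>
<part id="clarinet" pitch-encoding="written">
   <!-- pitches in this part are what the player reads -->
</part>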

joeberkovitz commented 7 years ago

I've renamed the topic for greater clarity. Noted that octave displacements play a role here as well.

@mdgood, can you chime in at some point with an explanation of the design rationales for MusicXML's pitch encoding so we have your background on this?

mogenslundholm commented 7 years ago

Two ways to write transposed pitch? No, I would not like that. This would double the testing and increase the risk of errors, because probably one of them would be used seldom and therefore not be tested in practice. With only one way to do it, that way will be tested every time. (For me the sound is the real pitch, but I can add an offset as with MusicXML.)

siennamw commented 7 years ago

I'm still not sure that we're talking about the same thing here yet @mogenslundholm. The question is, how do we encode music for transposing instruments? When we write for a clarinet in B-flat, for example, if we want to hear a B-flat are we encoding B-flat (sounding pitch) or C (written pitch)?

mogenslundholm commented 7 years ago

I would prefer encoding the sounding pitch (here B-flat), so the MNX would be: <note pitch="Bb4"/>. A command to transpose could be something similar to the MusicXML command <transpose><diatonic>1</diatonic><chromatic>2</chromatic></transpose>. This would inform the notation program to show a C (written pitch). I am looking at the Lilypond MusicXML test example "72a-TransposingInstruments.xml". But I would definitely not like to have a choice; better a decision of what to use, sounding or written.
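For concreteness, that proposal might look roughly like this (the part wrapper is invented for illustration; the transpose values follow the comment above and tell the renderer to display the sounding Bb4 as a written C5):

<part id="clarinet-in-Bb">
   <transpose><diatonic>1</diatonic><chromatic>2</chromatic></transpose>
   <note pitch="Bb4"/>   <!-- stored and heard as Bb4; displayed as C5 -->
</part>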

mdgood commented 7 years ago

One of the design principles of MusicXML is to primarily represent a written piece of music in a semantic way. This has two important advantages:

MusicXML thus represents written pitch, with a transpose element to convert transposing instruments into sounding pitch. Written pitch in this case is not necessarily the same as the position on a staff. A piece of music that looks like a middle C with an 8va line over it will be represented in octave 5, not octave 4. The octave-shift element represents the 8va line.
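A rough MusicXML-style sketch of that example, in which the note that looks like a middle C under an 8va line is encoded in octave 5, with the 8va line carried by a separate octave-shift element (the attribute values here are indicative only; see the MusicXML spec for the exact ones):

<direction>
   <direction-type>
      <octave-shift type="down" size="8"/>   <!-- the 8va line -->
   </direction-type>
</direction>
<note>
   <pitch><step>C</step><octave>5</octave></pitch>   <!-- octave 5, not octave 4 -->
   <duration>4</duration>
   <type>quarter</type>
</note>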

I think that design decision has worked very well and makes life much easier for anyone comparing a MusicXML encoding to a printed piece of music.

MusicXML's representation does have some issues in concert scores. Given MNX's desire to be able to make better use of a single file for score and parts, it would be good to have transposition information available in a concert score. Octave transpositions that are still used in concert scores are another issue, as discussed in MusicXML issue 39. I think we can resolve these issues in MNX while still retaining the great usability benefit of having MNX directly and semantically represent what the musician sees and understands.

I do agree with @mogenslundholm that either choice is far, far preferable to supporting both.

webern commented 7 years ago

I agree with Michael Good. My preference would be for MNX to represent the pitch as it appears on the page of music, which is the written pitch in the case of a transposed score and transposing instrument.

MNX should make these facts clear: that it is a transposing score, that the current location of the current staff/instrument is written in transposed pitch, and that the transposition amount is X.

joeberkovitz commented 7 years ago

From chair call: perhaps we should explore how we can represent alternative spellings of keys and notes for the same part, so that this choice of transposed vs concert pitch is less weighty of a decision? (Or at least informs that choice.) This is now captured as #34

joeberkovitz commented 6 years ago

From the chairs: along with #34 we want to move this issue into focus so that we resolve this important aspect of MNX-Common.

clnoel commented 6 years ago

Just to clarify the terminology and issue here, so I don't make a fool of myself.

Is the following case what we are talking about?

I have the following measure, beginning on a Bb4: image

If I were notating this for a trumpet (a Bb instrument), I would instead notate it like this, beginning on a C5: image

We are trying to figure out whether we want to encode the trumpet part as it looks or as it sounds. Looks: it starts with a C5 event, with a notation somewhere in the definition of the part that everything needs to be transposed down a whole-step when interpreting it for audio generation. Sounds: it starts with a Bb4 event, with a change to the clef element to indicate that a Bb4 is placed on the third space of the staff (with no flat).
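A rough sketch of the two alternatives, using invented MNX-style markup just to make the contrast concrete:

<!-- "Looks" (written pitch): -->
<part transpose-diatonic="-1" transpose-chromatic="-2">
   <note pitch="C5"/>   <!-- the player reads C5; playback applies the transposition and sounds Bb4 -->
</part>

<!-- "Sounds" (sounding pitch): -->
<part>
   <note pitch="Bb4"/>  <!-- playback is exact; the renderer must re-spell and re-position this as a written C5 -->
</part>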

Does that summarize the issue?

Edited for joeberkovitz's nitpick. Of course a Bb is a whole-step down from a C, not a half-step. Sigh.

joeberkovitz commented 6 years ago

@clnoel That's an excellent summary. (Nit: I think you meant whole-step, not half-step).

cecilios commented 6 years ago

I can see advantages and disadvantages with both approaches, as others have previously commented. But I think it is better to notate music in written pitch.

First, from a formal point of view, I consider written pitch more in line with the objective of capturing the semantics, since for me this implies capturing how the music is written using paper and pen. In my opinion the meaning of 'semantic' should not change in particular cases, such as for the sound of transposing instruments. An analogy: imagine a text in English with some paragraphs in French. Using sounded pitch would be like encoding the English paragraphs using characters and the French paragraphs using phonetic symbols instead of characters.

And second, from a pragmatic point of view, dealing with transposing instruments will always require transposing the provided pitch, and it does not matter whether the score is written in sounded pitch or in written pitch. This is because if the user changes the instrument for playback (e.g. from trumpet in Bb to flute in concert pitch or to alto sax in Eb), the application will always have to re-compute pitches. So there is no gain in encoding sounded pitch, except in marginal cases in which changing instruments is not allowed. And using sounded pitch will force more computation to properly display the score. On the contrary, using written pitch simplifies score display, and it does not impose any penalty on playback, as playback will always have to deal with transposing instruments, as noted above.

Databases for music analysis or music search would perhaps benefit from using sounded pitch, but it is up to those applications to store music in the format most appropriate for the processing they will do.

mdgood commented 6 years ago

Just to clarify @clnoel's summary: no matter which approach we use, the transposition data needs to include both the number of diatonic steps and the number of chromatic steps. MusicXML represents these directly but there are other possible representations that are equivalent. The important thing is that the number of chromatic steps by itself is not sufficient for notation applications. The Bb transposition for instance is one diatonic step and two chromatic steps.
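For example, the MusicXML transpose element for a Bb instrument (written C sounding the Bb a major second below) is:

<transpose>
   <diatonic>-1</diatonic>    <!-- one diatonic step down -->
   <chromatic>-2</chromatic>  <!-- two chromatic steps (semitones) down -->
</transpose>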

joeberkovitz commented 6 years ago

I've been moving towards a unified approach to this and a variety of other issues; please let's carry the discussion to #138 (at least for now) and see if that proposal will work! Thanks.

adrianholovaty commented 5 years ago

After a lengthy discussion at Musikmesse 2018 in Frankfurt, we've decided on storing written pitch. (Meeting minutes and video here, with discussion starting around 1:05:20 in the video.)

I'll put a pull request together momentarily with small changes to the spec to clarify this. Note, though, that there's still ambiguity — specifically about how 8va/8vb markings are taken into account.

shoogle commented 5 years ago

It's good that a decision has been reached. Written pitch certainly has its advantages, but so too does sounding pitch, particularly for new compositions. The group may have decided on written pitch, but as I noted in #138, there is a workaround for people who prefer sounding pitch: simply write your scores in concert pitch where written pitch and sounding pitch are the same.

shoogle commented 5 years ago

As for the matter of ottava lines, I think that notes under these markings should definitely be notated at sounding pitch. This is because:

As such, I would consider ottava lines to be "presentational" rather than "semantic". Furthermore, the fact that ottava lines can differ between editions opens up the possibility that somebody may wish to store multiple editions in a single file (i.e. multiple layouts):

<ottava-start id="1" octaves="+1" editions="smith1837,brown1972"/>
<note pitch="C6"/>
<ottava-start id="2" octaves="+1" editions="jones1954"/>
<note pitch="D6"/>
<ottava-stop id="1"/>
<ottava-stop id="2"/>

Notice how the ottava starts at different places in different editions, yet the notes only had to be specified once. This is only possible if the notes under the ottavas are stored at sounding pitch, which is the same in all editions, rather than at written pitch, which can vary between editions.

Now applications can give the user the option to switch between layouts of different historic editions, or to define a new layout of their own. Ottavas from other editions would be ignored for the purposes of rendering the score, but would be maintained on saving and loading to ensure all information is preserved.

Encoding multiple layouts may not be something we want to include in version 1 of MNX, but it would be wrong to preclude it as a possibility if there are no other reasons to prefer sounding or written pitch under ottava lines.

shoogle commented 5 years ago

As always in music notation, there are some special cases:

  1. Ottavas that apply to notes in one voice but not the other.
  2. Ottavas on a shared line that apply to some instruments and not others.
  3. Optional ottava markings.

Regarding (1), I've never seen an example of this, but I have seen clefs that apply to one voice only, so it probably occurs somewhere. The problem of deciding which notes are affected by the ottava is the same regardless of whether the notes are stored at written or sounding pitch, so (1) is not relevant to this discussion.

Re. (2), I seem to remember seeing this on some vocal scores when men and women are written on the same staff. Sometimes the phrase "men sing 8vb" might be given as an indication that, while the lines are written in unison, men and women are supposed to sing in their respective octaves. However, most choirs would interpret the line this way anyway, even if the instruction was missing or it said "unison"!

For singers, the marking "men sing 8vb" is arguably non-semantic, but if it was for instruments (e.g. "flute plays as written, oboe 8vb") then it would be more important. The lines share written rather than sounding pitch, so it is written pitch that must be recorded if you want to avoid writing the lines twice. However, I can't recall ever seeing an example like that, and if any exist then they probably belong to a wider discussion about how to handle parts and shared staves. (Perhaps this kind of split should be a property of the clef rather than an ottava line.)

Re. (3), this also occurs most often in vocal scores. You sometimes see something along the lines of:

In these situations the performer (and therefore the playback algorithm) has a decision to make, but there is usually a preferred option that is either specified by the composer or has become established by convention. Whether the notes should be stored at sounding or at written pitch depends on whether the convention is to play the ottava or to ignore it.

notator commented 5 years ago

Since this posting would otherwise be far too long, I am splitting it into 3 parts:

  1. This summary of what @clnoel and I are currently thinking
  2. A discussion about <directions> (including the discussion about 8vas that has begun above)
  3. A discussion about the pitch attribute and accidentals (which also belongs to the discussion around MNX-Common storing written information).

I'm providing the following summary because this thread is a continuation of #138, that thread is very long, and the current draft spec no longer reflects the current state of the debate. If it proves impossible to keep the current draft spec up to date (it's very difficult to edit), perhaps I should keep updates of this summary in an up-to-date pull-request? Has anyone got a better suggestion for how to keep track of these discussions?

Summary of what @clnoel and I are currently thinking (25.04.2019)

(@clnoel - please correct me if you disagree with any of this.)

A <note>'s pitch attribute is compulsory, and very strictly contains only graphical information. In other words, it describes what is written, i.e. how the symbol looks in the printed score. (This formulation complies with the co-chair's decision as described above, but that decision still needs to be clarified.)

<note> is going to have a separate (optional) sounding attribute that, if present, determines its frequency. The frequency information can't be located in the (graphic) pitch information since the <note>'s frequency usually depends on external state information (the parser's current key, 8va, transposition, measure states etc.). The <note>'s sounding attribute, if it exists, overrides the frequency calculated using the graphic information in the pitch attribute and the current total "transposition state" maintained by the parser. This simplifies things considerably, since the pitch attribute no longer has to contain both graphical and temporal information (as in §5.2.2.4 of the current spec and in #19's opening comment).

<transposition> is a new, optional <direction> element whose values are limited to whole semitones. The current <transposition> state will be taken into account by the parser when calculating a frequency from a <note>'s pitch graphics.
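Read literally, that summary might be sketched like this (every element and attribute spelling here is illustrative, not settled syntax):

<direction>
   <transposition semitones="-2"/>   <!-- e.g. a part for an instrument in Bb -->
</direction>
<note pitch="C5"/>                   <!-- graphics only: a written C5; the parser's transposition state makes it sound Bb4 -->
<note pitch="C5" sounding="70"/>     <!-- the sounding attribute overrides the computed frequency (70 = Bb4 in MIDI numbers) -->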

notator commented 5 years ago
<direction>s

@clnoel points out that #111 is where we should probably be discussing how <direction> contents such as transposition, 8va, key-changes, clef-changes etc. should work in general. However: My current feeling is that <transposition> should not be the only <direction> element that can change the default frequency of a <note>'s pitch. This is mainly because the <transposition> element cannot cope with key-changes (that change the frequencies of individual diatonic pitches), so the parser is going to have to look at those separately. So I'm proposing that

Note that all the <direction> elements can, if necessary, have special attributes that change their default behaviours. In particular, I think that all the special 8va cases mentioned by @shoogle could be accommodated using appropriate attributes in the <octava-begin> element (in the way he describes above). The precise details could/should probably be discussed in a separate, dedicated issue so that this one does not get too long. @shoogle said above:

notes under [ottava lines] should definitely be notated at sounding pitch

I can only say that I disagree. This contradicts the co-chair's decision, and introduces unnecessary complexity. It would then become very difficult to find out what should be written in the score. This should be discussed, if necessary, in the separate issue dedicated to @shoogle's 8va proposals.

In #138 @clnoel said:

If the consumer wishes to calculate the sounding pitch, he first checks to see if there is a sounding attribute, and uses it if there is. If not, he gets the diatonic step and octave from the pitch attribute, applies the alterations specified by pitch (or, if there are none, the alterations specified by the key signature), and then applies an additional alteration specified by the currently active transposition directive, and then applies any microtonal adjustment specified by pitch.

I'd like to rephrase that, especially the final clause (pitch does not specify a microtonal adjustment):

The frequency of a <note> is usually calculated from its graphic information (pitch) in combination with a "transposition state" that is maintained by the parser while reading the file. The parser reads the pitch's notehead height (C4 etc.) and accidental, and uses these in combination with the <note>'s current "transposition state" that is maintained by reading the values in various <direction> elements. The parser always maintains the current "transposition state" for each part, regardless of any sounding attribute that may exist in the <note>. If there is a sounding attribute in the <note>, its (absolute frequency) value overrides the current "transposition state" for that <note>.

Apart from the sounding attribute, this is very similar to the way humans parse printed scores. So it should be intuitively comprehensible. Humans use temporal context and memorised "performance practice" to determine precise, microtonal frequencies. Think of the precise tunings used by string quartets in live performances...

notator commented 5 years ago

I'd first like to say that I think the current spec is so far out of step with what we are currently thinking, that (providing we keep track of what we are thinking) it would be best to ignore it rather than fighting it. This is especially true in the area of accidentals. I want to get on with productive discussion rather than patching bugs (like D####4) that simply don't exist in the new world. A major re-write of the proposed spec can wait until we have a more complete document from which to work. That having been said, the current draft spec remains an indispensable guide around the edges of what we are trying to do. It just shouldn't be allowed to get in the way of a new, clean approach.

Accidentals

The <note> pitch attribute is badly named since it does not describe a frequency, so, for the purposes of the following discussion, I'd first like to consider replacing it with a head attribute that describes the height of the notehead. This head attribute simply takes diatonic values ("C4", "E6" etc.) without accidentals. If a written accidental is required, then it has to be supplied in the accidental attribute. (Some simple accidentals could be included as shortcuts in the head string later, but let's leave them out for the moment.) Some simple <note> examples would then be:

<note head="C3" />
<note head="G6" accidental="#" />

(The frequencies that correspond to these notes depend on their current "transposition state" as described in the previous two postings. Similarly, the vertical position of the notehead on the staff will depend on the current "clef state".)

The following, simple accidental values ought to be defined:

(These are the accidental strings that could eventually be included in the head value string.)

It should also be possible to use arbitrary accidentals from the SMuFL standard, but I have been unable to find out how the current draft spec suggests that this should be done. Possibly a SMuFL glyph code number would suffice. For example:

<note head="D5" accidental="U+ED5F" />

There are two problems with that: What to do if the SMuFL glyph is not available, and how the glyph affects the sound. It should, in such a case, be compulsory to define one of the above simple accidentals or "none" as the fallback. (Maybe "-" could be used instead of "none" here -- to be decided.) Perhaps like this:

<note head="D5" accidental="U+ED5F;b" />

or

<note head="D5" accidental="U+ED5F;none" />

or

<note head="D5" accidental="U+ED5F;-" /> // same as "none"

etc.

If the sounding attribute were defined, it would be used whatever glyph is displayed. If not, the sounding pitch would be calculated from the displayed glyph, whichever one it is.

shoogle commented 5 years ago

@notator, I believe all of what you have proposed above has either been settled already, or is not relevant to the current topic of discussion. The present issue is purely about whether to use written or sounding pitch. The implementation details of how to represent pitch in the XML syntax can be left to a future discussion (i.e. in another issue), as can tuning and transposition.

As @adrianholovaty said, the debate in this issue has been mostly settled; the only thing that remains is to sort out a few ambiguities around ottava markings.

@shoogle: notes under [ottava lines] should definitely be notated at sounding pitch

@notator: I can only say that I disagree. This contradicts to the co-chair's decision, and introduces unnecessary complexity.

You are confusing octave shifts with transposition. It's an easy mistake to make because we are confusingly using the terms "written pitch" and "sounding pitch" to mean different things in different contexts. Better terms would be "transposed spelling" vs. "concert spelling" for the transposing-instrument question, and "written octave" vs. "sounding octave" for notes under ottava lines.

As @adrianholovaty said, the co-chair's decision to use transposed spelling has no bearing on whether we should use written or sounding octave for notes under ottava lines. This is a decision that is still to be made, and in my previous posts I pointed out the strengths and weaknesses of each method.

notator commented 5 years ago

@shoogle Thanks for the above posting. I'm still puzzled by what you are saying, but would like to get us onto the same page as soon as possible. You say:

The implementation details of how to represent pitch in the XML syntax can be left to a future discussion (i.e. in another issue), as can tuning and transposition.

This issue's title is asking how to encode notes; in other words, how to describe the implementation details of a <note> in the XML syntax. I therefore don't understand why you think that my proposals belong in a different issue. These proposals describe the implementation details of how to encode both the written and the sounding pitch in a <note>. They answer the issue's question by saying that it's not a question of deciding on either sounding or written pitch: both sounding and written pitch can and should be encoded.

So I'm not understanding what you think this issue is about. You say:

The present issue is purely about whether to use written or sounding pitch.

To which "written or sounding pitch" are you referring?

My proposals are a unified strategy for dealing with key signatures, accidentals, 8va lines and transposing instruments, but I think that the details for each of these should be finally worked out in separate sub-issues.

In the first of your above postings you mention the use-case of having different 8vas in different editions. I think we should consider all use-cases at this stage, so as to ensure that the standards we are developing are extensible. This use-case is complicated, but I don't think it is precluded by adopting my proposals. (Actually, I think this particular use-case should probably be dealt with in the context of "different editions" generally -- and that it's a question for a different issue.)

The special cases in the second of your above postings:

  1. Ottavas that apply to notes in one voice but not the other.
  2. Ottavas on a shared line that apply to some instruments and not others.
  3. Optional ottava markings.

1 and 2 can be dealt with using special attributes in <ottava-start> elements (inside <direction> elements). 3 is more complicated, but there is, in principle, no reason why it should not be realisable. There again, we need a different issue for working out the details.

adrianholovaty commented 5 years ago

Closing this as finished/superseded by another issue, now that the change from cf46aa48a2f1ce21df5f4f4cac4d37b6f3757ea5 is officially in the spec.

Note that there's one remaining question, about ottavas, and I've created a separate, tightly focused pull request with a proposal: https://github.com/w3c/mnx/pull/152 Please give some feedback there!

adrianholovaty commented 5 years ago

Reopening this until #152 is closed. @mdgood made the good point that we should keep the issues open until they're done, because we're using issues (rather than pull requests) as our primary record and to-do list. Sorry for the spam. :)

clnoel commented 5 years ago

There is more than one remaining question, at least in my mind. I want to clarify it here.

Here is the sample note that I am working off of (image): a note written at the E5 position, under an 8vb line, in a key signature that includes E-flat.

There are two ambiguities here:

  A) How do we represent the octave for that note (with or without octava)?
  B) How do we represent the accidental for that note (with or without the key signature's flat)?

Given that there are generally two parser types (audio and graphical), here are the options for the spelling of the written pitch:

  1: "E5" (audio parser must know about the key signature and the octava, but the graphical parser is exact.)
  2: "E4" (audio parser must know about the key signature, and the graphical parser must know about the octava to properly place the notehead.)
  3: "Eb5" (audio parser must know about the octava, and the graphical parser must know about the key signature to omit the b (or we have an empty "accidental" property, and the graphical parser uses that).)
  4: "Eb4" (audio parser is exact, and the graphical parser must know about both the key signature and the octava (or, as in 3, uses an empty "accidental" property).)
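To make the four options concrete, here is roughly how that single note might be encoded under each of them (the attribute name is only illustrative):

<note pitch="E5"/>    <!-- option 1: exactly what is drawn; playback must apply both the key signature's flat and the octava -->
<note pitch="E4"/>    <!-- option 2: octava folded in; playback still applies the flat; the renderer re-applies the octava to place the notehead -->
<note pitch="Eb5"/>   <!-- option 3: flat folded in; playback still applies the octava; the renderer consults the key signature to suppress the printed flat -->
<note pitch="Eb4"/>   <!-- option 4: both folded in; playback is exact; the renderer re-applies both -->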

Since we are choosing to represent written-pitch, at least partially because of the difficulties of correctly spelling pitches, I feel we should be leaning toward option 1, where we represent the graphical look of the pitch. Because if we start changing the pitch-spelling from what you see in the original graphics, where do we draw the line?

However, as I look through the philosophical ideals of MNX, I see that we want to represent the note not necessarily as just a graphical object, but also as a concept. From this standpoint, I can understand if we don't want to go that route. I wanted to present the options and their effect, to highlight what I see as the remaining issues.

--Christina

notator commented 5 years ago

@clnoel Welcome back! I like the audio parser / graphics parser terminology very much! Maybe we should just forget about the philosophy, and get on with the practical details! :-)

Yes, I also want

1: "E5" (audio parser must know about key signature and octava, but graphical parser is exact.)

The audio parser also has to know whether this is a transposing instrument. Adding "in F" to your example should mean that the audio ends up another 7 semitones lower. Note for others: in this case there is no ambiguity about the pitch, as there would be if we were trying to transpose the graphics.

Here are some thoughts about defining the meaning of (non-standard) accidentals. These illustrate, I think, a further advantage of adopting the proposal that is being supported by @clnoel and myself, rather than continuing to use MusicXML's approach (as described in the current draft spec, and proposed by the co-chair in PR #152). In the above posting, I simply replaced the pitch attribute with a head attribute whose value (in combination with the current clef state) defines the vertical position of the notehead on the staff. The optional accidental attribute is separate, and can define a non-standard accidental. There is also an optional sounding attribute that overrides the <direction> attributes being read by the audio parser.

As in @clnoel's example, these attributes can be used in combination with the current total transposition state (defined in the <direction> elements) to calculate the <note>'s sounding pitch. Apart from the <direction> elements, the above example would simply be

<note head="E5" />

The audio parser would lower the pitch by 1 semitone for the Eb in the key signature and by one octave for the 8vab.

If a forced natural was needed on the note, it would be encoded like this:

<note head="E5" accidental="n" />

Such an accidental would override the key-signature.

<note> also has an optional sounding attribute, whose value is a (midi.cent) frequency that overrides the frequency otherwise calculated by the audio parser.

<note head="E5" sounding="76.2" />      <!-- 76 is the midi code for E5 -->

This is a note that looks like @clnoel's example (with the key signature and 8vab) but sounds like a rather sharp E5 (overriding the key-signature and 8vab).

Since the accidental attribute is separate, and parsed by the audio parser, it becomes possible to define the contribution of each accidental used in the file to the total transposition state. The default "cent-offset" values to be added to the current transposition state for each of the standard accidentals would be:

accidental cent-offset
sharp 100
flat -100
natural 0
double-sharp 200
double-flat -200

It would be straightforward to redefine these values and/or define values for other, non-standard accidentals used in the file, in a table somewhere earlier in the file. For example, some of the Stockhausen accidentals could be defined like this:

accidental name unicode cent-offset
raised U+ED50 25
flatRaised U+ED52 -75
threeQuartersSharp U+ED5A 150

The accidentals in the key signature in @mogenslundholm's example could also be coded in this way. Notes that have unique tunings can still be defined using a sounding attribute.
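One hypothetical encoding of such a table (every element and attribute name here is invented purely for illustration):

<accidental-defs>
   <accidental-def name="raised" glyph="U+ED50" cent-offset="25"/>
   <accidental-def name="flatRaised" glyph="U+ED52" cent-offset="-75"/>
   <accidental-def name="threeQuartersSharp" glyph="U+ED5A" cent-offset="150"/>
</accidental-defs>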

mdgood commented 5 years ago

@clnoel, the spec states that the pitch attribute is "the musical pitch of the note". This corresponds to choice 4, Eb4. This is similar to how MusicXML works, which has been successful. Some of your other choices are reminiscent of how NIFF worked. I found that most NIFF parsers got pitch wrong due to thinking of pitch graphically rather than semantically.

I think there's room for discussion of a separate issue about the exact details of the accidental attribute. The current draft spec carries forward one of MusicXML's more error-prone design decisions, so it would be great to fix this in MNX-Common. I have started a new issue #153 for this.

@notator, this issue is only about written vs sounding pitch, not for alternate syntaxes for pitch representation. The basics of pitch representation syntax have already been decided. There are separate issues for the accidental attribute (#153) and using cents instead of decimal fractions of a semitone (#19).

notator commented 5 years ago

@mdgood, As I think you said somewhere, we are currently using issues to record ideas. This seemed the best place to record an idea that follows from the proposal I'm currently backing, and which I think shows that it is superior to MusicXML's approach.

The basics of pitch representation syntax have already been decided.

I'm not so sure about that. We haven't decided PR #152 yet, so we don't know if §5.2.6.1 The <note> element is going to stay the way it is.

Thanks for opening #153, and mentioning #19. Maybe we should agree about PR #152 before continuing with those, otherwise we will be talking at cross-purposes.

dspreadbury commented 5 years ago

I think we can say with some degree of certainty that the <note> element will stay largely as it is, notwithstanding the issue of how the sounding pitch should be encoded, because the co-chairs are not convinced of the benefits of any of the alternative approaches that have been proposed.

I can't be persuaded to agree with either of @clnoel's first two proposals. I could be persuaded either way about whether the note in her example should be Eb5 or Eb4. In Dorico, for example, the note's pitch is not altered by the octave line/instruction: only its staff position. I believe that octave lines, like clefs, are presentational: they do not change the pitch of the note, only where on the staff it is displayed. For that reason I would be inclined not to include the effect of the octave line/instruction on the pitch of the note, but, as I say, I could be persuaded either way.

notator commented 5 years ago

@dspreadbury (Please don't get upset by my use of "head" rather than "pitch" in the following examples. We are discussing two proposals, and are still working on PR #152.)

As I said above,

(Some simple accidentals could be included as shortcuts in the head string later, but lets leave them out for the moment.)

So I wouldn't mind using

<note head="Eb5" />

to mean the same thing as

<note head="E5" accidental="b" />

It's just that I think MNX-Common also needs a consistent mechanism for dealing with non-standard accidentals. (#153 needs to be addressed later.)

In Dorico, for example...

Just a remark: I think the code in Dorico's implementation of CWMN is important in deciding what CWMN is, and that there must be an unambiguous way for Dorico to import/export MNX-Common files. But that does not mean that MNX-Common has to mirror Dorico's internal code structures exactly. Different applications will use different approaches to implementing CWMN, so they can't all use exactly the same approaches as MNX-Common.

I would be inclined not to include the effect of the octave line/instruction on the pitch of the note

Am I right in thinking that you are inclining to agree with me that Eb5 is correct? Nice that even if not, you might be persuaded otherwise. :-)

All the best, James

dspreadbury commented 5 years ago

Just a remark: stating "In Dorico, for example" does not imply that I believe MNX-Common should mirror Dorico's approach at all. I believe your command of the English language is more than good enough to understand the idiom "for example", so please don't put words into my mouth.

adrianholovaty commented 5 years ago

@clnoel Thanks for spelling out those four options — it's quite useful for focusing the discussion!

I vote for the fourth option (Eb4), as it strikes the right balance between semantics and presentation. This is what my pull request in #152 is intended to communicate.

I agree with @dspreadbury that ottavas and clefs are more presentational than semantic. This is nicely captured by @shoogle in his comment above pointing out that different editions make different ottava decisions for the same underlying music.

A note's accidental is both semantic and presentational: it fundamentally affects the meaning of the pitch itself (e.g., Eb4 is a different pitch than E4), but its graphical display is a presentational decision (e.g., does this note have an accidental visually rendered next to it? See issue #153).

Finally, a subtle note about the meaning of "written pitch." @clnoel said in the comment above that she leans toward option 1 because it reflects the written pitch and we'd decided written pitch is the way forward. I'd like to attempt to clarify our discussion by separating the vague concept of "written pitch" into two concepts:

(These terms were completely made up by me, on the spot, and they're serving only for the purposes of clarifying this discussion thread. :-) )

I believe MNX-Common should use Performer-Centric Written Pitch. The pull request in #152 attempts to codify that, via its definition of the term "written pitch," and I'm very interested to get feedback on whether the definition is clear and unambiguous enough. It defines it in terms of the pitch generated by a concert-pitch instrument playing it, which may or may not be a good approach.

notator commented 5 years ago

@dspreadbury

please don't put words into my mouth

I was just amplifying/agreeing with what you said. It would help if you didn't always, by default, bite my head off.

notator commented 5 years ago

@adrianholovaty

...a subtle note about the meaning of "written pitch." @clnoel said in the comment above that she leans toward option 1 because it reflects the written pitch and we'd decided written pitch is the way forward.

I also misunderstood the co-chair's decision in the same way in the above posting where I said

(This formulation complies with the co-chair's decision as described above, but that decision still needs to be clarified.)

Which only goes to show how important it is to use a precise terminology whose meaning we all agree about.

I'm trying very hard to be constructive here, but I have to say that I think it's a mistake to try to encapsulate what the performer may be thinking in a file format that is going to be read by programmers and machines.

As a programmer, I would prefer the local XML (i.e. the <note> definition) to clearly describe what I'm expecting to see in the printed score (graphics). I don't want to have to look through the file and analyse the current graphic state (clefs, 8vas, transposition instructions etc.) just to know what to write in the <note> definition to get a particular result in the printed score. If I write an E5 in the <note> definition, I would like to see an E5 in the printed score, regardless of the clef, 8va signs etc. A performer reads a printed score, not the XML, and the printed score contains all the clefs, 8va signs and transposition information that allow him/her to infer which audio pitch to play.

Hope that helps.

kepper commented 5 years ago

FWIW, and mostly because @bhamblok asked for it, here's how MEI deals with notes. It can be used in multiple ways; it's focussed on the visual representation (written pitch). However, sounding pitch is available as well. The situation from above could be encoded in multiple ways, but no matter how it is encoded, there is no way to misunderstand the encoding:

<note pname="c" oct="5" dur="4"/> This would be a written C5 quarter (no matter how it sounds).

<note pname.ges="c" oct.ges="5" dur.ges="4"/> This would sound like a C5 quarter (no matter how it's written). .ges stands for gestural domain, i.e. sound.

<note pname="c" pname.ges="b" oct="5" oct.ges="4" accid.ges="f" dur="4"/> This would be a written C5, which sounds like a Bb4.

An encoding may decide to not provide one or the other (actually, it could go without any of those, but that's a different story). In that case, it's very often still possible to infer the missing information from key signatures, information about transposing instruments or other places, but that may require more processing than every application would be willing to invest, simply because it would be out of scope.

I'm not trying to advertise anything, I just want to make some other's conclusions available to this thread.

notator commented 5 years ago

Further to my previous posting, I'd like to walk through the XML-programming scenario in a little more detail, to convince others (and myself) that it is sensible, really works, and has no hidden problems. (If anyone can find a problem, I'm all ears.) The MEI approach seems overly complicated by comparison. This may mean some repetition of earlier info, but a recap isn't necessarily a bad thing... (I'll keep using head rather than pitch as the name of the note attribute, but that is irrelevant here, so don't let it disturb anyone.)


We are writing XML code for an 8-measure score with one staff (one player). There is an ordinary treble clef at the start of measure 1, and a <note> in measure 4, defined as <note head="C4" />. In measure 4, the graphics parser will interpret that as a middle C, drawn one ledger line below the treble staff. The audio parser ignores the clef, and uses the <note>'s default frequency. The default frequency for C4 is 60 in midi.cent units.

Note 1) that if there is lots of code in measures 1, 2 and 3, the clef definition could be a very long way from the note definition in the XML, and 2) that the clef can be changed at will without changing the frequency calculated by the audio parser. Changing the clef simply tells the graphics parser to render the same C4 notehead at a different vertical position on the staff. The <note>'s head attribute can be changed to something else (e.g. "A3") independently of the active clef. The graphics parser takes care of the details when creating a printout. The usual situation is that the current clef would be changed (e.g. if there were too many ledger lines) by someone editing the printed score using a score editor (i.e. not editing the XML directly). But it should be very easy for a programmer debugging an MNX-Common reader or writer to find the relevant information in the XML.

Note that the graphics parser and audio parser have completely separate responsibilities. The graphics parser generates graphics (in space), the audio parser generates audio (in time). They both use <direction> and <note> information, but in different ways.

Let's now add an <ottava-start> <direction> in measure 2 and an <ottava-end> <direction> in measure 6. The graphics parser will draw the appropriate "8va" text, dotted line and end mark. The audio parser will add 12 to the (midi.cent) frequency it is currently using for all the notes in the 8va's scope. The frequency for any C4 in scope (default value midi.cent 60) becomes midi.cent 72. Note that the 8va <direction> can be added (or removed) without looking at, or changing, any of the current <note> definitions, and that any of the <note> definitions can be changed without looking for <direction>s that may be a long way away in the XML.

Similarly for <transposition-start> and <transposition-end> <direction>s. Such a <direction> has two parts: the graphics (e.g. the string "in F") and the audio increment (which would be -7 midi.cents for an instrument playing in F). If such a <direction> were added in measure 1, the frequency of the note in measure 4 would become 72 - 7 (= 65).

I think it would be extremely complicated, by comparison, to have to change all the <note> definitions when adding an 8va <direction> (as currently proposed by the co-chair). That would not only be more work for the XML-writing software, it would also mean that the graphics parser would have to keep track of the audio parser. Things stay much simpler if their domains are kept completely separate.
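Pulling the walkthrough together, the whole scenario might look roughly like this. All element names and the measure layout are illustrative only (nothing here is settled MNX-Common syntax); the comments show the frequencies the audio parser would derive:

<measure number="1">
   <direction><clef sign="G" line="2"/></direction>              <!-- treble clef: graphics only -->
   <direction><transposition-start semitones="-7"/></direction>  <!-- "in F": audio only -->
</measure>
<measure number="2">
   <direction><ottava-start octaves="+1"/></direction>           <!-- graphics draws the 8va line; audio adds 12 -->
</measure>
<measure number="4">
   <note head="C4"/>   <!-- default 60; +12 for the 8va, -7 for "in F": midi.cent 65 -->
</measure>
<measure number="6">
   <direction><ottava-end/></direction>
</measure>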

lpugin commented 5 years ago

I am not sure I understand what you mean when saying that the MEI approach is overly complicated. In any case, if you are looking at an XML-programming scenario, you will probably not do head="C4", since that requires the attribute value to be parsed outside the XML parser, but rather (taking MEI as an example) pname="c" oct="5". It happens that this also has the advantage that, in the case of an octava, you can specify the sounding octave with oct.ges="4" while the pitch name remains the same.

notator commented 5 years ago

@lpugin Welcome back! :-)

I am not sure I understand what you mean when saying that the MEI approach is overly complicated.

I meant that distinguishing clearly between the graphics parser and the audio parser makes having separate attributes for the graphics and audio unnecessary. In this proposal, the XML is designed so that it can be parsed in either way. Applications that are not interested in audio can just parse the graphics, and vice-versa. Apps that want to parse both can, of course, do so -- and the two domains will be automatically synchronised.

head="C4" could, of course, be split into pname="c" and oct="4", but I think that would be both unnecessary and confusing here. In this proposal, "C4" is an attribute that can be interpreted either as a graphic (using the current clef), or as providing a default frequency for the audio parser. The values taken by the head attribute are defined in Scientific Pitch Notation to have particular (default) frequencies, so I don't think we need to have a separate oct attribute. (I think oct is an MEI implementation detail). Using pname (short for "pitch name" ?) would be confusing because the pitch (=audio frequency) is context-dependent, and actually completely independent of the value of this attribute. In the current proposal, <note> even has an optional sounding parameter that completely overrides the current context, and can have arbitrary midi.cent frequency values.

clnoel commented 5 years ago

Technically, the "written pitch" is an image on a page, which I think we all agree is way too far towards the graphical side of things. We have already decided that performed frequency (that takes everything into account all the way through transposing instruments and unwritten microtones) is way too far toward the audible side of things, and put that in a separate optional property (sounding pitch).

The question we are addressing here is: Where is the line that establishes enough semantic value to make both a graphical and an audible representation viable (assuming no sounding pitch is specified)?

I would also like to point out that the difficulty of establishing "pitch spelling" (the set of accidentals displayed in the graphics) is one of the reasons we decided to move away from sounding pitch in the first place.

I've been thinking about this a lot since I last commented with the set of options above. I've talked about it with my colleagues here, and I now think I'm actually leaning toward using "E4" (The second option).

With this option, the key-signature has semantic meaning: it is necessary to the audio-parser, which defaults to it if there is nothing in the accidental property. It makes the discussion about how to do pitch-spellings simpler, because the "spelling" part goes in the accidental property, never in the base-pitch.

It also makes the ottava have semantic meaning, which is programmatically equivalent to an intervening clef-change, and changes where the graphical display of the note goes, while being ignorable for an audio parser.

I completely understand that this is a kind of half-and-half representation. I feel that that kind of half-and-half representation is necessary now that we have decided not to go all the way graphical or all the way audible. I acknowledge that this might make some analyses harder, because the fact that it is an Eb, not an E, would need to be figured in by using the accidental. However, given the difficulties in correctly specifying (e.g.) Ebb4 by putting the "bb" in both the pitch and the accidental properties, I think this also means less duplication!

--Christina

bhamblok commented 5 years ago

Sorry, I don't agree. An E is an E and an Eb is an Eb. Can you elaborate, with some examples, how you would encode two sequential E flats (in a key of C major) where the second one doesn't need the accidental to be shown? It would be really confusing if they were encoded in different ways.
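For reference, here is roughly what that scenario looks like under options 2 and 4, just to show where the ambiguity would live (attribute names follow the earlier examples in this thread, and the octave is arbitrary):

<!-- option 2: base pitch without the flat folded in -->
<note pitch="E4" accidental="b"/>   <!-- first Eb: flat printed -->
<note pitch="E4"/>                  <!-- second Eb: flat carried by the measure convention; playback must infer it -->

<!-- option 4: flat folded into the pitch -->
<note pitch="Eb4"/>                 <!-- first Eb: accidental printed -->
<note pitch="Eb4"/>                 <!-- second Eb: printed accidental suppressed by the measure convention; pitch still explicit -->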

I think "semantics" are utmost superior to written and/or sounding properties.

notator commented 5 years ago

@clnoel and @bhamblok I'm still sitting on the fence about what the head attribute should contain, and don't really want to discuss #153 until the PR in #152 has been resolved. That's so that we know which proposal we are talking about, and don't get confused again. Will we be talking about the proposal in the current spec, or the double-parser proposal?

However: :-) The value of the head attribute could be understood as including the accidental.

<note head="C#4" />

would mean that the graphics parser would write a C4 notehead preceded by a # accidental. The audio parser would interpret that as the default frequency for a C#4. Currently (in Scientific pitch notation) we only have default frequencies defined for noteheads that have no accidentals, but it would be very easy to extend that to define the default frequencies of the noteheads that do have (standard) accidentals. The value of the head attribute is only the name of a bit of graphics. We just have to decide (in #153) whether or not it includes an accidental.

adrianholovaty commented 5 years ago

@clnoel wrote:

I've been thinking about this a lot since I last commented with the set of options above. I've talked about it with my colleagues here, and I now think I'm actually leaning toward using "E4" (The second option).

I don't think I can ever be convinced of options 1 or 2. :-/ An E-flat is not an E. In my view, this doesn't pass a baseline test of "is this note represented semantically?"

Options 1 and 2 require too much knowledge of state (the key signature), for something too important to mess up (the pitch).

It makes the discussion about how to do pitch-spellings simpler, because the "spelling" part goes in the accidental property, never in the base-pitch.

But the base-pitch is part of the spelling, no? Consider G-flat vs. F-sharp. The spelling difference between those two notes exists in the accidental and the base-pitch.

notator commented 5 years ago

"Semantic" is a tricky word... A note element in the XML actually has two meanings, a graphic meaning and an audio meaning. (I still don't want to talk about the way accidentals are handled until PR #152 has been resolved.)

shoogle commented 5 years ago

I like Option 4.

Options 1 and 2 require too much knowledge of state (the key signature), for something too important to mess up (the pitch).

Agreed. They also require an assumption that accidentals remain in effect until the end of the measure, or until superseded by a different accidental. While this is true for most sheet music, Gould mentions (I forget the page number) that other conventions have existed, such as requiring accidentals to be explicitly stated (i.e. any note without an accidental is a natural). This kind of music can be encoded by Options 3 or 4 but not by 1 or 2 (at least not without risking incorrect playback).

Option 1 or 2 would make sense for OMR, but for pretty much any other use-case Option 3 or 4 is a better choice.

cecilios commented 5 years ago

Sorry for this long post. It is difficult for me to express my ideas in English and this results in a longer text. Sorry!

In the beginning, we all more or less assumed that MNX would follow MusicXML for representing pitch, as no issues had been raised with MusicXML's pitch representation.

Later, an important question was raised: the issue of what to do for transposing instruments. This introduced the concept of written pitch vs. sounding pitch. But in any case, when the issue was raised, the meaning of 'written pitch' and 'sounding pitch' was basically:

After some argument it became clear that written pitch (what MusicXML uses) is the most practical. This should have closed the issue, so that we could proceed with other work.

Unfortunately, the words 'written pitch' and 'sounding pitch' are open to interpretation, and Pandora's box was opened when we started interpreting those words differently. And IMO this is the current situation: a lot of different proposals trying to solve unknown problems with the MusicXML approach.

Music is sound. And for more than ten centuries people have been trying to represent music with symbols. The music score is the best system found for this. So now we are trying to represent the music score (not its graphical appearance but its content) using 'computer symbols'. To me the best approach is to mimic the music score (the best known system for representing music, apart from audio recordings). The notes are represented by a notehead placed on a staff, and the sound (pitch) is implied by many other symbols: the notehead's position on the staff, the clef, the accidentals, the 8va marks, etc.

To me, when we talk about 'written pitch' I understand 'notehead position on the staff', and the simplest way of expressing this location is by the 'displayed pitch' (this is basically what MusicXML uses). So in @clnoel's example, the notehead position is E5 (or Eb5 -- more on this later). To me the current problems arise when we compare this written pitch with the sounding pitch, as in this example they are different. But the problem disappears if we return to the idea of understanding 'written pitch' not as a pitch but as a 'position on the staff'. So E5 is not a pitch but a reference to a notehead position: 'notehead on the fourth space'. That is MusicXML's understanding and that is what I propose to follow. It gives preference neither to 'sound' parsers nor to 'graphical' parsers. It is just a way of expressing where the notehead is placed on the staff.

Now to the issue of E5 vs Eb5. MusicXML takes applicable accidentals into account and would use Eb5; I assume that this decision was taken to avoid having to track applicable accidentals. For a long time, in my applications I chose the opposite approach, using E5 (as if I were writing the score with pen and paper) and forcing the application to compute applicable accidentals. In my many years of experience, I have found that both systems work well and no special problems arise with either of them. But I have found the MusicXML system (Eb5) better than my application's system (E5), as it simplifies the algorithms for preserving displayed accidentals when a transposition is applied.

So, my vote is for the current MusicXML approach, Eb5, option 3, although option 1, E5, would also be acceptable to me.

Hope this helps!

mdgood commented 5 years ago

Closed with #152.