w3c / mnx

Music Notation CG next-generation music markup proposal.

Encoding spaces, hyphens, phrases, and blocks in lyrics #357

Open samuelbradshaw opened 1 month ago

samuelbradshaw commented 1 month ago

There are several challenges that make it difficult to extract lyrics from sheet music in a readable form. Among them:

Not all applications need a way to extract lyrics from sheet music. However, extracting lyrics is helpful for music viewers where users have varying musical experience. It can be used to generate guitar lyric sheets (lyrics lined up with chords). I also predict it will become important in the future for computers to be able to extract well-formatted lyrics from sheet music behind the scenes for things like generated singing (intersection between MIDI and text-to-speech).

I'd like to propose [Proposal 1] that spaces and the two types of hyphens be explicitly encoded in the sheet music. The syllables in a phrase like "Sun-dried raisins? Yes!" might look like this:

{ "text": "Sun-" } … { "text": "dried " } … { "text": "rai•" } … { "text": "sins? " } … { "text": "Yes!" }

Notice that the original hyphen and spaces are preserved. "•" is used to indicate a soft hyphen (following the syntax in dictionary definitions, where "•" is placed between syllables). Alternatively, it could be something like this (a little more verbose):

{ "text": "Sun", "hyphen": "grammatical" } … { "text": "dried " } … { "text": "rai", "hyphen": "discretionary" } … { "text": "sins? " } … { "text": "Yes!" }

Or (even more verbose):

{ "text": "Sun", "hyphen": "grammatical" } … { "text": "dried", "suffix": " " } … { "text": "rai", "hyphen": "discretionary" } … { "text": "sins", "suffix": "? " } … { "text": "Yes", "suffix": "!" }
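To show that all three encodings carry the same information, here is a minimal Python sketch that rebuilds the readable text from each of them. The function names are hypothetical, and the field names ("text", "hyphen", "suffix") and the "•" marker are only the ones floated in this proposal, not part of any agreed MNX schema.

```python
# Sketch only: field names and the "•" marker come from this proposal.
SOFT_HYPHEN_MARK = "•"

def syllable_to_text(syl):
    """Return the readable text contributed by one lyric syllable."""
    # A soft (discretionary) hyphen only exists in the score, so drop it.
    # Assumption: a literal "•" never appears in real lyric text.
    text = syl["text"].replace(SOFT_HYPHEN_MARK, "")
    # A grammatical hyphen belongs to the word itself, so keep it.
    if syl.get("hyphen") == "grammatical":
        text += "-"
    # Trailing punctuation and spaces, if they are stored separately.
    return text + syl.get("suffix", "")

def extract_lyrics(syllables):
    """Join syllables into readable text."""
    return "".join(syllable_to_text(s) for s in syllables)

compact = [
    {"text": "Sun-"}, {"text": "dried "}, {"text": "rai•"},
    {"text": "sins? "}, {"text": "Yes!"},
]
verbose = [
    {"text": "Sun", "hyphen": "grammatical"}, {"text": "dried "},
    {"text": "rai", "hyphen": "discretionary"}, {"text": "sins? "},
    {"text": "Yes!"},
]
most_verbose = [
    {"text": "Sun", "hyphen": "grammatical"},
    {"text": "dried", "suffix": " "},
    {"text": "rai", "hyphen": "discretionary"},
    {"text": "sins", "suffix": "? "},
    {"text": "Yes", "suffix": "!"},
]

for variant in (compact, verbose, most_verbose):
    print(extract_lyrics(variant))  # prints "Sun-dried raisins? Yes!" each time
```

The compact form needs the least code to read back, but it reserves "•" as a special character. The verbose forms avoid that at the cost of more fields.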

In issue https://github.com/w3c/mnx/issues/354 and pull request https://github.com/w3c/mnx/pull/355, two paradigms were discussed for managing syllables: start, middle, end, and whole syllable types (borrowed from MusicXML), or continues and hyphen drawing instructions. I'd like to advocate [Proposal 2] that we use the continues/hyphen paradigm (sorry I've gone back and forth on this).

What MusicXML does is semantic (usually good) – but only in alphabetic writing systems. In many Asian languages, you have to "break the rules" to get the sheet music you want. For example, in Chinese, each character is equivalent to a syllable. Most Chinese words are composed of two characters. Following the MusicXML pattern, it would be intuitive to mark the first character of a word as start, and the second character as end. But there's a problem – Chinese sheet music isn't drawn with hyphens between syllables. So, you have to mark each character as whole (neither semantic nor intuitive).
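To make the problem concrete, here is a small Python sketch of a hypothetical renderer that derives hyphens purely from the syllable type. The "type" field name and the rule "draw a hyphen after start and middle" are my assumptions for illustration, not a description of any real engraving engine.

```python
# Sketch only: a hypothetical renderer that infers hyphens from syllable types.
def render_with_types(syllables):
    """Join syllables, drawing a hyphen after every start or middle syllable."""
    out = []
    for syl in syllables:
        out.append(syl["text"])
        if syl["type"] in ("start", "middle"):
            out.append("-")
    return "".join(out)

# 中国 ("China") is one word made of two characters (two syllables).
semantic = [{"text": "中", "type": "start"}, {"text": "国", "type": "end"}]
workaround = [{"text": "中", "type": "whole"}, {"text": "国", "type": "whole"}]

print(render_with_types(semantic))    # 中-国  (unwanted hyphen)
print(render_with_types(workaround))  # 中国   (correct, but the word grouping is lost)
```

The semantically accurate markup produces the wrong drawing, and the markup that draws correctly no longer says the two characters form one word.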

One of the arguments for sticking with the MusicXML pattern is that knowing start and end can help a graphics engine provide enough space between words. I think [Proposal 1] can meet this need by preserving spaces, which will naturally keep separate words apart.
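As a rough illustration, a layout engine could find word boundaries from the preserved spaces alone. This is a Python sketch under the assumption that syllables use the "text" and optional "suffix" fields from Proposal 1; the function name is hypothetical.

```python
# Sketch only: infer word boundaries from preserved trailing whitespace.
def word_break_after(syllables):
    """Return one bool per syllable: True if a word ends after that syllable."""
    flags = []
    for i, syl in enumerate(syllables):
        rendered = syl["text"] + syl.get("suffix", "")
        is_last = i == len(syllables) - 1
        # A word ends where the encoded text ends in whitespace,
        # or at the very last syllable of the lyric line.
        flags.append(is_last or rendered[-1:].isspace())
    return flags

syllables = [
    {"text": "Sun-"}, {"text": "dried "}, {"text": "rai•"},
    {"text": "sins? "}, {"text": "Yes!"},
]
print(word_break_after(syllables))  # [False, True, False, True, True]
```

No start or end types are needed: the engine can add extra horizontal space wherever the flag is True.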

Finally, I think it would be helpful to add [Proposal 3] attributes that indicate the end of a lyric phrase and/or lyric block. Something like this:

{ "text": "Sun", "hyphen": "grammatical" } … { "text": "dried " } … { "text": "rai", "hyphen": "discretionary" } … { "text": "sins? " } … { "text": "Yes!", "endPhrase": "true", "endBlock": "true" }
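Here is a minimal Python sketch of how a consumer might use these markers to split extracted lyrics into phrases (lines) and blocks (stanzas). The function names are hypothetical, "endPhrase" and "endBlock" are just the attribute names suggested above, and the sketch accepts both the string "true" and a real boolean, since the value type is not settled.

```python
# Sketch only: attribute names and value types follow this proposal.
def is_set(syl, key):
    """Treat both the string "true" and a JSON boolean true as set."""
    return syl.get(key) in (True, "true")

def render(syl):
    """Readable text for one syllable (grammatical hyphens are kept)."""
    text = syl["text"]
    if syl.get("hyphen") == "grammatical":
        text += "-"
    return text

def group_lyrics(syllables):
    """Return a list of blocks, each a list of phrase strings."""
    blocks, phrases, current = [], [], ""
    for syl in syllables:
        current += render(syl)
        # The end of a block is also treated as the end of a phrase.
        if is_set(syl, "endPhrase") or is_set(syl, "endBlock"):
            phrases.append(current.strip())
            current = ""
        if is_set(syl, "endBlock"):
            blocks.append(phrases)
            phrases = []
    # Flush anything left over if the final markers are missing.
    if current.strip():
        phrases.append(current.strip())
    if phrases:
        blocks.append(phrases)
    return blocks

syllables = [
    {"text": "Sun", "hyphen": "grammatical"}, {"text": "dried "},
    {"text": "rai", "hyphen": "discretionary"},
    {"text": "sins? ", "endPhrase": "true"},
    {"text": "Yes!", "endPhrase": "true", "endBlock": "true"},
]
print(group_lyrics(syllables))  # [['Sun-dried raisins?', 'Yes!']]
```

In this usage example an extra "endPhrase" is added after "sins? " to show a block with two phrases. With the markers exactly as in the example above, the result would be a single block containing one phrase.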