Semantic encoding of lyrics separate from the music notation

I would like to propose the ability for lyrics to be encoded separately from the rest of the music notation. This would allow greater flexibility in how and where the lyrics appear, using styles (for example, display: inline|block|inline-block|none), and would allow alternate lyrics to be swapped in and out.

Here is an example of how the lyrics could be encoded:

<lyrics>
  <lyrics-set id="g1" lang="en">
    <lyrics-set-name>Happy Birthday (English)</lyrics-set-name>
    <lyric-unit id="u1" class="verse">
      <lyric-unit-name>Verse 1</lyric-unit-name>
      <lyric-unit-prefix>1.</lyric-unit-prefix>
      <lyric-line id="l1" part="p1" primary="true">
        <text id="t1" note="n1" syllabic="begin">Hap</text>
        <text id="t2" note="n2" syllabic="end">py</text>
        <text id="t3" note="n3" syllabic="begin">birth</text>
        <text id="t4" note="n4" syllabic="end">day</text>
        <text id="t5" note="n5" syllabic="single">to</text>
        <text id="t6" note="n6" syllabic="single">you,</text>
        <extend />
      </lyric-line>
      ...
      <lyric-line id="l2" part="p2" primary="false">
        <text id="t24" note="n24" syllabic="begin">Hap</text>
        <text id="t25" note="n25" syllabic="end">py</text>
        <text id="t26" note="n26" syllabic="begin">birth</text>
        <text id="t27" note="n27" syllabic="end">day,</text>
        <text id="t28" note="n28" syllabic="begin">hap</text>
        <text id="t29" note="n29" syllabic="end">py</text>
        <text id="t30" note="n30" syllabic="begin">birth</text>
        <text id="t31" note="n31" syllabic="end">day,</text>
      </lyric-line>
      ...
    </lyric-unit>
    ...
  </lyrics-set>
  ...
  <lyrics-set id="g2" src="en-GB.xml" />
  <lyrics-set id="g3" src="es.xml" />
</lyrics>

A few use cases:

Musicologist is studying evolution of the lyrics of a song over time and wants to easily switch different versions of the lyrics in and out.
Developer is developing a mobile sheet music app for multilingual users, and wants to save space on the user's device by only downloading the music once, and downloading the lyrics only in the languages the user speaks.
Developer is developing a sheet music experience for users with small screens, and wants to show two verses at a time, but dynamically rotate which verses are shown as the user finishes singing a given verse.
Developer wants to apply styles to one verse at a time, for example putting the current verse in bold.
Performer wants to sing only verses one and three, and hide the other verses in the sheet music.
Performer is using a device with a small screen and wants to see only their instrumental part without the words.
Performer is using a device with a small screen and wants to see only the words, as they're already familiar with the music.
Publisher wants to publish the same song in several languages, or variations of the same language (like en-US and en-GB) and wants to use the same sheet music file for all languages.
Publisher is preparing an edition of a song for a region where two languages are spoken, and wants to display verses in multiple languages inline at the same time.
Publisher is preparing an edition of the song with space constraints and wants to easily style the song so that only two verses are displayed inline, and the rest are displayed below.
Publisher wants to send lyrics to a contractor for translation, without the extra clutter and risk of errors introduced into the sheet music notation.
Editor wants to proofread the lyrics separately from the sheet music to fix small errors that would be easy to miss when the lyrics are displayed inline.
Student wants to view an individual vocal part by turning off the notes and lyrics for other vocal parts, but keeping his lyrics visible.
Student wants to listen to an individual vocal part of the sheet music using a pitch-aware text-to-speech engine.

It's not actually necessary to make the lyrics separate to get those benefits...

Integrated lyrics

<event value="3/16">
  <note pitch="C4"/>
  <lyric-set title="Happy Birthday to You!" orig-lang="en_US">
    <lyric verse="1" lang="en_US" syllabic="begin">Hap</lyric>
    <lyric verse="1" lang="it_IT" syllabic="begin">Tan</lyric>
  </lyric-set>
  <lyric-set title="Good Morning to all!" orig-lang="en_US">
    <lyric verse="1" lang="en_US" syllabic="melisma-start">Good</lyric>
  </lyric-set>
</event>
<event value="1/16">
  <note pitch="C4"/>
  <lyric-set title="Happy Birthday to You!" orig-lang="en_US">
    <lyric verse="1" lang="en_US" syllabic="end">py</lyric>
    <lyric verse="1" lang="it_IT" syllabic="end">ti</lyric>
  </lyric-set>
  <lyric-set title="Good Morning to all!" orig-lang="en_US">
    <lyric verse="1" lang="en_US" syllabic="melisma-end"/>
  </lyric-set>
</event>
<event value="/4">
  <note pitch="D4"/>
  <lyric-set title="Happy Birthday to You!" orig-lang="en_US">
    <lyric verse="1" lang="en_US" syllabic="begin">birth</lyric>
    <lyric verse="1" lang="it_IT" syllabic="begin">augu</lyric>
  </lyric-set>
  <lyric-set title="Good Morning to all!" orig-lang="en_US">
    <lyric verse="1" lang="en_US" syllabic="begin">morn</lyric>
  </lyric-set>
</event>
<event value="/4">
  <note pitch="C4"/>
  <lyric-set title="Happy Birthday to You!" orig-lang="en_US">
    <lyric verse="1" lang="en_US" syllabic="end">day</lyric>
    <lyric verse="1" lang="it_IT" syllabic="end">ri</lyric>
  </lyric-set>
  <lyric-set title="Good Morning to all!" orig-lang="en_US">
    <lyric verse="1" lang="en_US" syllabic="end">ing</lyric>
  </lyric-set>
</event>
<event value="/4">
  <note pitch="F4"/>
  <lyric-set title="Happy Birthday to You!" orig-lang="en_US">
    <lyric verse="1" lang="en_US">to</lyric>
    <lyric verse="1" lang="it_IT">a</lyric>
  </lyric-set>
  <lyric-set title="Good Morning to all!" orig-lang="en_US">
    <lyric verse="1" lang="en_US">to</lyric>
  </lyric-set>
</event>
<event value="/2">
  <note pitch="E4"/>
  <lyric-set title="Happy Birthday to You!" orig-lang="en_US">
    <lyric verse="1" lang="en_US" line="end">you,</lyric>
    <lyric verse="1" lang="it_IT" line="end">te,</lyric>
  </lyric-set>
  <lyric-set title="Good Morning to all!" orig-lang="en_US">
    <lyric verse="1" lang="en_US" line="end">you,</lyric>
  </lyric-set>
</event>

... though admittedly it is much neater to have them separate.

If you're going to have the lyrics separate, why not go the whole hog:

Line-based lyrics

<lyrics>
  <lyric-set title="Happy Birthday to You!" orig-lang="en_US">
    <lyric verse="1" lang="en_US">Hap-py birth-day to you,</lyric>
    <lyric verse="1" lang="it_IT">Tan-ti augu-ri a te,</lyric>
  </lyric-set>
  <lyric-set title="Good Morning to all!" orig-lang="en_US">
    <lyric verse="1" lang="en_US">Good_ morn-ing to you,</lyric>
  </lyric-set>
</lyrics>

This assumes we split lyrics on spaces (` `), hyphen-minus characters (`-`) and underscores (`_`), so there would need to be a way to escape these characters if they are used within a lyric (e.g. by preceding them with a backslash, `\`).
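A minimal sketch of how such a line could be tokenized, in Python. Everything here is an assumption for illustration rather than part of any proposal: the function name, the output shape, and the rules that a space ends a word, a hyphen-minus ends a syllable that continues into the next one, each underscore (or repeated hyphen) extends the previous syllable by one note, and a backslash escapes the following character.

```python
def tokenize_lyric_line(line):
    """Split a line-based lyric into (text, joined_to_next, extra_notes) tuples.

    Hypothetical rules: " " ends a word, "-" ends a syllable that continues,
    "_" (or a repeated "-") adds one note to the previous syllable, and a
    backslash keeps the next character as literal text.
    """
    tokens = []   # each entry: [text, joined_to_next, extra_notes]
    current = []  # characters of the syllable currently being read

    def flush(joined):
        # Close the syllable being read, if any; report whether one was closed.
        if current:
            tokens.append(["".join(current), joined, 0])
            current.clear()
            return True
        return False

    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            current.append(line[i + 1])  # escaped character is kept literally
            i += 2
            continue
        if ch == "-":
            # First hyphen closes the syllable; further hyphens add notes.
            if not flush(True) and tokens:
                tokens[-1][2] += 1
        elif ch == "_":
            flush(False)
            if tokens:
                tokens[-1][2] += 1
        elif ch == " ":
            flush(False)
        else:
            current.append(ch)
        i += 1
    flush(False)
    return [tuple(t) for t in tokens]


print(tokenize_lyric_line("Good_ morn-ing to you,"))
```

With these rules, an escaped hyphen such as `rock\-n\-roll` stays a single syllable instead of being split into three.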

Semantic information

Things that ought to be encoded, regardless of encoding scheme:

Lyric "class" (Verse / translation / alternative / optional)
Repeating units (refrain, chorus, etc)
Language / Locale of translations
Language / Locale of the original (so that translations are always based on the original)
Part / staff / character name(s) to which lyrics belong
Source of translations / alternative lyrics (original / editorial [which edition] / added by user)
Separate licensing and copyright of lyrics translations
Where line breaks should occur when lyrics are written in libretto form
(if possible) how rhythms are altered for different translations/alternatives/verses
(if possible) how lyrics are shared between parts (Alto lyrics = same as Soprano lyrics)

I think this issue suffers a bit from focusing on a solution, not the problem. As @shoogle said, if the problem is to characterize lyrics as belonging to languages and other classification schemes, that does not require MNX to segregate lyrics in a separate container element.

My analysis of the segregation approach is that it's not a net win. At the moment, lyrics belong to events, and that forces their classifiers to be messy attributes. If lyrics instead belong to abstract "sets" for the classifiers, that forces their owning events to be messy attributes. Neither setup is nice, but since a lyric must belong to an event (whereas it might not have a classifier), it makes sense to design the schema around the event ownership.

I suggest retitling the issue to reflect the desire for languages or classifiers, provide a few solid examples and then we can see what is the best solution to that problem.

Another argument in favour of integrated lyrics is that, since lyrics belong to an "event", if you delete an event then you automatically delete any lyrics associated with it, whereas if lyrics are separate then you have to go looking for them. If an implementation fails to go looking for lyrics then it will lead to big problems deleting notes (or measures or staves for that matter), and it's a similar story for inserting notes too.
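To make the "go looking for them" cost concrete, here is a rough Python sketch of the check an implementation would need with separate lyrics. It assumes the `<text id="..." note="...">` shape from the first example in this thread; the function name and the sample data are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical separate-lyrics fragment, modelled on the first example above.
SEPARATE = """
<lyrics>
  <lyric-line id="l1" part="p1">
    <text id="t1" note="n1" syllabic="begin">Hap</text>
    <text id="t2" note="n2" syllabic="end">py</text>
    <text id="t3" note="n3" syllabic="begin">birth</text>
  </lyric-line>
</lyrics>
"""


def find_orphaned_lyrics(lyrics_root, existing_note_ids):
    """Return ids of <text> elements whose note reference no longer resolves."""
    return [
        text.get("id")
        for text in lyrics_root.iter("text")
        if text.get("note") not in existing_note_ids
    ]


# Suppose note n2 was deleted from the score without touching the lyrics.
remaining_notes = {"n1", "n3"}
print(find_orphaned_lyrics(ET.fromstring(SEPARATE), remaining_notes))
```

With integrated lyrics no such pass is needed, because the lyric disappears together with its event.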

Separate lyrics certainly look nicer for humans reading the file, but now imagine you need to find the note with the "birth" lyric. This is also easier if lyrics are stored inside events.
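As a rough illustration, the lookup for integrated lyrics might be sketched like this in Python. The `<sequence>` wrapper, the function name and the trimmed-down sample are assumptions; only the `<event>`, `<note>` and `<lyric>` shapes come from the examples in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: each event carries its own note and lyric.
INTEGRATED = """
<sequence>
  <event value="3/16"><note pitch="C4"/><lyric verse="1" lang="en_US">Hap</lyric></event>
  <event value="1/16"><note pitch="C4"/><lyric verse="1" lang="en_US">py</lyric></event>
  <event value="/4"><note pitch="D4"/><lyric verse="1" lang="en_US">birth</lyric></event>
</sequence>
"""


def pitches_for_lyric(root, text):
    """Return the pitches of every event that carries the given lyric text."""
    matches = []
    for event in root.iter("event"):
        if any(lyric.text == text for lyric in event.iter("lyric")):
            matches.extend(note.get("pitch") for note in event.findall("note"))
    return matches


print(pitches_for_lyric(ET.fromstring(INTEGRATED), "birth"))
```

The note and the lyric share a parent, so no id lookup across containers is needed.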

@shoogle, I thought about going "whole hog," so to speak – not putting tags around each syllable – but the main limitation is that it would keep you from lining up the lyrics exactly where you want them... Otherwise, I would be completely for it. :)

Having integrated lyrics in multiple languages, as in @shoogle's first example, could work for a couple of languages, but it gets difficult to manage when you're working with 20+ languages. It doesn't meet the need to reduce bandwidth consumption or save space on a user's device by only downloading what a particular user needs. And, it bulks up the size of the file if you need to assign verse numbers, languages, and styling to each syllable instead of to a parent element.

@samuelbradshaw, the idea is that content distributors would keep a master file that contains all languages, but they would strip out the unnecessary ones before serving the file to consumers.

By default, users would only see two languages:

The original language
- e.g. Italian for "Le nozze di Figaro"
Their native language
- so I would see lyrics for "The Marriage of Figaro" in English under the original Italian lyrics

But users could be allowed to request more languages if they want.
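A sketch of what that stripping step could look like on the distributor's side, in Python. It assumes the integrated `<lyric lang="...">` encoding from earlier in the thread; the `<sequence>` wrapper, the function name and the sample are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical master fragment holding every language.
MASTER = """
<sequence>
  <event value="3/16">
    <note pitch="C4"/>
    <lyric-set title="Happy Birthday to You!" orig-lang="en_US">
      <lyric verse="1" lang="en_US" syllabic="begin">Hap</lyric>
      <lyric verse="1" lang="it_IT" syllabic="begin">Tan</lyric>
    </lyric-set>
  </event>
</sequence>
"""


def strip_languages(root, keep):
    """Remove every <lyric> whose lang is not in `keep`, in place."""
    # Materialize the list first so removals do not disturb the traversal.
    for lyric_set in list(root.iter("lyric-set")):
        for lyric in lyric_set.findall("lyric"):
            if lyric.get("lang") not in keep:
                lyric_set.remove(lyric)
    return root


served = strip_languages(ET.fromstring(MASTER), keep={"en_US"})
print([lyric.get("lang") for lyric in served.iter("lyric")])
```

Serving the original language plus the user's native language would just mean passing both codes in `keep`.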

Regarding attributes increasing the file size, this is certainly true, but it's not really a problem if the file is stored and distributed in compressed form. Anyway, it should be possible to minimise the extra space if careful thought is put into the encoding scheme. It might not be necessary to store the verse number (or any other attribute for that matter) in every syllable, for example; it might be good enough to store the verse number in the first syllable and assume it is the same for all subsequent syllables on the same line (until a different verse number is given).
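One way a reader could resolve such inherited verse numbers, sketched in Python. This assumes that "the same line" means the same position among the lyrics under each event, which is my reading rather than anything specified; the function name, the `<sequence>` wrapper and the sample are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: verse is given only on the first syllable of each line.
COMPACT = """
<sequence>
  <event><lyric verse="1">Hap</lyric><lyric verse="2">Good</lyric></event>
  <event><lyric>py</lyric><lyric>morn</lyric></event>
</sequence>
"""


def resolve_verses(root):
    """Return (text, verse) pairs, carrying the verse forward per lyric line."""
    last_seen = {}  # lyric position under the event -> latest explicit verse
    resolved = []
    for event in root.iter("event"):
        for position, lyric in enumerate(event.iter("lyric")):
            verse = lyric.get("verse", last_seen.get(position))
            last_seen[position] = verse
            resolved.append((lyric.text, verse))
    return resolved


print(resolve_verses(ET.fromstring(COMPACT)))
```

The cost of this saving is that a lyric can no longer be interpreted on its own: a reader has to walk the events in order.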

@samuelbradshaw I think that lyrics need to be event based. If we are running into problems due to duplicating elements, then we need a parent element, such as the following:

<event value="3/16">
  <note pitch="C4"/>
  <lyric-set verse="1" syllabic="begin">
    <lyric lyric-id="en_US">Hap</lyric>
    <lyric lyric-id="it_IT">Tan</lyric>  
    <lyric lyric-id="en_US-2" syllabic="melisma-start">Good</lyric>
  </lyric-set>
</event>

And then we add a descriptor outside the event listings (in the global?):

<lyric-lists>
   <lyric-info lyric-id="en_US" title="Happy Birthday to You!" orig-lang="en_US"/>
   <lyric-info lyric-id="it_IT" title="Happy Birthday to You!" orig-lang="en_US"/>
   <lyric-info lyric-id="en_US-2" title="Good Morning to All!" orig-lang="en_US"/>
</lyric-lists>

The problems with showing or hiding any given set of lyrics are handled with the CSS styling, which can make it so that any lyric with the lyric-id "en_US-2" has "display: none", and any lyric with the lyric-id "it_IT" is in italics, for example. Then, applying different style sheets to the same file will produce different language instances.

This doesn't hit all the use cases, granted, especially the translation problem. But it is still reasonably possible to extract a set of lyrics from this file. You just scan through for the correct lyric-id and compile the words together. And that might actually be MORE readable to a program extracting the text than a set of words with a bunch of hyphens in it.
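A rough Python sketch of that extraction, assuming the `lyric-id` encoding above, with `syllabic` read from the `<lyric>` if present and from the parent `<lyric-set>` otherwise. The function name, the `<sequence>` wrapper, the sample, and the treatment of a "middle" value (borrowed from MusicXML) are all assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment in the lyric-id style.
SCORE = """
<sequence>
  <event value="3/16"><lyric-set verse="1" syllabic="begin">
    <lyric lyric-id="en_US">Hap</lyric><lyric lyric-id="it_IT">Tan</lyric>
  </lyric-set></event>
  <event value="1/16"><lyric-set verse="1" syllabic="end">
    <lyric lyric-id="en_US">py</lyric><lyric lyric-id="it_IT">ti</lyric>
  </lyric-set></event>
  <event value="/4"><lyric-set verse="1" syllabic="begin">
    <lyric lyric-id="en_US">birth</lyric><lyric lyric-id="it_IT">augu</lyric>
  </lyric-set></event>
  <event value="/4"><lyric-set verse="1" syllabic="end">
    <lyric lyric-id="en_US">day</lyric><lyric lyric-id="it_IT">ri</lyric>
  </lyric-set></event>
</sequence>
"""


def extract_text(root, lyric_id):
    """Join the syllables of one lyric-id back into plain words."""
    words, current = [], ""
    for lyric_set in root.iter("lyric-set"):
        for lyric in lyric_set.findall("lyric"):
            if lyric.get("lyric-id") != lyric_id or not lyric.text:
                continue
            # A value on the lyric overrides the one on its parent set.
            syllabic = lyric.get("syllabic", lyric_set.get("syllabic"))
            current += lyric.text
            if syllabic not in ("begin", "middle"):
                words.append(current)  # the word is complete
                current = ""
    if current:
        words.append(current)  # unterminated trailing word
    return " ".join(words)


root = ET.fromstring(SCORE)
print(extract_text(root, "en_US"))
print(extract_text(root, "it_IT"))
```

The result is plain text without hyphens, which is probably what a translator or proofreader would want to receive.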

Also, @shoogle, the problem I have with line-based lyrics is that it becomes a real pain to make sure that your melisma (or hyphen) covers all the notes you need it to. What if you have an "Amen" that covers 13 notes for the "A", and another 12 notes for the "men"?

@clnoel, that's easy:

A-------------men____________

This is perfectly readable and immediately understandable. Sure, it's easy to lose count when editing by hand, but most editing is not done by hand and it's not a particularly common case anyway.

Nevertheless, I agree that integrated lyrics are better overall. (It's difficult to apply styles to individual syllables in a line-based system, for example.)

It might be a good idea to move the descriptors outside the events, but I wouldn't try to combine the lang and id attributes. I'd also try to keep the relationship between the translations and alternatives clearer:

<lyrics>
  <lyric-set lang="en_US" title="Happy Birthday to You!">
    <localisation lang="it_IT" title="Tanti auguri a te"/>
  </lyric-set>
  <lyric-set orig-lang="en_US" title="Good Morning to All!"/>
</lyrics>

There is not necessarily any need for IDs, if we can assume that (e.g.) the 3rd lyric under each note belongs to the set for "Good Morning to All". (It's worth mentioning here that "Good Morning to All!" is not a second verse: it is an alternative set of lyrics that fits the melody - in fact they are the original lyrics.)

Just to play devil's advocate, an interesting feature of storing lyrics separately is the ability to reuse them elsewhere in the score. Imagine an SATB harmony to "Happy Birthday to You!" where each of the 4 parts sings the same lyric but at a different pitch (the rhythm can change too as long as the number of notes each lyric is sung for stays the same). In this case, each part could refer to a shared set of lyrics elsewhere in the document.
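As a toy Python sketch of that sharing, assuming the simplest case where every part has exactly one note per syllable (melismas would need per-syllable note counts). The function name, the part names and the note ids are invented for illustration.

```python
def attach_shared_lyrics(parts, syllables):
    """Pair each part's note ids with the same shared syllables, in order.

    `parts` maps a part name to its list of note ids; every part is assumed
    to have one note per syllable.
    """
    attached = {}
    for part_name, note_ids in parts.items():
        if len(note_ids) != len(syllables):
            raise ValueError(
                f"{part_name}: {len(note_ids)} notes for {len(syllables)} syllables"
            )
        attached[part_name] = list(zip(note_ids, syllables))
    return attached


shared = ["Hap", "py", "birth", "day", "to", "you,"]
parts = {
    "soprano": ["s1", "s2", "s3", "s4", "s5", "s6"],
    "alto": ["a1", "a2", "a3", "a4", "a5", "a6"],
}
print(attach_shared_lyrics(parts, shared)["alto"][2])
```

The length check is where the fragility shows: as soon as one part adds or drops a note, the shared set no longer lines up for that part.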

Marking this as closed, for the sake of cleaning up our issue database. Personally I agree with Joe's comment above: https://github.com/w3c/mnx/issues/139#issuecomment-407475229
