Questions about lyrics in Humdrum

humdrum-tools / verovio-humdrum-viewer

Verovio Humdrum Viewer

http://verovio.humdrum.org

Questions about lyrics in Humdrum #657

Open WolfgangDrescher opened 2 years ago

WolfgangDrescher commented 2 years ago

1. In the documentation it says that one should use the **text interpretation if you are not using hyphenation, or **silbe if you are using syllables for each note. Nevertheless, all Kern files I have seen so far use **text with syllables on each note. E.g. here, here, here. Is this an undocumented convention?
2. Is the text command working? Running extract -f 2 kern/01-beatus-vir.krn | text on this file with **silbe instead of **text, I just get null tokens returned.
3. What is the correct usage of the phrase feature within **text spines? Like this? Or would I add text repetitions into the same phrase like this? Is there a way to use this phrase information later on, e.g. to help other commands detect cadences?
4. extract -i '**silbe' chant12 | text | context -e '[.,;:?!]' | rid -GLId from the documentation seems really useful. Does this command strip away lyrics marked with *ij or *edit? I cannot test it since text is not working in my case.
5. Is there a good command to automatically (or semi-automatically) detect text repetition? It would be useful to check whether parts of a phrase are repeated, and which ones, or whether a whole phrase is repeated, or even whether some parts of the lyrics are left out in other voices.
6. Is it possible to add those lines at the end of a word when a melisma follows, as shown in the documentation (e.g. in bar 16: "and_____")? Not sure, though, if this is Verovio or Humdrum related.

craigsapp commented 2 years ago

In the documentation it says that one should use the **text interpretation if you are not using hyphenation, or **silbe if you are using syllables for each note. Nevertheless, all Kern files I have seen so far use **text with syllables on each note. E.g. here, here, here. Is this an undocumented convention?

Yes, I would say that I have been lazy and only implemented basic **text features. Looking at the reference manual documentation, I suspect that **text was the original encoding for text, but that these concepts were planned to be moved to **silbe, and **text would be for unhyphenated content.

The basic problem is that I am usually importing from MusicXML and do not want to have to edit the text to make it fully compliant with a formalized system. So I probably left it as **text to indicate more unstructured content (but I use the hyphenation system in **text).

In VHV (verovio), I treat **text and **silbe equivalently, so you can use either one to display text with the music notation:

One thing that I notice is that there is no **silbe entry on the representation types documentation page (the old one, since the one for the updated website has not yet been added):

https://www.humdrum.org/Humdrum/representations.toc.html

There is this page which is not listed:

https://www.humdrum.org/Humdrum/representations/silbe.rep.html

I do not see an entry for silbe in the 1995 version of the Humdrum Reference Manual:

https://github.com/humdrum-tools/humdrum-documentation/tree/master/scans/hrm1995

That is probably another reason why I have not used **silbe :-), but in any case, I have not been following all of the formalisms of **text:

Some of these are outdated in my opinion, such as the character accent encodings, which I prefer to do in UTF-8. I also use -- instead of ~ (this frees up ~ for some other purpose), and I do not add spaces before punctuation.

**text interpretations if you are not using hyphenation

In the documentation page for **text the hyphenation system seems to be defined:

https://www.humdrum.org/Humdrum/representations/text.rep.html
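To make the hyphenation convention concrete, here is a minimal Python sketch (not part of any Humdrum tool) that joins syllable tokens of a single lyrics spine into words. It assumes that a trailing - means the word continues in the next token, that a leading - means it continues from the previous one, that . is a null token, and that tokens starting with *, ! or = are not lyrics. The function name join_syllables and the sample tokens are made up for illustration.

```python
def join_syllables(tokens):
    """Join hyphenated syllable tokens of one lyrics spine into a line of words."""
    words = []
    current = ""
    for token in tokens:
        # Skip interpretations, comments, barlines and null tokens.
        if token.startswith(("*", "!", "=")) or token == ".":
            continue
        continues_previous = token.startswith("-")
        continues_next = token.endswith("-")
        syllable = token[1:] if continues_previous else token
        if continues_next:
            syllable = syllable[:-1]
        if continues_previous and current:
            current += syllable
        else:
            if current:
                words.append(current)
            current = syllable
        if not continues_next:
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return " ".join(words)


# Hypothetical tokens, as they might appear in one **text (or **silbe) spine.
spine = ["**text", "Se-", "-lig", "zu", "prei-", "-sen", ".", "ist", "=2", "der", "mann", "*-"]
print(join_syllables(spine))  # Selig zu preisen ist der mann
```

Since the sketch only looks at hyphens, it works the same way whether the spine is labelled **text or **silbe.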

N.B.: The Tasso in Music Project has a lot of potential for analysis, particularly since it has encodings of multiple compositions by various composers that set the same poem to different music.

craigsapp commented 2 years ago

Is the text command working? Running extract -f 2 kern/01-beatus-vir.krn | text on this file with **silbe instead of **text, I just get null tokens returned.

The original Humdrum Toolkit commands all have an -h option that gives a one-screen synopsis of the command:

So I think the problem is that the text tool by default only looks at the first spine. The -f option will allow looking at another spine such as text -f 2 for the Bassus text.

Looking at the source code for text, I see that it is expecting **silbe as input:

Here are some sample uses that work:

$ extractx -s 2 text Beatus_vir.krn | shed -e 's/text/silbe/X' | text  | ridx -H | fmt

Selig zu preisen ist der mann ist der mann der sich entheltvonden
gottlosen und wandelt nicht im rath der bösen trit auch nicht auf
der süder ban noch sitzt noch sitzt bey güfftig bösen rotten da man
honschimpflich weiß zu spotten zu spotten da man honschimpflich

The *edit and *ij system is a later addition by me, so it is not directly known by the text tool. However, here is one way to utilize it (I added an example *ij/*Xij to the score):

extractx -s 2 text /tmp/b.krn | shed -e 's/text/silbe/X' | text | sed 's/^\*ij$/<i>/; s/^\*Xij$/<\/i>/' | ridx -H | fmt

Selig zu preisen ist der mann <i> ist der mann </i> der sich
entheltvonden gottlosen und wandelt nicht im rath der bösen trit
auch nicht auf der süder ban noch sitzt noch sitzt bey güfftig bösen
rotten da man honschimpflich weiß zu spotten zu spotten da man
honschimpflich weiß zu spotten.

Where I converted *ij into <i> and *Xij into </i>:

Selig zu preisen ist der mann ist der mann der sich entheltvonden gottlosen und wandelt nicht im rath der bösen trit auch nicht auf der süder ban noch sitzt noch sitzt bey güfftig bösen rotten da man honschimpflich weiß zu spotten zu spotten da man honschimpflich weiß zu spotten.

Another example would be to remove the repeated text:

 extractx -s 2 text /tmp/b.krn | shed -e 's/text/silbe/X' | text | sed 's/^\*ij/XXX/; s/^\*Xij$/YYY/' | ridx -H | fmt -w 100000 | perl -pe 's/\bXXX.*?YYY\b\s*//g' | fmt

Selig zu preisen ist der mann der sich entheltvonden gottlosen und
wandelt nicht im rath der bösen trit auch nicht auf der süder ban
noch sitzt noch sitzt bey güfftig bösen rotten da man honschimpflich
weiß zu spotten zu spotten da man honschimpflich weiß zu spotten.

The sed command only uses basic regular expressions, so I used perl instead to have access to the Perl regular expression syntax:

perl -pe 's/\bXXX.*?YYY\b\s*//g'

And this means:

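\b   == a word boundary (no letter immediately to the left of XXX)
XXX  == the letters XXX
.*?  == zero or more characters, matched non-greedily (the match stops at the first following YYY)
YYY  == the letters YYY
\b   == a word boundary (no letter immediately to the right of YYY)
\s*  == zero or more whitespace characters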
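The substitution can also be tried out in isolation. The following Python sketch uses the standard re module, where \b, .*? and \s* behave the same way for this pattern as in Perl. The function name remove_marked and the sample strings are made up for illustration.

```python
import re

# Same pattern as in the perl one-liner.
REPEAT = re.compile(r"\bXXX.*?YYY\b\s*")


def remove_marked(text):
    """Delete every XXX ... YYY region, including the spaces that follow it."""
    return REPEAT.sub("", text)


line = "ist der mann XXX ist der mann YYY der sich"
print(remove_marked(line))  # ist der mann der sich

# With two marked regions the non-greedy .*? removes each one separately.
two = "a XXX b YYY c XXX d YYY e"
print(remove_marked(two))  # a c e

# A greedy .* would run from the first XXX to the last YYY and also
# delete the unmarked text in between.
print(re.sub(r"\bXXX.*YYY\b\s*", "", two))  # a e
```

Like Perl, Python's . does not match a newline by default, which is presumably why the pipeline joins the text into one long line with fmt -w 100000 before applying the substitution.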

After all of that, I will point you to the lyrics tool that I created, and which is part of the Humdrum Extras codebase:

lyrics Beatus_vir.krn --html > bv.html

This is the core of the lyric extraction tool on the Tasso website.

It handles multiple inputs of **text data and has a user friendly output:

Notice that I also have a mouseover feature that highlights the same text in other locations in red.

I can enhance this tool as needed with options to convert *italic, *edit and *ij text to italic text, and/or have a removal option for *ij enclosed text.

WolfgangDrescher commented 2 years ago

I found a nice way to compare the lyrics with a git diff view:

I think this works best for me to see which parts of the lyrics have been added or removed, and to compare them with a normalized base version of the lyrics without any text repetitions.

This base version of the lyrics still needs to be checked by a human, and I think it cannot be autogenerated with the current humdrum tools, since text repetitions can occur in the lyrics even without "ij": VHV
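As a rough aid for that manual check, candidate repetitions can at least be flagged automatically. The following Python sketch is not an existing Humdrum command; it only reports word runs that are immediately followed by an identical run in already extracted lyrics, ignoring case and the punctuation characters / . , ; : ? !. The function name find_immediate_repeats and the min_words parameter are made up for illustration, and repetitions with changed spelling or with other text in between are not found.

```python
import re


def find_immediate_repeats(text, min_words=2):
    """Return (word_index, phrase) for word runs that are immediately repeated."""
    # Normalize: lower-case and treat punctuation as whitespace.
    words = re.sub(r"[/.,;:?!]", " ", text.lower()).split()
    repeats = []
    i = 0
    while i < len(words):
        found = False
        # Try the longest possible run first so whole phrases win over fragments.
        for n in range((len(words) - i) // 2, min_words - 1, -1):
            if words[i:i + n] == words[i + n:i + 2 * n]:
                repeats.append((i, " ".join(words[i:i + n])))
                i += n  # continue at the second copy
                found = True
                break
        if not found:
            i += 1
    return repeats


bassus = "Selig zu preisen ist der mann ist der mann der sich enthelt"
print(find_immediate_repeats(bassus))  # [(3, 'ist der mann')]
```

The result is only a list of candidates; deciding whether a repetition belongs to the poem or to the setting still needs a human.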

But autogenerating the lyrics of the voices should be possible with the commands and examples that you sent me. Thanks again for your help here. However, I seem to have "lost" the shed filter in the CLI: which shed does not find a command. Reinstalling the humdrum-tools with both humextra and humdrum did not help either. I also couldn't find the command as a program in either of the two repositories. Am I on the wrong branch or something?

WolfgangDrescher commented 2 years ago

One more thing: I think it could be really nice to have an additional option on the lyrics command to export to JSON or YAML. I started trying a few things in the lyrics Perl script, but there are a lot of direct prints in the script without collecting the data first, so I could not implement it without refactoring the whole file. Maybe a workaround could be to use output buffers? Not sure if that exists in Perl, though.

lyrics Beatus_vir.krn --json

craigsapp commented 2 years ago

However, I seem to have "lost" the shed filter in the CLI: which shed does not find a command. Reinstalling the humdrum-tools with both humextra and humdrum did not help either. I also couldn't find the command as a program in either of the two repositories. Am I on the wrong branch or something?

Shed is a newer tool, and newer tools (since 2015) are implemented with the humlib parser (eventually this will be merged into humextra). (Humlib was set up for implementing filters for verovio, and it uses a somewhat more modern C++ style.) So you need to install humlib:

git clone https://github.com/craigsapp/humlib
cd humlib
make
sudo make install   # copies programs to /usr/local/bin

If you already have humlib, then on Linux you can try "locate humlib" to find it (this can also be done on macOS, but a service needs to be started first, which will be explained when you use locate for the first time; otherwise, Spotlight is the Mac equivalent of the unix locate command).

Also to update humlib (such as getting the most recent cint updates), go to the humlib directory and type:

make update
make
sudo make install

I should rename the older cint program in humextra to cintx to prevent it from hiding the humlib version. Until then, you should make sure that the command search path variable in the unix shell, $PATH, has /usr/local/bin before the humextra/bin and humdrum/bin directories (which file sets the $PATH will depend on the specific unix shell and other factors).

The command which cint will tell you where the command-line is finding cint (ideally in /usr/local/bin from humlib).
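To see every copy of a command on the search path, in lookup order (the first hit is the one the shell runs), something like the following Python sketch can help. It only uses the standard library and ignores shell aliases and functions; the function name find_all is made up for illustration.

```python
import os


def find_all(command, path=None):
    """Return every executable named `command` on the search path, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        # An empty PATH entry means the current directory.
        candidate = os.path.join(directory or ".", command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits


# The first entry of each list is what the shell would run.
for name in ("cint", "context"):
    print(name, find_all(name))
```

If the humextra copy of cint is listed before the one in /usr/local/bin, the order of the directories in $PATH needs to be changed.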

craigsapp commented 2 years ago

I found a nice way to compare the lyrics with a git diff view:

That is a good way. On the command-line vimdiff is a similar tool (particularly useful if you know the vim editor).

How would you want lyrics --json to work? (i.e., give an example input/output). I think you are planning on using the / character to segment the music? That of course is specific to your repertory and not common in lyrics otherwise. There is the phrase markers { and } which would be possible to use to mark lines of a poem, which can then be used to segment the lyrics after extraction.

I can envision another interpretation markup for text: *rep and *Xrep (to cancel), which would be encoded in scores to indicate repetitions in the poetic text. This would essentially function similarly to *ij, but is for marking repetitions that are written out in the score (typically without italic font being involved).
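A minimal Python sketch of how such markers could be used after extraction, assuming a single lyrics spine given as a list of tokens: data tokens between the start and end marker are replaced by null tokens, so the number of lines stays the same. Note that *rep and *Xrep are only a proposal at this point and are not understood by any tool; the function name blank_marked_regions is made up, and the same function can be called with *ij and *Xij.

```python
def blank_marked_regions(tokens, start="*rep", end="*Xrep"):
    """Replace data tokens between `start` and `end` markers with null tokens."""
    result = []
    inside = False
    for token in tokens:
        if token == start:
            inside = True
        elif token == end:
            inside = False
        # Interpretations, comments and barlines are never blanked.
        is_data = not token.startswith(("*", "!", "="))
        result.append("." if inside and is_data else token)
    return result


# Hypothetical spine with a written-out repetition marked by *rep/*Xrep.
spine = ["**text", "ist", "der", "mann", "*rep", "ist", "der", "mann", "*Xrep", "der", "sich", "*-"]
print(blank_marked_regions(spine))
```

Keeping the null tokens rather than deleting lines means the result could still be assembled next to the other spines of the score.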

WolfgangDrescher commented 2 years ago

Ah, I was probably using the wrong or outdated tools all the time:

 $ which cint
~/humdrum-tools/humextra/bin/cint

And after installing craigsapp/humlib it is now:

$ which cint
/usr/local/bin/cint

So I probably just hadn't installed it at all before (and I was just using shed as a filter in the Kern files directly I guess…). I do not get it running properly yet, but that's another problem :-) .

I don't have a good overview of where all the programs and commands are located on GitHub. Some are in humdrum-tools, some in your personal repository craigsapp and some in both. Initially I installed humdrum-tools/humdrum-tools and the included submodules humdrum-tools/humdrum, craigsapp/humextra and humdrum-tools/humdrum-data.

To get a better understanding, I will try to summarize how I understand it:

humdrum-tools/humdrum is your fork of the original humdrum repository
craigsapp/humextra contains some extra commands
craigsapp/humlib contains more extra commands like kern2mens and optimized versions of commands from humextra (e.g. cint)
humlib is used as a bridge to Verovio and converts Kern files to MEI so Verovio can display them (not sure about this one though, I only found mei2hum)

Also, I don't know what the ideal local setup is. I followed the readme in the humdrum-tools repo and have now additionally installed humlib.

Are all commands with a trailing "x" the outdated versions? E.g. extract vs. extractx.

Could it be a good idea to "namespace" the commands so it's more clear which library one is using the command from? E.g.:

$ humdrum hint
$ humlib cint

This would also be beneficial to avoid conflicts with other commands:

$ which context
/Library/TeX/texbin/context

WolfgangDrescher commented 2 years ago

How would you want lyrics --json to work?

{
    "cantus": "Selig zu preisen ist der mann/\nselig zu preisen ist der mann/\nder sich enthelt von den gottlosen von den gottlosen/\nund wandelt nicht im rath der bösen/\ntrit auch nit auff der sünder ban/\nnoch sitzt bey gifftig bösen rotten/\nda man honschimpflich weiß/\nda man honschimpflich weiß zu spotten.\n",
    "tenor": "Selig zu preisen ist der mann/\nder sich enthelt von den gottlosen/\nund wandelt nicht im rath der bösen/\ntrit auch nicht auf der sünder ban/\nnoch sitzt bey güfftig bösen rotten/\nda man honschimpflich weiß zu spotten/\nda man honschimpflich weiß zu spotten/\nzu spotten.\n",
    "bassus": "Selig zu preisen ist der mann/\nist der mann/\nder sich enthelt von den gottlosen/\nund wandelt nicht im rath der bösen/\ntrit auch nicht auf der sünder ban/\nnoch sitzt bey güfftig bösen rotten/\nda man honschimpflich weiß zu spotten/\nzu spotten/\nda man honschimpflich weiß zu spotten.\n"
}

This would allow me to easily parse and store it for later use. See my current example: https://github.com/WolfgangDrescher/lassus-geistliche-psalmen/blob/master/meta/01-beatus-vir.yaml I created the YAML file by hand just to get things working for now.

I think you are planning on using the / character to segment the music?

Yes, but as you said it's repertoire specific. And even here Lassus is not always using it consistently with repetitions. And for the "ij" sections it's also difficult to decide where the text should be segmented by a slash. But I can parse this after extracting the lyrics out of a voice and split the string by / or even ,. So maybe the example JSON above would even be more generic with spaces instead of \n as separators of the phrases.

There is the phrase markers { and } which would be possible to use to mark lines of a poem, which can then be used to segment the lyrics after extraction.

Yes, that's a good idea; however, they were always rendered by Verovio. See my example from question 3 above:

But I'm probably not using them correctly.

I can envision another interpretation markup for text: *rep and *Xrep (to cancel), which would be encoded in scores to indicate repetitions in the poetic text. This would essentially function similarly to *ij, but is for marking repetitions that are written out in the score (typically without italic font being involved).

That would actually be useful in my case to automate the generation of the lyrics. With my corpus of just 50 pieces it's not a big deal for now to do it by hand. Maybe that's even faster than encoding it in the score. But for further studies and projects this could be useful.

Would there be an MEI representation for *rep, *ij and *edit?

WolfgangDrescher commented 2 years ago

After some debugging I realized that text also uses context, which is mapped to the wrong program in my setup (/Library/TeX/texbin/context, as already mentioned before). Since the TeX context is added to $PATH in /etc/paths.d (see here) and humdrum-tools make install adds the command search path to ~/.zshenv in my case, I think I have no choice other than hard-coding the correct path to context directly in the file ~/humdrum-tools/humdrum/bin/text (third-to-last line):

# Extract all **silbe spines.  Extract the first spine (to avoid more than one).
# Replace **silbe rests (%) and graphic hyphens (|) by null tokens.
# Eliminate leading and trailing hyphens.
# Then format the resulting text.
sh extract -i '**silbe' $FILENAME | sh extract -f $FIELD \
  | sh rend -i '**silbe' -f $HUMDRUM/bin/text.rnd > $TMPDIR/$$.cxt
sh extract -f 1 $TMPDIR/$$.cxt > $TMPDIR/$$.1xt
sh extract -f 2 $TMPDIR/$$.cxt \
  | sed 's/^%$/./; s/^|$/./; s/\*\*other/**text/g' \
  | sed 's/^-\(.*\)-$/\1/; s/^-\(.*\)\([^-]\)$/\1\2+/' \
  | sh /path/to/humdrum-tools/humdrum/bin/context -b '-' -e '\+' | sh humsed 's/[- /+]//g' > $TMPDIR/$$.2xt
assemble $TMPDIR/$$.1xt $TMPDIR/$$.2xt | sh cleave -i '**text,**barlines' -o '**text'
rm $TMPDIR/$$.[12c]xt

If I could change the ordering of $PATH it should work. So this is not a bug in humdrum but a "bad" setup on my part. However, a "prefix" parent command for all humdrum/humlib commands, as mentioned before, would help here. But looking at the current code sample of context, this could break a lot of the humdrum awk commands.

How would you want lyrics --json to work?

I made a quick and dirty version in Node.js to get JSON from the current output of the lyrics command without refactoring the whole file. Maybe it helps as an inspiration for the Perl script:

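import fs from 'fs';
import { exec } from 'child_process';

// Recursively collect all file paths below `directory` (not called in this snippet).
function getFiles(directory, fileList = []) {
    for (const entry of fs.readdirSync(directory)) {
        const name = `${directory}/${entry}`;
        if (fs.statSync(name).isDirectory()) {
            getFiles(name, fileList);
        } else {
            fileList.push(name);
        }
    }
    return fileList;
}

// Split an array into consecutive chunks of `chunkSize` elements.
function splitInChunks(array, chunkSize) {
    const chunks = [];
    for (let i = 0; i < array.length; i += chunkSize) {
        chunks.push(array.slice(i, i + chunkSize));
    }
    return chunks;
}

const file = 'kern/01-beatus-vir.krn';
exec(`lyrics ${file}`, (err, stdout, stderr) => {
    if (err) {
        // Report the failure instead of exiting silently.
        console.error(stderr || err.message);
        return;
    }
    const piece = {
        voices: {},
    };
    // The output is "TITLE: ..." followed by "== Voice ==" headers and lyrics,
    // so splitting on "==" yields: title, voice name, lyrics, voice name, lyrics, ...
    let parts = stdout.split('==');
    piece.title = parts.shift().split(':')[1].trim();
    parts = parts.map(part => part.trim());
    // Pair each voice name with its lyrics.
    parts = splitInChunks(parts, 2);
    parts.forEach(([key, lyrics]) => {
        key = key.toLowerCase();
        piece.voices[key] = {};
        // Split after each "/" (the lookbehind keeps the slash at the end of the line).
        piece.voices[key].lyrics = lyrics.split(/(?<=\/)/).map(line => line.trim());
    });
    // Pretty-print with four spaces, matching the output shown below.
    console.log(JSON.stringify(piece, null, 4));
});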

Input:

TITLE: Beatus vir

== Cantus ==

     Selig zu preisen ist der mann/ selig zu preisen ist der mann/ der sich enthelt von den gottlosen von den gottlosen/ und wandelt nicht im rath der bösen/ trit auch nit auff der sünder ban/ noch sitzt bey gifftig bösen rotten/ da man honschimpflich weiß/ da man honschimpflich weiß zu spotten.

== Tenor ==

     Selig zu preisen ist der mann/ der sich enthelt von den gottlosen/ und wandelt nicht im rath der bösen/ trit auch nicht auf der sünder ban/ noch sitzt bey güfftig bösen rotten/ da man honschimpflich weiß zu spotten/ da man honschimpflich weiß zu spotten/ zu spotten.

== Bassus ==

     Selig zu preisen ist der mann/ ist der mann/ der sich enthelt von den gottlosen/ und wandelt nicht im rath der bösen/ trit auch nicht auf der sünder ban/ noch sitzt noch sitzt bey güfftig bösen rotten/ da man honschimpflich weiß zu spotten/ zu spotten/ da man honschimpflich weiß zu spotten.

Output:

{
    "voices": {
        "cantus": {
            "lyrics": [
                "Selig zu preisen ist der mann/",
                "selig zu preisen ist der mann/",
                "der sich enthelt von den gottlosen von den gottlosen/",
                "und wandelt nicht im rath der bösen/",
                "trit auch nit auff der sünder ban/",
                "noch sitzt bey gifftig bösen rotten/",
                "da man honschimpflich weiß/",
                "da man honschimpflich weiß zu spotten."
            ]
        },
        "tenor": {
            "lyrics": [
                "Selig zu preisen ist der mann/",
                "der sich enthelt von den gottlosen/",
                "und wandelt nicht im rath der bösen/",
                "trit auch nicht auf der sünder ban/",
                "noch sitzt bey güfftig bösen rotten/",
                "da man honschimpflich weiß zu spotten/",
                "da man honschimpflich weiß zu spotten/",
                "zu spotten."
            ]
        },
        "bassus": {
            "lyrics": [
                "Selig zu preisen ist der mann/",
                "ist der mann/",
                "der sich enthelt von den gottlosen/",
                "und wandelt nicht im rath der bösen/",
                "trit auch nicht auf der sünder ban/",
                "noch sitzt noch sitzt bey güfftig bösen rotten/",
                "da man honschimpflich weiß zu spotten/",
                "zu spotten/",
                "da man honschimpflich weiß zu spotten."
            ]
        }
    },
    "title": "Beatus vir"
}

craigsapp commented 2 years ago

Is it possible to add those lines at the end of a word when a melisma follows, as shown in the documentation (e.g. in bar 16: "and_____")? Not sure, though, if this is Verovio or Humdrum related.

A lot of questions, which I may eventually get around to answering :-)

I have been avoiding the melismatic underlines as an automatic feature, but it is possible to add them manually right now:

**kern  **text
=1  =1
4c  a
4d  b
4e  .
4f  c
=2  =2
4c  a
4d  b_
4e  .
4f  c
=   =
*-  *-

I like the non-underscore encoding, and if anyone wants the underscores to display, the best system would be to add an interpretation to turn them on without having to encode them in the data, something like:

**kern  **text
*   *ul
=1  =1
4c  a
4d  b
4e  .
4f  c
=2  =2
4c  a
4d  b_
4e  .
4f  c
=   =
*-  *-

where *ul would mean add the melismatic underlines at ending syllables of words.

In addition there could be a filter such as underline that adds them automatically to the data.
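A minimal Python sketch of what such a filter could do for a single lyrics spine. No underline tool exists yet, and *ul is only a proposal, so everything here is an assumption: a syllable counts as word-final when it does not end in a hyphen, and it gets an underscore when the next data token in the spine is a null token. The sketch does not look at the **kern spine, so a null token that lines up with a rest would wrongly be treated as a melisma. The function names are made up for illustration.

```python
def is_data(token):
    """True for data tokens, False for interpretations, comments and barlines."""
    return not token.startswith(("*", "!", "="))


def add_melisma_underlines(tokens):
    """Append "_" to word-final syllables that are followed by a null token."""
    result = list(tokens)
    for i, token in enumerate(tokens):
        if not is_data(token) or token == "." or token.endswith(("-", "_")):
            continue
        # Look ahead to the next data token, skipping barlines and interpretations.
        for later in tokens[i + 1:]:
            if is_data(later):
                if later == ".":
                    result[i] = token + "_"
                break
    return result


# The **text spine of the first example, without any manual underscore.
spine = ["**text", "=1", "a", "b", ".", "c", "=2", "a", "b", ".", "c", "=", "*-"]
print(add_melisma_underlines(spine))
```

Both "b" tokens are followed by a null token and get an underscore; a hyphenated syllable such as "Se-" is left alone because the word continues.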