Closed rettinghaus closed 5 years ago
There has been confusion related to this before (particularly when someone adds @con and nothing happens), so I would not mind the current behavior being modified. One downside would be that @wordpos is semantic/logical, but @con is not, since it is a visual attribute. If you allow @con without @wordpos, then a non-semantic encoding of the lyrics becomes more likely, making tasks such as lyric text extraction from the music more difficult. So it might be useful to prevent display of @con if it is given without @wordpos. Other than that, setting default values for @con based on @wordpos would be nice, since this would minimize clutter from obvious attribute values.
So I propose:
(1) @wordpos="i" or @wordpos="m" without @con would implicitly supply @con="d" by default.
(2) @wordpos="t" without @con would implicitly supply @con="s" by default.
(3) To override the defaults, @con would be explicitly supplied.
(4) Importantly, if there is an explicit @con without a @wordpos, the @con should be ignored. In other words, @wordpos functions like a subelement of <syl>, and @con would be an attribute of <wordpos> rather than of <syl>.
(5) There should be a styling feature in MEI (or initially in Verovio) to set what the default @con is for points (1) and (2). For example, if a symbol other than a dash is used for word connections throughout the music, then it should not be necessary to add @con="x" on every syllable that needs something other than a dash.
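A minimal sketch, in Python, of how a renderer could resolve points (1) to (5). The function `effective_con` and the `DEFAULT_STYLE` table are hypothetical illustrations, not part of MEI or Verovio; the table stands in for the styling feature suggested in (5).

```python
from typing import Dict, Optional

# Hypothetical style table for point (5): default connector per @wordpos value.
DEFAULT_STYLE: Dict[str, str] = {"i": "d", "m": "d", "t": "s"}


def effective_con(wordpos: Optional[str], con: Optional[str],
                  style: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Return the connector to draw after a syllable, or None for no connector.

    Illustrative only: implements points (1)-(4) with a styleable default.
    """
    if style is None:
        style = DEFAULT_STYLE
    if wordpos is None:
        # (4) An explicit @con without @wordpos is ignored.
        return None
    if con is not None:
        # (3) An explicit @con overrides the default.
        return con
    # (1) and (2): "i"/"m" default to a dash, "t" to a space,
    # unless the style table says otherwise.
    return style.get(wordpos)


if __name__ == "__main__":
    print(effective_con("i", None))   # d
    print(effective_con("t", None))   # s
    print(effective_con(None, "d"))   # None (ignored under point 4)
    print(effective_con("m", "u"))    # u (explicit override)
```

Passing a different `style` table, e.g. one mapping "i" and "m" to "b", changes the default for a whole score without touching any individual syllable.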
I basically agree with @craigsapp, but still think that @con="d" should produce dashes without having to use @wordpos at all. (Otherwise it would somehow contradict No. 3.)
I also agree with @craigsapp. (I think we will need to think carefully about (4) and the use of options for specifying default attribute values.)
I basically agree with @craigsapp, but still think that @con="d" should produce dashes without having to use @wordpos at all. (Otherwise it would somehow contradict No 3.)
I don't see a contradiction with no. 3, because @con should be viewed as an attribute of @wordpos and not of <syl>:
<syl>
  <wordpos wordpos="m" con="d"/>
</syl>
This will allow omission of @con="d" when the word position is initial or medial, or of @con="s" when the word position is terminal. It is trivial to calculate a default value for @con given @wordpos, but it is not possible to calculate @wordpos given @con.
In other words, @con is a graphical styling for @wordpos, not for <syl>, or it should be considered that way. The main implication is for how lyric text can be extracted from music. If @wordpos is not encoded, then text extraction is either more difficult or prone to errors, since the extraction has to infer word boundaries rather than read them from explicit data.
Accurate text extraction is necessary for certain applications. Take as an example the Tasso in Music Project website, where text extraction from the lyrical text is used:
(1) As an analysis tool that extracts text from the music for studying repetition of words in the music: http://www.tassomusic.org/lyrics/?id=Trm0047m The text on this page is extracted automatically from the score; see the graphical notation on the work page: http://www.tassomusic.org/work/?id=Trm0047m
(2) As a repertory-wide text search that also automatically extracts lyrical text from scores for searching: http://www.tassomusic.org/text-search Search for "rist" for example. The results show all words in the music that contain the characters "rist".
(3) As a text search on the workpages, searching for text in a single work (currently broken).
Identifying words without encoded @wordpos would be more difficult, particularly because of the need to implement a dual algorithm based on both @wordpos and @con. In certain cases extraction could be impossible, such as when both syllable separators and word extensions are drawn as underlines.
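To make the extraction argument concrete, here is a minimal Python sketch that rebuilds words from <syl> elements using only @wordpos. It assumes the syllables are already in reading order for a single verse and that a syllable without @wordpos is a complete single-syllable word; `collect_syls`, `words_from_syls`, and the flat `<layer>` container in the demo are hypothetical simplifications, not a real MEI processing API.

```python
import xml.etree.ElementTree as ET
from typing import List

MEI_NS = "http://www.music-encoding.org/ns/mei"


def collect_syls(root: ET.Element) -> List[ET.Element]:
    """All <syl> elements in document order, with or without the MEI namespace."""
    return [el for el in root.iter() if el.tag.split("}")[-1] == "syl"]


def words_from_syls(syls: List[ET.Element]) -> List[str]:
    """Rebuild words from syllables using @wordpos only (illustrative)."""
    words: List[str] = []
    current = ""
    for syl in syls:
        text = (syl.text or "").strip()
        pos = syl.get("wordpos")
        if pos == "i":
            if current:                 # previous word was never terminated
                words.append(current)
            current = text
        elif pos == "m":
            current += text             # also handles an excerpt starting mid-word
        elif pos == "t":
            words.append(current + text)
            current = ""
        else:
            words.append(text)          # no @wordpos: treated as a whole word
    if current:
        words.append(current)           # excerpt ends mid-word
    return words


if __name__ == "__main__":
    # Hypothetical flat container; real MEI puts each <syl> in a <verse> under a <note>.
    xml = (
        f'<layer xmlns="{MEI_NS}">'
        '<syl>Ah</syl>'
        '<syl wordpos="i">do</syl><syl wordpos="m">len</syl><syl wordpos="t">te</syl>'
        '<syl wordpos="i">par</syl><syl wordpos="m">ti</syl><syl wordpos="t">ta</syl>'
        '</layer>'
    )
    print(words_from_syls(collect_syls(ET.fromstring(xml))))  # ['Ah', 'dolente', 'partita']
```

No @con value is consulted at any point, which is the property being argued for: word boundaries come from the logical attribute alone.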
@craigsapp It's ok for you to use @wordpos if you need to, but why should everyone else?
(3) To override the defaults, @con would be explicitly supplied.
This means @con would overrule @wordpos, so why shouldn't it be present in the rendering when there's no @wordpos? E.g. a single-syllable word (with no value for @wordpos) is followed by dashes in the source. Or one simply doesn't need text extraction and doesn't want to encode the lyrics this way.
I think @craigsapp's understanding is that @con should be used only for specifying the visual aspect of word-position connectors. The only difference from the current behavior of Verovio is that @con would not be required and that Verovio should assume a default value.
why shouldn't it be present in the rendering when there's no @wordpos?
Because it is there only to specify the visual aspect of the connector, not its presence. This makes sense.
@craigsapp It's ok for you to use @wordpos if you need to, but why should everyone else?
You should read my replies more carefully, as I have been explaining why, summed up by @lpugin:
Because [@con] is there only to specify the visual aspect of the connector, not its presence.
I will point out that it is not only me, but all MusicXML users:
http://usermanuals.musicxml.com/MusicXML/MusicXML.htm#EL-MusicXML-syllabic.htm
Lyric hyphenation is indicated by the syllabic type. The single, begin, end, and middle values represent single-syllable words, word-beginning syllables, word-ending syllables, and mid-word syllables, respectively.
Example:
<lyric>
<syllabic>middle</syllabic>
<text>ri</text>
</lyric>
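Since the two vocabularies line up almost one to one, a converter can carry MusicXML's <syllabic> straight into MEI's @wordpos. The Python sketch below is a hedged illustration, not an existing converter: `mei_syl_from_lyric` is a hypothetical name, the output <syl> carries no namespace, and treating a missing <syllabic> as "single" is our own assumption. Because MEI at the time had no @wordpos value for single-syllable words, "single" simply leaves the attribute off.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# MusicXML <syllabic> value -> MEI @wordpos value ("single" has no counterpart here).
SYLLABIC_TO_WORDPOS: Dict[str, Optional[str]] = {
    "begin": "i", "middle": "m", "end": "t", "single": None,
}


def mei_syl_from_lyric(lyric: ET.Element) -> ET.Element:
    """Convert a MusicXML <lyric> into an MEI-style <syl> (illustrative)."""
    syl = ET.Element("syl")
    syl.text = lyric.findtext("text", default="")
    # Assumption: a missing <syllabic> is treated as "single".
    syllabic = lyric.findtext("syllabic", default="single").strip()
    wordpos = SYLLABIC_TO_WORDPOS.get(syllabic)
    if wordpos is not None:
        syl.set("wordpos", wordpos)
    return syl


if __name__ == "__main__":
    lyric = ET.fromstring("<lyric><syllabic>middle</syllabic><text>ri</text></lyric>")
    print(ET.tostring(mei_syl_from_lyric(lyric), encoding="unicode"))
    # <syl wordpos="m">ri</syl>
```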
I read it carefully (and even understood it). But I accept that my point of view is obviously not convincing enough.
Still, a needed value for @wordpos is missing (one for single-syllable words, which MusicXML has), and you would have to rewrite the description of @con, which states:
Describes the symbols typically used to indicate breaks between syllables and their functions. Allowed values are: "s" (Space (word separator).), "d" (Dash (syllable separator).), "u" (Underscore (syllable extension).), "t" (Tilde (syllable elision).), "c" (Circumflex [angled line above] (syllable elision).), "v" (Caron [angled line below] (syllable elision).), "i" (Inverted breve [curved line above] (syllable elision).), "b" (Breve [curved line below] (syllable elision).)
To me it looks like the visual appearance gives the function here.
Surely I'm missing something, but I don't see the real problem. @wordpos is linguistic information. Translated to MEI, this is probably the logical domain, perhaps the analytical one. @con is graphical information, i.e. the visual domain. Of course there is a relation between these two, because otherwise people would regularly fail to read lyrics. Yes, one can most often derive one type of information from the other. Yes, some ways of data input may only generate one of them. Yes, files get bigger by having both. Yet these are still two different things, and since MEI usually tries to keep such things separate where feasible, I would strongly argue that MEI should not conflate them or make them interdependent. Verovio could of course answer this question differently and opt for default values etc., but I'm also not convinced on that end. Coming back to @rettinghaus's original question, I think it would not be necessary for Verovio to require @wordpos to render a @con per se. If, however, Verovio's internal model is supposed to deal with the lyrics as parseable text, it will also require the presence of @wordpos. But even then, it could perhaps respond to @con for rendering purposes? I would not expect it to render much of anything from just @wordpos, however…
Ok, thanks. Let me close the issue since there is no clear consensus how this should be changed and since Verovio produces a satisfactory output.
Currently Verovio requires a dual-attribute system to display hyphens. It would be useful to increase readability of the MEI data and decrease confusion and encoding time by allowing a non-dual system. I would be happy with the dual system until everyone agrees on the @wordpos system. 😜
Yes, one can most often derive one type of information from the other.
The problem is that the relationship between @wordpos and @con is not symmetric. It is trivial to derive @con from @wordpos, but it is not trivial to go from @con to @wordpos. The reason is that @con is a visual attribute and @wordpos is a logical (semantic) attribute. I would not consider @wordpos an analytic attribute (and hence a non-essential attribute for encoding the score).
When typesetting music, the position of the syllable in the word determines the connector that is needed, and there are very few exceptions in modern typography. In other words, the word position is needed to generate the score rendering, not the other way around: to define the words by how the syllables are visually connected.
When doing OMR it will be necessary to infer @wordpos from @con information, but when rendering a score the process should go from @wordpos to @con. For the most part, @con need never be encoded except in the case that @rettinghaus gave in a separate thread. In that case, there would be two encoding options: (1) encode the word positions incorrectly, utilizing their default @con, or preferably (2) encode the word positions correctly, but override the default @con to show the typographical error in the print. It would be easy to search for such anomalies in a set of scores by looking for cases where @con is overridden for a type of @wordpos, rather than sorting through an English dictionary to determine that "webe" and "seech" are not English words.
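The anomaly search described above can be sketched in a few lines of Python. This is a hypothetical helper, not an existing tool: `find_con_overrides` assumes the defaults proposed earlier in this thread (dash after "i"/"m", space after "t") and by default skips the underscore extender "u", since extenders on terminal syllables are routine rather than print errors.

```python
import xml.etree.ElementTree as ET
from typing import FrozenSet, List, Tuple

# Assumed default connectors, following the proposal earlier in this thread.
DEFAULT_CON = {"i": "d", "m": "d", "t": "s"}


def find_con_overrides(root: ET.Element,
                       ignore: FrozenSet[str] = frozenset({"u"})
                       ) -> List[Tuple[str, str, str]]:
    """List (text, wordpos, con) for syllables whose @con differs from the default.

    Connector values in `ignore` are skipped because they are routine, not anomalies.
    """
    hits: List[Tuple[str, str, str]] = []
    for el in root.iter():
        if el.tag.split("}")[-1] != "syl":   # match <syl> with or without namespace
            continue
        pos, con = el.get("wordpos"), el.get("con")
        if (pos in DEFAULT_CON and con is not None
                and con not in ignore and con != DEFAULT_CON[pos]):
            hits.append(((el.text or "").strip(), pos, con))
    return hits


if __name__ == "__main__":
    # "be-seech" printed with a space instead of a dash, plus a routine extender.
    doc = ET.fromstring(
        '<layer><syl wordpos="i" con="s">be</syl><syl wordpos="t">seech</syl>'
        '<syl wordpos="t" con="u">te</syl></layer>'
    )
    print(find_con_overrides(doc))   # [('be', 'i', 's')]
```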
Since the logical attribute can be used to calculate the visual, it should have precedence over a visual attribute. Visual attributes lack a clear meaning, so generating words from the score will not be 100% accurate.
I recommend changing @con to @wordpos.con to enforce a hierarchy on these two attributes. This would also reduce confusion, particularly when encoding for display with Verovio, since several people have posted messages about this (maybe to MEI-L, since I cannot find them on GitHub). Using @wordpos as the primary attribute would also be more convenient for implementing a styling feature for @con.
@con focuses on the connector after a syllable. There are also cases where there should be an added hyphen before a syllable: when there is a medial syllable at the start of a system (standard in modern typesetting). This is the purpose of the category @wordpos="m". If you encode only @con, then either this case is no longer deterministic, or it becomes more complex to calculate, since you would have to walk back to the previous syllable in the music to decide if a prefix hyphen is needed. And if you are encoding an excerpt of music that starts with a prefix hyphen, then it would not be representable with @con encoding.
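The difference between the two encodings for the prefix-hyphen case can be shown with two small Python functions. Both are hypothetical illustrations: the @wordpos version needs only the syllable itself, while the @con-only version must look at the previous syllable (assumed here to be available in a list of @con values for the same verse) and cannot answer at the start of an excerpt. We also treat @wordpos="t" as a continuation, on the assumption that a terminal syllable starting a system continues a word just as a medial one does.

```python
from typing import List, Optional


def prefix_hyphen_from_wordpos(wordpos: Optional[str]) -> bool:
    """Prefix hyphen for the first syllable on a system, using @wordpos only."""
    # Medial (and, by the same logic, terminal) syllables continue a word.
    return wordpos in ("m", "t")


def prefix_hyphen_from_con(cons: List[Optional[str]], index: int) -> Optional[bool]:
    """Same decision using only @con values of the preceding syllables.

    Returns None when there is no previous syllable (e.g. an excerpt that
    starts mid-word), because the answer cannot be determined from @con.
    """
    if index == 0:
        return None
    return cons[index - 1] == "d"


if __name__ == "__main__":
    print(prefix_hyphen_from_wordpos("m"))        # True
    print(prefix_hyphen_from_con(["d", "s"], 1))  # True
    print(prefix_hyphen_from_con(["s"], 0))       # None: undetermined
```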
As mentioned previously in this thread, MusicXML uses the @wordpos system, and I don't see any clear presence of the @con system (there are line extensions which could also function as @con). MusicXML is a less semantic representation of music, but it chooses the more semantic system for encoding syllable separation information.
There is also the consideration of sharing data between projects. It is easy to have a single project/edition remain self-consistent. But if you allow multiple expressions of equivalent information with a false sense of flexibility, then it becomes more difficult to share data. A practical consequence is added complexity in software development: more complexity in the data format means more development cost for software that processes that data. The @wordpos and @con systems mostly overlap (i.e., they are mostly redundant), so requiring software to handle both is more expensive in terms of development time and maintenance. The @con system by itself is also more expensive than the @wordpos system, as well as being a subset of the @wordpos system.
Why does Verovio require a @wordpos to show a @con="d"? I would think @wordpos="i" or @wordpos="m" implies a dash that Verovio should draw. But @con="d" says explicitly that there should be a dash, doesn't it?