Accent representation

jbellik commented 3 years ago

The accent attribute is currently referenced in several different places in SPOT:

addJapaneseTones() in annotate_tones.js
accentAsHead() and noLapseL in japaneseAccent.js
in window.GEN(), on line 121 of candidategenerator_wrapper.js <-- I think this particular reference is not working correctly, since it currently refers to "accented" rather than "accent"; this needs to be checked and corrected
addAttributeLabels() in treeFormatting.js

More references to accent are coming (#489, #488). Before adding these, I want to settle the question of how to represent accent. I think a minor adjustment is in order.

The current implementation is that if a word is accented, it can have accent: 'a' or accent: 'A'. An unaccented word has accent: 'u' or accent: 'U'. But I think it would be clearer and less error-prone to simply label accented words as accent: true, and unaccented words as accent: false.

The reason for the original representation with 'a' and 'u' was to let us distinguish between unaccented words in a language that does use accents, such as Japanese, and words in languages that don't use accents at all. I think the current implementation won't add any tones if accent isn't specified, although I haven't double-checked this. But I don't think it's actually necessary to distinguish these two cases.
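To make the proposal concrete, here is a minimal sketch of how a boolean accent attribute could be checked. The helper name isAccented and the bare node objects are hypothetical and are not taken from SPOT; the only assumption is that a word node is a plain object with an optional accent property.

```javascript
// Hypothetical helper, not part of SPOT: treats a word as accented only
// when accent is explicitly true. Both accent: false and a missing accent
// attribute count as unaccented, which is the conflation proposed above.
function isAccented(node) {
  return node.accent === true;
}

const accentedWord = { id: 'x', accent: true };
const unaccentedWord = { id: 'y', accent: false };
const unspecifiedWord = { id: 'z' }; // no accent attribute at all

console.log(isAccented(accentedWord));    // true
console.log(isAccented(unaccentedWord));  // false
console.log(isAccented(unspecifiedWord)); // false
```

With a strict comparison like this, a leftover string value such as 'a' or 'u' would also count as unaccented, so any old-style values would need to be converted first.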

@nkalivoda, do you think it's okay to conflate the representation of unaccented words with that of words that have no accent value at all?

nkalivoda commented 3 years ago

Yes, I see no problem with going from binary to privative. But we should still be able to use the "A"/"U" id-system the way we do now, and we need to make sure the tonal interpretation doesn't get messed up.

jbellik commented 3 years ago

At present we have a function (actually multiple functions) that translates id: 'a' and id: 'u' to accent: 'a' and accent: 'u'. I would just change these functions to produce accent: true and accent: false instead.
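A rough sketch of what such a translation could look like is below. The function name accentFromIdSketch is hypothetical, and the sketch assumes the id has to be exactly 'a' or 'u' (in either case) to carry accent information; SPOT's real functions may match ids differently.

```javascript
// Hypothetical sketch, not SPOT's actual code: derive a boolean accent
// value from the "A"/"U" id convention. Assumes an id of exactly 'a'/'A'
// means accented and exactly 'u'/'U' means unaccented.
function accentFromIdSketch(node) {
  const id = String(node.id).toLowerCase();
  if (id === 'a') {
    return { ...node, accent: true };
  }
  if (id === 'u') {
    return { ...node, accent: false };
  }
  // Any other id carries no accent information; leave the node unchanged.
  return node;
}

console.log(accentFromIdSketch({ id: 'A' })); // { id: 'A', accent: true }
console.log(accentFromIdSketch({ id: 'u' })); // { id: 'u', accent: false }
console.log(accentFromIdSketch({ id: 'x' })); // { id: 'x' }
```

This keeps the "A"/"U" id system usable as input while storing only booleans on the node. It returns a copy rather than mutating the node, which is a choice made for the sketch and not a claim about how SPOT handles its trees.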

jbellik commented 3 years ago

Here's another issue: Right now, if id is not 'a' or 'A' but accent is set to true, then .a is suffixed to the id in the parenthesization. Examples:

(x.a y.a z) would mean x and y are accented but z is not.
(a u) doesn't tell us directly that a is accented and u is unaccented, but this information can be inferred from the ids.

The problem is that if someone puts in, say, 'a b c' where a and b are both accented but c is not, then we'll get a mixed representation: (a b.a c)
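The behavior described above can be reproduced with a small sketch. The names labelTerminal and parenthesizeTerminals are hypothetical, and a flat array of word objects stands in for the real tree; this is not the actual parenthesization code in SPOT.

```javascript
// Hypothetical sketch of the labeling rule described above: ".a" is
// suffixed only when the word is accented and its id is not itself
// 'a' or 'A'.
function labelTerminal(node) {
  const id = String(node.id);
  const idEncodesAccent = id.toLowerCase() === 'a';
  if (node.accent === true && !idEncodesAccent) {
    return id + '.a';
  }
  return id;
}

// Joins the labels of a flat list of terminals into one parenthesized string.
function parenthesizeTerminals(terminals) {
  return '(' + terminals.map(labelTerminal).join(' ') + ')';
}

const mixedInput = [
  { id: 'a', accent: true },
  { id: 'b', accent: true },
  { id: 'c', accent: false },
];
console.log(parenthesizeTerminals(mixedInput)); // (a b.a c)
```

Because each terminal is labeled without looking at its neighbors, the rule has no way to notice that 'a' is being used here as an ordinary id and not as the accent shorthand.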

I don't know if that's worse, or if a redundant representation like (a.a u) is worse.

It would be possible to extend the function so that it pays attention to all the terminals in the string, but that would make it a lot more complex and brittle, and might create other problems.

syntax-prosody-ot / main

Accent representation #490