syntax-prosody-ot / main

A webapp for the syntax-prosody analyst working in Optimality Theory, with automated Gen, Con and Eval. Download build files from syntax-prosody-ot/build
https://spot.sites.ucsc.edu/
GNU General Public License v2.0

Accent representation #490

Closed by jbellik 3 years ago

jbellik commented 3 years ago

The accent attribute is currently referenced in several different places in SPOT:

More references to accent are coming (#489, #488). Before adding these, I want to settle the question of how to represent accent. I think a minor adjustment is in order.

In the current implementation, an accented word has accent: 'a' or accent: 'A', and an unaccented word has accent: 'u' or accent: 'U'. I think it would be clearer and less error-prone to simply label accented words as accent: true and unaccented words as accent: false.

The reason for the original representation with 'a' and 'u' was to let us distinguish between unaccented words in a language that does use accent, such as Japanese, and words in languages that don't use accent at all. I think the current implementation won't add any tones if accent isn't specified, although I haven't double-checked this. But I don't think it's actually necessary to distinguish these two cases.
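To make the proposal concrete, here is a minimal sketch of the mapping from the current string values to booleans. It is an illustration only: `normalizeAccent` is a hypothetical helper, not an existing SPOT function, and it assumes that words are plain objects with an optional `accent` property. It also assumes the conflation discussed above, so a missing value ends up as accent: false.

```javascript
// Hypothetical helper (not part of SPOT): returns a copy of `word`
// whose accent is a boolean instead of 'a'/'A'/'u'/'U'.
function normalizeAccent(word) {
  const legacy = word.accent;
  // Accented: the legacy strings 'a' and 'A', or an already-boolean true.
  const accented = legacy === true || legacy === 'a' || legacy === 'A';
  // Everything else ('u', 'U', false, or no accent value at all)
  // is treated as unaccented under the assumed conflation.
  return Object.assign({}, word, { accent: accented });
}

// Usage with made-up words:
const examples = [
  { id: 'w1', accent: 'A' },
  { id: 'w2', accent: 'u' },
  { id: 'w3' } // no accent value specified
].map(normalizeAccent);
```

Returning a copy rather than mutating the word keeps the sketch free of side effects; whether that fits the real tree-handling code is an open question.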

@nkalivoda, do you think it's okay to conflate the representation of unaccented words with that of words that have no accent value at all?

nkalivoda commented 3 years ago

Yes, I see no problem with going from binary to privative. But we should still be able to use the "A"/"U" id-system the way we do now, and we need to make sure the tonal interpretation doesn't get messed up.

jbellik commented 3 years ago

At present we have a function (actually several functions) that translates id: 'a' and id: 'u' to accent: 'a' and accent: 'u'. I would just change these functions to produce accent: true and accent: false instead.
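Roughly, the changed translation could look like the sketch below. The name `accentFromId` and the node shape are assumptions made for illustration, not the actual SPOT functions, and the sketch assumes the id must be exactly 'a'/'A' or 'u'/'U' to trigger the translation.

```javascript
// Hypothetical sketch (not the actual SPOT code): derive a boolean
// accent value from a terminal's id, keeping the "A"/"U" id system.
function accentFromId(node) {
  if (node.id === 'a' || node.id === 'A') {
    return Object.assign({}, node, { accent: true });
  }
  if (node.id === 'u' || node.id === 'U') {
    return Object.assign({}, node, { accent: false });
  }
  // Any other id says nothing about accent, so the node is returned unchanged.
  return node;
}

// Usage with made-up terminals:
const accentedTerminal = accentFromId({ id: 'A', cat: 'w' });
const unaccentedTerminal = accentFromId({ id: 'u', cat: 'w' });
```

Under this sketch the ids themselves are untouched, so inputs written with 'A' and 'U' would keep working as they do now.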

jbellik commented 3 years ago

Here's another issue: Right now, if id is not 'a' or 'A' but accent is set to true, then .a is suffixed to the id in the parenthesization. Examples:

The problem is that if someone puts in, say, 'a b c' where a and b are both accented but c is not, then we'll get a mixed representation: (a b.a c)

I don't know if that's worse, or if a redundant representation like (a.a u) is worse.

It would be possible to extend the function so that it pays attention to all the terminals in the string, but that would make it a lot more complex and brittle, and might create other problems.
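For comparison, here is a small sketch of the two labelling options. It is a simplified illustration, not the real parenthesization code: `terminalLabel` and `parenthesize` are hypothetical names, the input is assumed to be a flat list of words, and the `redundant` option is invented here to switch between the two behaviours.

```javascript
// Hypothetical sketch (not the actual SPOT code).
// Default: suffix ".a" only when the word is accented and its id is not 'a'/'A'.
// With { redundant: true }: suffix ".a" on every accented word.
function terminalLabel(word, options) {
  const redundant = Boolean(options && options.redundant);
  const idMarksAccent = word.id === 'a' || word.id === 'A';
  if (word.accent === true && (redundant || !idMarksAccent)) {
    return word.id + '.a';
  }
  return word.id;
}

// Assumed flat structure: one pair of parentheses around all terminals.
function parenthesize(words, options) {
  return '(' + words.map(function (w) {
    return terminalLabel(w, options);
  }).join(' ') + ')';
}

const mixed = parenthesize([
  { id: 'a', accent: true },
  { id: 'b', accent: true },
  { id: 'c', accent: false }
]); // "(a b.a c)"

const redundantForm = parenthesize([
  { id: 'a', accent: true },
  { id: 'u', accent: false }
], { redundant: true }); // "(a.a u)"
```

Both options look only at one terminal at a time, which is what keeps the function simple; making the choice depend on the whole string is the more complex extension mentioned above.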