Word accent GEN - Githubissues

jbellik commented 3 years ago

Ito & Mester have a paper that explains accent placement (as well as prosodic category setting) in Japanese compound and simplex prosodic words using OT constraints: WordAccent, WordMaxAccent, BinMin(w), BinMaxHead(w), BinMax(phi.min).

To implement this analysis in SPOT, we would need GEN to create structures that place accent in every possible location (according to some algorithmic definition of "possible"...).

Proposal: Write a function that takes a prosodic tree (which will have been generated by the existing GEN function) and returns an array of prosodic trees in which accent has been marked on each possible combination of minimal prosodic words.

For example, given

    ptree = {id: 'a', cat: 'w', children: [{id: 'b', cat: 'w'}, {id: 'c', cat: 'w'}]}

then genAccents(ptree) returns:

    [
      {id: 'a', cat: 'w', children: [{id: 'b', cat: 'w'}, {id: 'c', cat: 'w'}]},
      {id: 'a', cat: 'w', children: [{id: 'b', cat: 'w', accent: 'a'}, {id: 'c', cat: 'w'}]},
      {id: 'a', cat: 'w', children: [{id: 'b', cat: 'w'}, {id: 'c', cat: 'w', accent: 'a'}]},
      {id: 'a', cat: 'w', children: [{id: 'b', cat: 'w', accent: 'a'}, {id: 'c', cat: 'w', accent: 'a'}]}
    ]

That is, for each prosodic tree p, containing n minimal prosodic words, genAccents(p) will return 2^n new prosodic trees with accents marked (since there are two possible accent values for each minimal prosodic word, 'a' / 'u' [1]).
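A minimal sketch of what genAccents could look like, assuming a "minimal prosodic word" is a node with cat 'w' that dominates no other 'w', and that an unaccented word simply lacks the accent attribute (as in the example above). The helper names dominatesWord and minimalWords are hypothetical and are not existing SPOT functions.

```javascript
// Hypothetical sketch, not the SPOT implementation.

// True if the node is a 'w' or dominates a 'w' somewhere below it.
function dominatesWord(node) {
  return node.cat === 'w' || (node.children || []).some(dominatesWord);
}

// Collect minimal words (a 'w' with no 'w' below it) in left-to-right order.
function minimalWords(node, found = []) {
  const kids = node.children || [];
  if (node.cat === 'w' && !kids.some(dominatesWord)) {
    found.push(node);
  } else {
    kids.forEach(child => minimalWords(child, found));
  }
  return found;
}

// Return 2^n copies of ptree, one per subset of accented minimal words.
// Uses a bitmask, so this sketch only works for n < 31.
function genAccents(ptree) {
  const n = minimalWords(ptree).length;
  const results = [];
  for (let mask = 0; mask < (1 << n); mask++) {
    const copy = JSON.parse(JSON.stringify(ptree)); // deep copy; input is not mutated
    minimalWords(copy).forEach((word, i) => {
      if (mask & (1 << i)) word.accent = 'a';
    });
    results.push(copy);
  }
  return results;
}
```

With bit i of the mask standing for the i-th minimal word, the output order for the two-word example is: no accents, accent on b, accent on c, accent on both.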

Then we would add an option addOutputAccents to GEN (probably to the wrapper function in candidate_generator_wrapper.js) that, if true, would run genAccents on each of the prosodic trees in the output of GEN_impl().
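Roughly, the option could be wired in as below. This is only a sketch: baseGen stands in for GEN_impl, expandAccents stands in for genAccents, the argument list is hypothetical, and it assumes each candidate is an [input, outputPtree] pair, which would need checking against the actual wrapper.

```javascript
// Hypothetical wrapper sketch; names and argument order are assumptions.
function genWithAccentOption(baseGen, expandAccents, stree, words, options = {}) {
  const candidates = baseGen(stree, words, options);
  if (!options.addOutputAccents) {
    return candidates; // option off: output is unchanged
  }
  // Assumption: each candidate is an [input, outputPtree] pair.
  // Replace each pair with one pair per accented variant of its output tree.
  return candidates.flatMap(([input, ptree]) =>
    expandAccents(ptree).map(accented => [input, accented])
  );
}
```

Passing the two functions in as arguments keeps the sketch independent of how GEN_impl is actually called.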

[1] I am considering making a change to the current system of accent representation so that accent is a boolean like the other optional attributes. I will link that issue here once I create it.

jbellik commented 3 years ago

May be affected by #490

jbellik commented 3 years ago

This should also work for Swedish phrasal accent placement, although Myrberg treats AccentAsHead as unviolated, so this will certainly be a superset of the needed candidates. This may actually be the case in Ito & Mester as well.

jbellik commented 3 years ago

The problem would be too many candidates, since the relevant data are sentences of 10-12 words. However, some of these are function words that are effectively clitics, so each sentence could be represented as ~7 prosodic words. Under the 2^n scheme above, that is 2^7 = 128 accent patterns per prosodic tree instead of 1024-4096.

jbellik commented 3 years ago

Alternate implementation: a function that takes a ptree (from the output of GEN) and returns a list of all the ways to put one accent in each minimal phi (or minimal word, depending on the specifications)

f(a b c) = {(a* b c), (a b* c), (a b c*)}

But candidates with more than one accent in the same minimal phi, such as (a* b* c), would be excluded from consideration.
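A sketch of this alternative, assuming minimal phis are nodes with cat 'phi' that dominate no other 'phi', that every minimal phi contains at least one terminal, and that accent is marked as accent: 'a' on a terminal. All function names here are hypothetical.

```javascript
// Hypothetical sketch: exactly one accent per minimal phi.

function dominatesCat(node, cat) {
  return (node.children || []).some(c => c.cat === cat || dominatesCat(c, cat));
}

// Nodes of the given category that dominate no node of the same category.
function minimalNodes(node, cat, found = []) {
  if (node.cat === cat && !dominatesCat(node, cat)) found.push(node);
  else (node.children || []).forEach(c => minimalNodes(c, cat, found));
  return found;
}

// Childless nodes, in left-to-right order.
function terminals(node) {
  const kids = node.children || [];
  return kids.length === 0 ? [node] : kids.flatMap(terminals);
}

function genOneAccentPerPhi(ptree) {
  const sizes = minimalNodes(ptree, 'phi').map(phi => terminals(phi).length);
  // Cartesian product: one terminal index chosen per minimal phi.
  let choices = [[]];
  for (const size of sizes) {
    const next = [];
    for (const partial of choices) {
      for (let i = 0; i < size; i++) next.push([...partial, i]);
    }
    choices = next;
  }
  return choices.map(choice => {
    const copy = JSON.parse(JSON.stringify(ptree)); // deep copy
    minimalNodes(copy, 'phi').forEach((phi, k) => {
      terminals(phi)[choice[k]].accent = 'a';
    });
    return copy;
  });
}
```

For a single phi over three words this yields three candidates instead of the 2^3 = 8 from the subset approach; the count is the product of the minimal phi sizes.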

jbellik commented 3 years ago

The Japanese analysis needs to include unaccented minimal words (violations of WordAccent) but the Swedish one does not (and should not) include minimal phis that lack accents altogether (violations of headedness for the phi?)

jbellik commented 3 years ago

Ideal scenario: Make use of genHeadsForTree(ptree, 'w'), defined in main/generateHeaded.js. See #514 for info on this.

Add an option to allow non-edge-aligned heads. I.e., in addition to (a* b c) and (a b c*), we also want (a b* c).
Also need an option to allow headless nodes. These are separate features.
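The two options could be kept independent along these lines. This is a standalone sketch and does not reflect how genHeadsForTree is actually written; headPositions, allowMedialHeads and allowHeadless are hypothetical names, and the edge-aligned default is an assumption based on the examples above.

```javascript
// Hypothetical sketch: which child positions may be the head of a node
// with `childCount` children. `null` stands for "no head".
function headPositions(childCount, options = {}) {
  const { allowMedialHeads = false, allowHeadless = false } = options;
  const positions = [];
  for (let i = 0; i < childCount; i++) {
    const atEdge = i === 0 || i === childCount - 1;
    // Default: only leftmost and rightmost children can be heads.
    if (atEdge || allowMedialHeads) positions.push(i);
  }
  if (allowHeadless) positions.push(null);
  return positions;
}
```

For three children, the default gives positions 0 and 2, allowMedialHeads adds position 1, and allowHeadless adds the null option regardless of the other setting.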

Something's not right in file:///Users/jenny/Documents/GitHub/main/test/addHeadsToListTest.html?

syntax-prosody-ot / main

Word accent GEN #489