Morphology support - Githubissues

Right now terminal tokens have to be separate words. Treebender should be able to support morphological rules:

V[ stem: t ] -> walk
V[ stem: t ] -> talk
// stem: f to block walkedededededededed...
V[ tense: past, stem: f ] -> V[ stem: t ] ++ ed  // syntax TBD

Questions:

What scope do we want here? Are we only supporting basic concatenative morphology (prefixes and suffixes), or will we try to support allomorphy, sound changes / ablaut, Semitic roots, and so on?
- It's tempting to say we just focus on English, support concatenative morphology only, and let the user fall back to a flag for irregular forms:
```
V[ can_inflect: y ] -> walk
V[ can_inflect: n ] -> buy
V[ tense: past, can_inflect: n ] -> V[ can_inflect: y ] ++ ed
V[ tense: past, can_inflect: n ] -> bought
```
- However, lots of common English words have spelling changes like bake ~ baked, not *bakeed. There's no real way to support that without a more sophisticated tool or tons of duplicate rules.
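
To make the bake ~ baked problem concrete, here is a minimal Python sketch of what purely concatenative `++ ed` with a `can_inflect` flag would produce. It is only an illustration: the names `LEXICON`, `IRREGULAR_PAST`, and `past_forms` are made up for this example and are not part of Treebender.

```python
# Hypothetical lexicon: stem -> can_inflect flag (mirrors the y/n feature above).
LEXICON = {"walk": True, "buy": False, "bake": True}

# Hypothetical irregular past forms, listed by hand like `-> bought` above.
IRREGULAR_PAST = {"buy": "bought"}


def past_forms(lexicon, irregular):
    """Build past-tense forms by plain concatenation, with irregular overrides."""
    forms = {}
    for stem, can_inflect in lexicon.items():
        if can_inflect:
            # Pure concatenation: no spelling adjustment of any kind.
            forms[stem] = stem + "ed"
    forms.update(irregular)
    return forms


print(past_forms(LEXICON, IRREGULAR_PAST))
# {'walk': 'walked', 'bake': 'bakeed', 'buy': 'bought'}
```

"walk" and "buy" come out right, but "bake" becomes "bakeed", so every e-final verb would need either its own hand-written rule or some spelling-aware mechanism.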

Todo:

Remind myself of how the LKB does this

One way to approach this would actually be to just allow grammar files to define a token-splitting process that runs before parsing.

Something like:

$splitters = [
    /(.+)ed/ => [\1, -ed]
    /(.+)d/  =>  [\1, -ed] // for words like "baked"
    /(.+)s/  => [\1, -s]
    /(.+)es/ => [\1, -s]
]
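
As a rough illustration of how such a table could be applied to a single word, here is a Python sketch. It assumes each pattern must match the whole word, and that the "no expansion" reading is always kept; `SPLITTERS` and `split_word` are hypothetical names, not existing Treebender code.

```python
import re

# Hypothetical Python rendering of the $splitters table above.
SPLITTERS = [
    (re.compile(r"(.+)ed"), "-ed"),
    (re.compile(r"(.+)d"), "-ed"),  # for words like "baked"
    (re.compile(r"(.+)s"), "-s"),
    (re.compile(r"(.+)es"), "-s"),
]


def split_word(word):
    """Return every candidate segmentation of one word.

    The first entry is always the unsplit word (the implicit
    "no expansion" splitter); the rest come from matching splitters.
    """
    options = [(word,)]
    for pattern, affix in SPLITTERS:
        match = pattern.fullmatch(word)  # assumption: whole-word match
        if match:
            candidate = (match.group(1), affix)
            if candidate not in options:
                options.append(candidate)
    return options


print(split_word("walked"))
# [('walked',), ('walk', '-ed'), ('walke', '-ed')]
```

Under this whole-word reading, `split_word("and")` also yields `('an', '-ed')`, so short function words can pick up spurious splits that the parser would have to reject.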

Then every splitter that matches a word would apply, along with an implicit "no expansion" splitter, and the sentence would expand into all possible morphological derivations (`==>` marks the intended one):

        "The dogs walked to the beach and baked"
        "The dogs walk -ed to the beach and baked"
        "The dogs walke -ed to the beach and baked"
        "The dog -s walked to the beach and baked"
        "The dog -s walk -ed to the beach and baked"
        "The dog -s walke -ed to the beach and baked"
        "The dogs walked to the beach and bak -ed"
        "The dogs walk -ed to the beach and bak -ed"
        "The dogs walke -ed to the beach and bak -ed"
        "The dog -s walked to the beach and bak -ed"
        "The dog -s walk -ed to the beach and bak -ed"
        "The dog -s walke -ed to the beach and bak -ed"
        "The dogs walked to the beach and bake -ed"
        "The dogs walk -ed to the beach and bake -ed"
        "The dogs walke -ed to the beach and bake -ed"
        "The dog -s walked to the beach and bake -ed"
    ==> "The dog -s walk -ed to the beach and bake -ed"
        "The dog -s walke -ed to the beach and bake -ed"

Obviously this has the potential to blow up, but we could also fail fast if a splitter generates a token that doesn't match any nonterminals in the grammar.
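
A minimal Python sketch of the fail-fast idea, assuming the grammar can hand us the set of tokens it has rules for. `KNOWN_TOKENS`, `CANDIDATES`, and `derivations` are hypothetical names, the candidate segmentations are written out by hand for the example sentence, and lowercasing before lookup is an assumption of this sketch.

```python
from itertools import product

# Hypothetical set of tokens that some rule in the grammar can match.
KNOWN_TOKENS = {"the", "dog", "walk", "bake", "to", "beach", "and", "-ed", "-s"}

# Per-word candidate segmentations for the example sentence (hand-written).
CANDIDATES = [
    [("The",)],
    [("dogs",), ("dog", "-s")],
    [("walked",), ("walk", "-ed"), ("walke", "-ed")],
    [("to",)],
    [("the",)],
    [("beach",)],
    [("and",)],
    [("baked",), ("bak", "-ed"), ("bake", "-ed")],
]


def derivations(candidates, known):
    """Expand candidates into sentences, pruning unknown tokens per word first."""
    pruned = []
    for options in candidates:
        kept = [seg for seg in options if all(tok.lower() in known for tok in seg)]
        if not kept:
            return []  # fail fast: this word has no viable segmentation
        pruned.append(kept)
    # Only the surviving segmentations take part in the cartesian product.
    return [" ".join(tok for seg in combo for tok in seg) for combo in product(*pruned)]


print(derivations(CANDIDATES, KNOWN_TOKENS))
# ['The dog -s walk -ed to the beach and bake -ed']
```

Because pruning happens per word before the product is taken, the 2 × 3 × 3 = 18 combinations for this sentence collapse to one without ever being enumerated.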

vgel / treebender

Morphology support #1