gustafl / lexeme

A new take on language learning.
Design a solution for alternative spellings and pronunciations #114

Open gustafl opened 8 years ago

gustafl commented 8 years ago

At some point, we need to allow registering alternative spellings or pronunciations. At least for lexemes, but inflections can probably have alternative forms too. And if we choose to store compounds and expressions separately, they should probably support alternative forms as well.

Do we need an order among the alternatives? Is one spelling or pronunciation always more important? To my understanding, the answer is no. Take British and American English for example. A user learning English would want to register color and colour as two lemmas of equal value representing the same lexeme.

This tells us we should probably move the spelling and pronounciation fields out of the lexeme table. The remaining fields would be id, language and lexical_category, which means the lexeme table would carry a purely abstract representation of the lexeme. The spelling and pronounciation fields could go into a table named lemma, which could hold multiple (unordered) lemmas for the same lexeme.
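A minimal sketch of this design, assuming SQLite as the storage engine. The table and column names follow the text above (the pronounciation column keeps the spelling used in this issue), while the column types, constraints and the helper functions add_lexeme and spellings are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Abstract lexeme: no spelling or pronunciation stored here.
    CREATE TABLE lexeme (
        id               INTEGER PRIMARY KEY,
        language         TEXT NOT NULL,
        lexical_category TEXT NOT NULL
    );
    -- Any number of lemmas per lexeme; no column ranks one above another.
    CREATE TABLE lemma (
        id             INTEGER PRIMARY KEY,
        lexeme         INTEGER NOT NULL REFERENCES lexeme(id),
        spelling       TEXT NOT NULL,
        pronounciation TEXT  -- column name spelled as in the issue text
    );
""")


def add_lexeme(conn, language, lexical_category, lemmas):
    """Insert one lexeme and its lemmas; lemmas is a list of (spelling, pronounciation)."""
    cur = conn.execute(
        "INSERT INTO lexeme (language, lexical_category) VALUES (?, ?)",
        (language, lexical_category),
    )
    lexeme_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO lemma (lexeme, spelling, pronounciation) VALUES (?, ?, ?)",
        [(lexeme_id, spelling, pron) for spelling, pron in lemmas],
    )
    return lexeme_id


def spellings(conn, lexeme_id):
    """Return the spellings of a lexeme as a set, since the lemmas are unordered."""
    rows = conn.execute("SELECT spelling FROM lemma WHERE lexeme = ?", (lexeme_id,))
    return {row[0] for row in rows}


# Pronunciations are left empty here for brevity.
color_id = add_lexeme(conn, "en", "noun", [("color", None), ("colour", None)])
print(sorted(spellings(conn, color_id)))  # ['color', 'colour']
```

Both spellings hang off the same lexeme row, and neither is marked as the primary one.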

Another way of modelling this would be to leave the spelling and pronounciation fields in the lexeme table and add a lexeme_variant table. The columns in this table would be

id, lexeme, spelling, pronounciation
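For comparison, a rough sketch of this second layout, again assuming SQLite. The lexeme_variant columns are the ones listed above; the column types, the sample rows and the helper all_spellings are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- The lexeme row keeps one spelling and pronunciation of its own.
    CREATE TABLE lexeme (
        id               INTEGER PRIMARY KEY,
        language         TEXT NOT NULL,
        lexical_category TEXT NOT NULL,
        spelling         TEXT NOT NULL,
        pronounciation   TEXT  -- column name spelled as in the issue text
    );
    -- Further forms are stored next to it.
    CREATE TABLE lexeme_variant (
        id             INTEGER PRIMARY KEY,
        lexeme         INTEGER NOT NULL REFERENCES lexeme(id),
        spelling       TEXT NOT NULL,
        pronounciation TEXT
    );
""")

# Sample data: one of the two spellings has to go into the lexeme row itself.
db.execute(
    "INSERT INTO lexeme (id, language, lexical_category, spelling) "
    "VALUES (1, 'en', 'noun', 'color')"
)
db.execute("INSERT INTO lexeme_variant (lexeme, spelling) VALUES (1, 'colour')")


def all_spellings(db, lexeme_id):
    """Collect every spelling of a lexeme from both tables."""
    rows = db.execute(
        """
        SELECT spelling FROM lexeme WHERE id = ?
        UNION
        SELECT spelling FROM lexeme_variant WHERE lexeme = ?
        """,
        (lexeme_id, lexeme_id),
    )
    return sorted(row[0] for row in rows)


print(all_spellings(db, 1))  # ['color', 'colour']
```

In this sketch, listing all forms of a lexeme means reading from both tables.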

The problem with this design is twofold:

After thinking this through, I now believe that the design in which the spelling and pronounciation fields leave the lexeme table is the better one. And if we need to support alternative forms for inflections, compounds and expressions too, I think the same design could work there as well.

So much for the data storage part of the design. Next, I need to think about how to create, retrieve, update and delete alternative spellings and pronunciations in the GUI. For now, supporting alternative forms is considered a post-0.1.0, or even post-0.2.0, feature. But it's still good to know how it will affect the data model when it's time to implement it.

gustafl commented 8 years ago

There is a third solution too: simply consider each alternative spelling a separate "word". In principle, I think we need to get away from the idea of registering words, compounds and expressions and only work with strings. The goal of using Lexeme is not to build a perfect dictionary.

By considering each alternative spelling its own word, the inflections will connect to the right lexemes too.
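A small sketch of this third option, assuming SQLite and a hypothetical inflection table that points at its lexeme. The inflection columns and the helpers add_word and inflections_of are invented for illustration and are not taken from the actual schema.

```python
import sqlite3

store = sqlite3.connect(":memory:")
store.executescript("""
    -- Every spelling is registered as its own row ("word").
    CREATE TABLE lexeme (
        id               INTEGER PRIMARY KEY,
        language         TEXT NOT NULL,
        lexical_category TEXT NOT NULL,
        spelling         TEXT NOT NULL
    );
    -- Hypothetical inflection table: each inflected form belongs to one lexeme.
    CREATE TABLE inflection (
        id       INTEGER PRIMARY KEY,
        lexeme   INTEGER NOT NULL REFERENCES lexeme(id),
        spelling TEXT NOT NULL
    );
""")


def add_word(store, spelling, inflected_forms):
    """Register one spelling as a separate lexeme together with its inflections."""
    cur = store.execute(
        "INSERT INTO lexeme (language, lexical_category, spelling) VALUES ('en', 'noun', ?)",
        (spelling,),
    )
    lexeme_id = cur.lastrowid
    store.executemany(
        "INSERT INTO inflection (lexeme, spelling) VALUES (?, ?)",
        [(lexeme_id, form) for form in inflected_forms],
    )
    return lexeme_id


def inflections_of(store, spelling):
    """Return the inflected forms attached to the lexeme with this spelling."""
    rows = store.execute(
        """
        SELECT i.spelling
        FROM inflection AS i
        JOIN lexeme AS l ON l.id = i.lexeme
        WHERE l.spelling = ?
        ORDER BY i.spelling
        """,
        (spelling,),
    )
    return [row[0] for row in rows]


add_word(store, "color", ["colors"])
add_word(store, "colour", ["colours"])
print(inflections_of(store, "color"))   # ['colors']
print(inflections_of(store, "colour"))  # ['colours']
```

Each spelling carries its own inflections here, and nothing in the sketch records that color and colour are related.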