Open gustafl opened 8 years ago
There is a third solution too: simply consider each alternative spelling a separate "word". In principle, I think we need to get away from the idea of registering words, compounds and expressions and only work with strings. The goal of using Lexeme is not to build a perfect dictionary.
By considering each alternative spelling its own word, the inflections will connect to the right lexemes too.
At some point, we need to allow registering alternative spellings or pronunciations. At least for lexemes, but inflections can probably have alternative forms too. And if we choose to store compounds and expressions separately, they should probably support alternative forms as well.
Do we need an order among the alternatives? Is one spelling or pronunciation always more important? To my understanding, the answer is no. Take British and American English for example. A user learning English would want to register "color" and "colour" as two lemmas of equal value representing the same lexeme.
This tells us we should probably move the `spelling` and `pronounciation` fields out of the `lexeme` table. The remaining fields would be `id`, `language` and `lexical_category`, which means the `lexeme` table would carry a purely abstract representation of the lexeme. The `spelling` and `pronounciation` fields could go into a table named `lemma`, which could hold multiple (unordered) lemmas for the same lexeme.

Another way of modelling this would be to leave the `spelling` and `pronounciation` fields in the `lexeme` table and add a `lexeme_variant` table. The columns in this table would be `id`, `lexeme`, `spelling` and `pronounciation`.
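
To make the `lexeme_variant` alternative concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The column types, the `NOT NULL` constraints, the foreign key on `lexeme_variant.lexeme` and the helper `all_spellings` are assumptions made for illustration; only the table and column names come from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The lexeme table keeps one "main" spelling and pronunciation.
    CREATE TABLE lexeme (
        id               INTEGER PRIMARY KEY,
        language         TEXT NOT NULL,
        lexical_category TEXT NOT NULL,
        spelling         TEXT NOT NULL,
        pronounciation   TEXT
    );
    -- Every further spelling goes into a separate variant table.
    CREATE TABLE lexeme_variant (
        id             INTEGER PRIMARY KEY,
        lexeme         INTEGER NOT NULL REFERENCES lexeme(id),
        spelling       TEXT NOT NULL,
        pronounciation TEXT
    );
""")

conn.execute("INSERT INTO lexeme VALUES (1, 'en', 'noun', 'colour', NULL)")
conn.execute("INSERT INTO lexeme_variant VALUES (1, 1, 'color', NULL)")


def all_spellings(conn, lexeme_id):
    """Collect every spelling of a lexeme (hypothetical helper)."""
    rows = conn.execute(
        """
        SELECT spelling FROM lexeme WHERE id = ?
        UNION
        SELECT spelling FROM lexeme_variant WHERE lexeme = ?
        ORDER BY spelling
        """,
        (lexeme_id, lexeme_id),
    ).fetchall()
    return [row[0] for row in rows]


print(all_spellings(conn, 1))  # ['color', 'colour']
```

In this sketch, listing all spellings of a lexeme means reading from two tables, and one spelling has to be picked as the one stored in `lexeme` itself.
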
The problem with this design is twofold:
After thinking this through, I now believe that the design in which the `spelling` and `pronounciation` fields leave the `lexeme` table is the better design. And if we need to support alternative forms for inflections, compounds and expressions too, I think the same design could work there as well.

So much for the data storage part of the design. Next, I need to think about how to create, retrieve, update and delete alternative spellings and pronunciations in the GUI. For now, supporting alternative forms is considered a post-0.1.0, or even post-0.2.0, feature. But it's still good to know how it will affect the data model when it's time to implement it.
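
As a rough illustration of the preferred storage design (an abstract `lexeme` table plus a `lemma` table), here is a minimal sketch using Python's built-in `sqlite3` module. The `id` and `lexeme` columns on `lemma`, all column types and constraints, and the helpers `add_lexeme` and `lemmas_of` are assumptions; the issue only names the `spelling` and `pronounciation` fields for that table.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")
db.executescript("""
    -- Purely abstract lexeme: no spelling or pronunciation here.
    CREATE TABLE lexeme (
        id               INTEGER PRIMARY KEY,
        language         TEXT NOT NULL,
        lexical_category TEXT NOT NULL
    );
    -- Any number of lemmas per lexeme, with no ordering column.
    CREATE TABLE lemma (
        id             INTEGER PRIMARY KEY,               -- assumed
        lexeme         INTEGER NOT NULL REFERENCES lexeme(id),  -- assumed
        spelling       TEXT NOT NULL,
        pronounciation TEXT
    );
""")


def add_lexeme(db, language, lexical_category, spellings):
    """Insert one lexeme and one lemma per spelling (hypothetical helper)."""
    cur = db.execute(
        "INSERT INTO lexeme (language, lexical_category) VALUES (?, ?)",
        (language, lexical_category),
    )
    lexeme_id = cur.lastrowid
    db.executemany(
        "INSERT INTO lemma (lexeme, spelling) VALUES (?, ?)",
        [(lexeme_id, spelling) for spelling in spellings],
    )
    return lexeme_id


def lemmas_of(db, lexeme_id):
    """Return the spellings of a lexeme as a set, since lemmas are unordered."""
    rows = db.execute(
        "SELECT spelling FROM lemma WHERE lexeme = ?", (lexeme_id,)
    ).fetchall()
    return {row[0] for row in rows}


colour_id = add_lexeme(db, "en", "noun", ["colour", "color"])
print(lemmas_of(db, colour_id))
```

Here "colour" and "color" are stored as equal rows in `lemma`, and neither is treated as the primary spelling. The same pattern could presumably be repeated for inflections, compounds and expressions if they get alternative forms too.
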