Design how conflicts between homographs will be resolved when auto-highlighting words

Auto-highlighting will occur in two cases:

When a lexeme or inflection is saved for the first time.
When a document is loaded.

In the first case, the auto-highlight method loops through every word in the document and highlights all matches to the newly saved word (skipping existing highlights). In the second case, the method makes a list of all unique words (based on spelling) in the document, matches that list against the lexemes and inflections in local storage (including alternative forms, as described in #114), and auto-highlights all matches.
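A rough sketch of the second case, to make the homograph problem concrete. Everything here is an assumption for illustration: the `Entry` shape, the function names, and the choice to compare spellings case-insensitively are not taken from the existing code.

```typescript
// Hypothetical shape of a saved lexeme or inflection (field names are assumptions).
interface Entry {
  spelling: string;       // surface form as it appears in text
  lexeme: string;         // the lexeme this form belongs to
  category: string;       // lexical category, e.g. "noun"
  translations: string[];
}

type MatchResult =
  | { kind: "unknown" }                       // no saved entry has this spelling
  | { kind: "unique"; entry: Entry }          // exactly one match: safe to highlight
  | { kind: "ambiguous"; entries: Entry[] };  // homographs: more than one match

// Group saved entries by spelling so homographs land in the same bucket.
// Assumption: spellings are compared case-insensitively.
function indexBySpelling(entries: Entry[]): Record<string, Entry[]> {
  const index: Record<string, Entry[]> = Object.create(null);
  for (const entry of entries) {
    const key = entry.spelling.toLowerCase();
    if (!(key in index)) index[key] = [];
    index[key].push(entry);
  }
  return index;
}

// Document load: look up each unique spelling once and classify the result.
function matchDocument(words: string[], entries: Entry[]): Record<string, MatchResult> {
  const index = indexBySpelling(entries);
  const results: Record<string, MatchResult> = Object.create(null);
  for (const word of words) {
    const key = word.toLowerCase();
    if (key in results) continue; // spelling already classified
    const hits = index[key] || [];
    if (hits.length === 0) {
      results[key] = { kind: "unknown" };
    } else if (hits.length === 1) {
      results[key] = { kind: "unique", entry: hits[0] };
    } else {
      results[key] = { kind: "ambiguous", entries: hits };
    }
  }
  return results;
}
```

Only the `ambiguous` results are conflicts in the sense of this issue. The `entries` array of such a result already carries what a per-match choice would need to show: the lexeme, the lexical category and the translations.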

It is unavoidable that some words will be highlighted using the wrong lexeme or translation data, due to homographs etc. Design how to deal with such conflicts.

Resolving conflicts must be easy and fast: 2 clicks maximum. Both the conflicts and the way of resolving them should be visible in the article, not in the form.

The best option is probably to mark words where conflicts occur in some way (perhaps with a border rather than a background color, or maybe an animated background color). This mark would mean that Lexeme was not able to highlight the word, because there was more than one match (among all the lexemes and inflections). Hovering over the conflict-marked word would bring up a menu below (or above, depending on viewport position) with one row per match. Each row has the lexeme, the lexical category and the translations. Clicking one of the matches highlights the word and resolves the conflict.

These conflict resolutions need to be stored somehow, because you don't want to redo them every time you open the same document.

Also, there is already a plan to have a hover menu for finding the meaning of highlighted words. Make sure this solution harmonizes with that. The hover menu with multiple choices could remain even after a choice has been made (just mark the user's choice).

My thinking on this has changed slightly. I no longer think that we should prompt the user to resolve every ambiguity. And more importantly, we don't need to save resolved ambiguities. Rather, one should adopt the mindset that ambiguities are expected and don't need to be resolved. There's no point in resolving ambiguities you already know about, just to clean the document of "errors". Protect the user from wasting time.

Of course, ambiguous words must be marked in some way to separate them from completely unknown (unregistered) words. But don't do it conspicuously. Do it discreetly. For instance, use a low-contrast, neutral background color. Simply show that it's a known word. Avoid giving the impression that there is something wrong with the ambiguity that needs to be handled.

Read issue #115 for more on my current thinking on how to handle ambiguities.

One question remains. When the user selects a match in the tooltip, or scrolls through the matches using the arrow keys, should that be reflected in the highlight? This still needs some consideration. On one hand, it would be nice if it did. On the other, users would probably expect their selections to be preserved next time they open the same document. And I don't think we should save anything about the document itself.

My thinking on ambiguity has changed again. It now seems clear that we need to abandon our original vision of completely disassociating the document from the application. The problem of ambiguity is unsolvable in the context of this application, unless we persist the sense of ambiguous words.

Why is it unsolvable? Because in Lexeme, the problem of identifying and resolving ambiguities is partly a user task (as a part of learning the language) and partly an application task (when restoring highlights already made). Unless the user-generated markup is persisted somewhere, the application cannot restore the markup when a document is reloaded. Not even an application with perfect understanding of the language would be able to do so. Auto-highlighting makes no sense unless the user-generated markup is replicated exactly (even when it's wrong).

The next best solution appears to be to store the full document, including user-generated markup, in local storage. This is of course the opposite of completely disassociating the document from the application, but I see no better way. This solves all the problems with ambiguity and also lets the document load without having to auto-highlight words (a potentially slow process when most of the words are known). Auto-highlighting will only be necessary when loading a new document.
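A minimal sketch of what this could look like, assuming each document can be identified by some stable ID. The `documentId` parameter, the `lexeme.document.` key prefix and the function names are all hypothetical. The `KeyValueStore` interface only mirrors the two `localStorage` methods used, so the sketch also runs outside a browser.

```typescript
// The subset of the Web Storage API used here; window.localStorage fits this shape.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// In-memory stand-in for localStorage, for running the sketch outside a browser.
class MemoryStore implements KeyValueStore {
  private items: Record<string, string> = Object.create(null);
  getItem(key: string): string | null {
    return key in this.items ? this.items[key] : null;
  }
  setItem(key: string, value: string): void {
    this.items[key] = value;
  }
}

// Hypothetical key prefix for stored documents.
const DOCUMENT_PREFIX = "lexeme.document.";

interface LoadedDocument {
  html: string;                // markup to put into the article element
  needsAutoHighlight: boolean; // true only for documents not seen before
}

// Persist the full document, including the user-generated markup.
function saveDocument(store: KeyValueStore, documentId: string, markedUpHtml: string): void {
  store.setItem(DOCUMENT_PREFIX + documentId, markedUpHtml);
}

// Prefer the stored copy, so the user's markup is restored exactly as it was.
function loadDocument(store: KeyValueStore, documentId: string, freshHtml: string): LoadedDocument {
  const stored = store.getItem(DOCUMENT_PREFIX + documentId);
  if (stored !== null) {
    return { html: stored, needsAutoHighlight: false };
  }
  return { html: freshHtml, needsAutoHighlight: true };
}
```

In the browser, `window.localStorage` could be passed as the `store` argument. How a document would actually be identified (URL, title, hash of the text) is left open here.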

We need a new data attribute to store the unique sense/meaning of words in the <span> elements. In principle, we could limit this attribute to ambiguous words, but I believe it's useful to have it on every known word. I outlined how this could work in issue #90. Here's an example:

<span class="noun" data-key="fr.lexeme.baleine.01">baleines</span>

The language code is taken from the lang attribute on the article element. The unit of learning (e.g. lexeme) is taken from the planned unit of learning choice (this will probably be a set of buttons right below the current header of the form). The word will be taken from the #word input. And finally, the serial number will be generated at the time of saving, taking previous homographs into account.
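Generating the key could be sketched roughly like this. The function name and the `existingKeys` parameter are hypothetical, and two details are assumptions read off the example above rather than decisions: the serial number is zero-padded to two digits, and a new homograph gets the highest existing serial plus one.

```typescript
// Build a data-key such as "fr.lexeme.baleine.01".
// existingKeys: keys already saved, used to find previous homographs of the same word.
function buildDataKey(lang: string, unit: string, word: string, existingKeys: string[]): string {
  const prefix = [lang, unit, word].join(".") + ".";
  let highest = 0;
  for (const key of existingKeys) {
    if (key.indexOf(prefix) !== 0) continue; // different language, unit or word
    const suffix = key.slice(prefix.length);
    if (!/^\d+$/.test(suffix)) continue;     // ignore anything that is not a plain serial
    highest = Math.max(highest, Number(suffix));
  }
  const serial = String(highest + 1);
  // Assumption: pad to two digits, as in the example key.
  return prefix + (serial.length < 2 ? "0" + serial : serial);
}
```

For example, calling `buildDataKey("fr", "lexeme", "baleine", [])` gives `fr.lexeme.baleine.01`, and calling it again with that key in `existingKeys` gives `fr.lexeme.baleine.02`.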

Design how conflicts between homographs will be resolved when auto-highlighting words #101