Refine TM results

sujato commented 4 years ago

It quite frequently happens that the TM results show a bunch of segments featuring words like tathagata or dhammavinaya, and so on. In practice these are well known to the translator, who is looking for the more unusual terms. Obviously if the whole phrase is a match this is not an issue, but perhaps we could add a weighting that would push up singletons or rare matches in the TM results.
Small words and particles generally need not be matched by themselves; perhaps filter out words of three or fewer letters. Of course if they are part of a bigger phrase they can be significant, just not if they are by themselves.
Also weight for diversity. Currently I have the phrase Mārañjahaṃ brūmi jarāya pāraguṃ, and the TM shows me a series of hits for jarāya, nothing for the other words.
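
A rough sketch of how the three ideas above (rarity weighting, ignoring short words on their own, and diversity) might fit together. This is only an illustration, not Bilara's TM code: `rank_tm`, `MIN_SOLO_LEN`, the whitespace tokenizer and the 0.25 repeat penalty are all assumptions made up for the example, and whole-phrase matching is left out.

```python
import math
from collections import Counter

MIN_SOLO_LEN = 4  # assumption: words of three or fewer letters never match alone


def tokenize(text):
    # Naive whitespace tokenizer; a real TM would need proper Pali handling.
    return text.lower().split()


def rank_tm(query, segments, limit=5):
    """Pick up to `limit` segments, favouring rare shared words and
    query words that earlier picks have not covered yet."""
    n = len(segments)
    seg_tokens = [set(tokenize(s)) for s in segments]
    # Document frequency: in how many segments each word occurs.
    df = Counter(tok for toks in seg_tokens for tok in toks)
    q_words = [w for w in dict.fromkeys(tokenize(query)) if len(w) >= MIN_SOLO_LEN]
    # IDF-style weight: the rarer the word, the higher the weight.
    weight = {w: math.log((n + 1) / (df[w] + 1)) + 1 for w in q_words}

    covered = set()
    remaining = list(range(n))
    results = []

    def gain(i):
        # Words already shown in an earlier pick count only a quarter.
        return sum(
            weight[w] * (0.25 if w in covered else 1.0)
            for w in q_words
            if w in seg_tokens[i]
        )

    while remaining and len(results) < limit:
        best = max(remaining, key=gain)
        if gain(best) <= 0:
            break  # nothing left that shares a significant word
        results.append(segments[best])
        covered.update(w for w in q_words if w in seg_tokens[best])
        remaining.remove(best)
    return results
```

With a query like the one above, a segment containing pāraguṃ or brūmi would be picked before a second or third hit for jarāya, because each repeat of an already covered word is discounted.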

Sometimes TM omits one's own translations completely.

[Screenshot from 2020-09-24 09-19-28]

This is a serious bug, as it undermines confidence in relying on TM at all. Together with #67 (no indication of source), it means that, unless I clearly recognize that it is my translation, I have to search each segment manually, lest I unwittingly mix Brahmali's translation into my own.

TM should show one's own translations at the top.

blake-sc commented 4 years ago

I'm applying some thought to this issue.

One thought on the primary issue, refinement of results: reading the translator's mind and producing generally diverse terms is a very tricky problem, especially for longer strings. If there are 20 words and room for 5 suggestions, then yeah, it's going to be tricky being comprehensive. It seems to me that user-directed refinement could be very useful. What I imagine is that highlighting (selecting) a term in the root string, e.g. by double-clicking it, re-runs the TM query "focused" on that term or phrase, bringing up relevant strings. This should be rather simple to implement, certainly a couple of orders of magnitude easier than trying to make an intelligent guess about what the translator cares about.
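
To give a feel for how small the focused variant could be, here is a minimal sketch. It is an assumption-laden illustration rather than the actual query: `focus_tm` is a hypothetical name, TM entries are assumed to be `(root, translation)` pairs held in memory, and matching is a plain whole-word comparison on whitespace-separated text instead of a database search.

```python
def focus_tm(selection, entries, limit=5):
    """Return TM entries whose root text contains the selected term or phrase."""
    # Normalise whitespace and case so a sloppy selection still matches.
    needle = " ".join(selection.lower().split())
    if not needle:
        return []
    hits = []
    for root, translation in entries:
        haystack = " ".join(root.lower().split())
        # Pad with spaces so only whole words (or whole phrases) match.
        if f" {needle} " in f" {haystack} ":
            hits.append((root, translation))
    # Shorter roots first: the selected term makes up more of the string.
    hits.sort(key=lambda entry: len(entry[0]))
    return hits[:limit]
```

The client would call something like this with whatever the user has selected in the root string, so no guessing about which word matters is needed.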

sujato commented 4 years ago

Sure, it's not easy.

But currently, the final point, omitting one's own translations, is by far the biggest headache. Let's fix that first.

blake-sc commented 4 years ago

Yeah, ArangoDB 3.7 happens to offer a much cleaner way of doing that filtering, so it'll be part of the upgrade to 3.7.

blake-sc commented 4 years ago

In general, it would seem that prioritizing one's own translations would basically mean others' translations wouldn't appear at all, given a permissive approach to relevance.

I wonder if it would be useful to only show TM results for one's own translations, with a tabbed view on the suggestions where the second tab holds others' translations. If that tab is also lazy-loaded (the request is only made when the user clicks on the tab), that would speed up TM, since it's faster to only consider a subset of translations.
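
Roughly, the two-tab idea could look like the sketch below. All names here are hypothetical: `TMSuggestions` is invented for the example, and `fetch(root, keep)` stands in for whatever actually runs the TM query, with `keep` being a predicate on the author name.

```python
class TMSuggestions:
    """Split TM hits into an eager 'mine' tab and a lazily fetched 'others' tab."""

    def __init__(self, fetch, user):
        self._fetch = fetch   # hypothetical callable: fetch(root, keep) -> list of hits
        self._user = user
        self._others = {}     # cache of others' hits, keyed by root string

    def mine(self, root):
        # Always queried: only the current user's translations are considered.
        return self._fetch(root, lambda author: author == self._user)

    def others(self, root):
        # Queried only when the tab is opened, then cached for that root string.
        if root not in self._others:
            self._others[root] = self._fetch(
                root, lambda author: author != self._user
            )
        return self._others[root]
```

The default request only ever touches the user's own translations, and the more expensive query over everyone else's work is paid for only if the second tab is actually opened.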

This essentially is a question of how useful others' translations are, and the best way of making it possible to see them. Would it be acceptable if the interim solution is to only show one's own translations and we consider something like a tabbed view later?

sujato commented 4 years ago

Yes, that would be fine.

Optionally showing other translations can be useful, but more in a "check what they're doing" kind of way, rather than the much more mechanical "that's what that is, bleep, accept, move on" which TM suggestions often can be.

The thing is, when it comes to more nuanced checking, that's when you need to use search, and in that case you definitely want to be able to see the translations of others.

suttacentral / bilara

Refine TM results #51