unfoldingWord / translationCore

Repository for the desktop application translationCore
https://www.translationcore.com

Refine tokenizer for Hindi use-case #5060

Closed by benjore 5 years ago

benjore commented 6 years ago

Story Explanation

User Story

As a [type of user], I want [some goal] so that [some reason].

Features / Specifications

RoyalSix commented 6 years ago

Tasks

cckozie commented 6 years ago

Per Klappy: Words in the word bank should exactly match what is in the scripture pane.
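For illustration only, here is a minimal sketch of how a tokenizer can end up producing word-bank entries that do not match the scripture pane for Hindi. It assumes the mismatch comes from Devanagari combining marks (vowel signs and the virama) being treated as word breaks; the thread does not confirm that this is the actual cause. The names `tokenize`, `LETTERS_ONLY`, and `LETTERS_AND_MARKS` are hypothetical and are not the wordmap-lexer API.

```typescript
// Hypothetical helpers for illustration; NOT the wordmap-lexer API.

// Letters and digits only: a combining mark ends the current token.
const LETTERS_ONLY = new RegExp("[\\p{L}\\p{N}]+", "gu");

// Also accepts combining marks (\p{M}), which covers Devanagari
// vowel signs and the virama, so a word stays in one piece.
const LETTERS_AND_MARKS = new RegExp("[\\p{L}\\p{M}\\p{N}]+", "gu");

// Return every run of characters in `text` that matches `pattern`.
function tokenize(text: string, pattern: RegExp): string[] {
  return text.match(pattern) || [];
}

// "हिन्दी भाषा", written with escapes so the code points are unambiguous.
const sample = "\u0939\u093F\u0928\u094D\u0926\u0940 \u092D\u093E\u0937\u093E";

const lettersOnly = tokenize(sample, LETTERS_ONLY);
const markAware = tokenize(sample, LETTERS_AND_MARKS);

console.log(lettersOnly.length); // 5 fragments: the marks were dropped
console.log(markAware.length);   // 2 words, identical to the source text

// Assumed acceptance check: every word-bank entry is one of the
// whitespace-separated words shown in the scripture pane.
const paneWords = sample.split(" ");
console.log(markAware.every((w) => paneWords.indexOf(w) !== -1));   // true
console.log(lettersOnly.every((w) => paneWords.indexOf(w) !== -1)); // false
```

The patterns are built with `new RegExp(...)` from strings so the sketch does not depend on a particular compiler target for Unicode property escapes. A real tokenizer would also need to handle punctuation and numbers with separators, which this sketch ignores.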

cckozie commented 6 years ago

The new tokenizer code got merged in earlier (it works properly in 1.0.0 (8adaf14)) and then got pulled out in 1.0.1. According to @klappy, it has to do with dependencies.

klappy commented 6 years ago

The core no longer holds the helpers/utilities that do the tokenization. They were moved to the word-aligner module, which includes the tokenizer. That module needs to be updated to the appropriate version of the tokenizer and published under a new version; then the wordAlignment tool needs to be updated accordingly, and then tC.

cckozie commented 6 years ago

Good: image

Bad: image

da1nerd commented 6 years ago

The updated wordmap-lexer and wordmap need to be used in wordAlignment@1.0.1 in order for all of this to work. A few minor structural changes will be needed as well, i.e. it's not just a dependency bump.