Refine tokenizer for Hindi use-case

unfoldingWord / translationCore

Repository for the desktop application translationCore

https://www.translationcore.com


Refine tokenizer for Hindi use-case #5060

Closed benjore closed 5 years ago

benjore commented 6 years ago

Story Explanation

User Story

As a [type of user], I want [some goal] so that [some reason].

Features / Specifications

[ ] Increment NPM package version for tokenizer to latest.
[ ] NOTE: Create a branch off of the 1.0 release and include this issue in there.

RoyalSix commented 6 years ago

Tasks

[ ] Make PR for correct string-punctuation-tokenizer module version

cckozie commented 6 years ago

Per Klappy: Words in the word bank should exactly match what is in the scripture pane.
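To illustrate the requirement above, here is a minimal sketch of a Unicode-aware word tokenizer and an exact-match check against the source text. It is not the string-punctuation-tokenizer API: `tokenizeWords`, `tokenizeAscii`, `wordBankMatchesSource`, and the sample sentence are all hypothetical names and data chosen for illustration. The assumption is that Hindi words break when a tokenizer treats only ASCII word characters (or only letters, without combining marks) as part of a word.

```typescript
// Hypothetical helpers for illustration only; not the real tokenizer module's API.

// Treat letters (\p{L}), combining marks (\p{M}) and digits (\p{N}) as word characters.
// Devanagari vowel signs and the virama are marks, so they must stay inside the word.
const WORD_PATTERN = new RegExp("[\\p{L}\\p{M}\\p{N}]+", "gu");

function tokenizeWords(text: string): string[] {
  return text.match(WORD_PATTERN) || [];
}

// ASCII-oriented tokenizer for comparison: \w only covers [A-Za-z0-9_].
function tokenizeAscii(text: string): string[] {
  return text.match(/\w+/g) || [];
}

// Every word offered in the word bank should appear verbatim in the source text.
function wordBankMatchesSource(words: string[], text: string): boolean {
  return words.every((word) => text.includes(word));
}

// Sample sentence (assumed for illustration); it ends with a danda, which is punctuation.
const verse = "परमेश्वर ने जगत से प्रेम रखा।";
const wordBank = tokenizeWords(verse);

console.log(wordBank);                               // six whole words, danda excluded
console.log(tokenizeAscii(verse));                   // [] because no ASCII word characters exist
console.log(wordBankMatchesSource(wordBank, verse)); // true
```

This sketch does not handle zero-width joiners or non-joiners, which are format characters rather than letters or marks; a production tokenizer would need to decide how to treat them.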

cckozie commented 6 years ago

The new tokenizer code was merged earlier (it works properly in 1.0.0 (8adaf14)) and then got pulled out in 1.0.1. According to @klappy, it has to do with dependencies.

klappy commented 6 years ago

The core no longer holds the helpers/utilities that do the tokenization. Those were moved to the word-aligner module, which includes the tokenizer. That module needs to be updated to the appropriate version of the tokenizer and published under a new version; then the wordAlignment tool needs to be updated accordingly, and then tC.
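The update order described here can be sketched as a small dependency walk. This is only an illustration: the graph below encodes the chain as stated in this comment, `"tokenizer"` stands in for the tokenizer package, and `publishOrder` is a hypothetical helper, not part of any of these modules.

```typescript
// Each package lists the packages it depends on, as described in the comment above.
const dependsOn: Record<string, string[]> = {
  "translationCore": ["wordAlignment"],
  "wordAlignment": ["word-aligner"],
  "word-aligner": ["tokenizer"],
  "tokenizer": [],
};

// Depth-first walk that emits a package only after all of its dependencies.
// Assumes the graph has no cycles; there is no cycle detection here.
function publishOrder(graph: Record<string, string[]>): string[] {
  const order: string[] = [];
  const visited: Record<string, boolean> = {};

  const visit = (name: string): void => {
    if (visited[name]) return;
    visited[name] = true;
    for (const dep of graph[name] || []) visit(dep);
    order.push(name);
  };

  Object.keys(graph).forEach(visit);
  return order;
}

console.log(publishOrder(dependsOn).join(" > "));
// tokenizer > word-aligner > wordAlignment > translationCore
```

Each step in the resulting order corresponds to one version bump and publish before the next package can pick up the change.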

cckozie commented 6 years ago

Good:

Bad:

da1nerd commented 6 years ago

The updated wordmap-lexer and wordmap need to be used in wordAlignment@1.0.1 for all of this to work. A few minor structural changes will be needed as well, i.e. this is not just a dependency bump.