unfoldingWord / translationCore

Repository for the desktop application translationCore
https://www.translationcore.com

Number Tokenization issue: Numbers with commas are split into separate tokens #5784

Open benjore opened 5 years ago

benjore commented 5 years ago

Story Explanation

User Story

As an aligner, I want a number whose digits are separated by punctuation (commas or periods) to be tokenized as one word so that I don't have the option of aligning its parts incorrectly.
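To make the requested behavior concrete, here is a minimal sketch of a tokenizer that keeps such numbers together. It is not translationCore's actual tokenizer: the `TOKEN_RE` pattern and the `tokenize` function are hypothetical names used only for illustration, and the sketch assumes that only commas and periods act as separators inside a number.

```python
import re
from typing import List

# Hypothetical pattern, not translationCore's real tokenizer.
# Alternatives are tried left to right:
#   1. a number: digit runs joined by single commas or periods (e.g. 603,550 or 3.5)
#   2. a word: any run of word characters
#   3. a single punctuation character
TOKEN_RE = re.compile(r"\d+(?:[.,]\d+)*|\w+|[^\w\s]")


def tokenize(text: str) -> List[str]:
    """Split text into tokens, keeping punctuated numbers as one token."""
    return TOKEN_RE.findall(text)


print(tokenize("He counted 603,550 men, and 3.5 days passed."))
# ['He', 'counted', '603,550', 'men', ',', 'and', '3.5', 'days', 'passed', '.']
```

A comma or period only stays inside a token when digits sit on both sides of it, so sentence punctuation after a number (as in "550 men, and") is still split off. Number separators used in other languages are not covered here.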

Examples

From https://git.door43.org/lrsallee/en_ult_rev_book

Features / Specifications

Definition of Done

Additional Context

Mockups

benjore commented 5 years ago

Unless we settle for an English-only solution, this will need a spike.

cckozie commented 5 years ago

Still an issue in 2.0.0 (4b3ebee)

cckozie commented 4 years ago

Still an issue in 2.2.0 (2375ae5)