rmlockwood / FLExTrans

Machine Translation using FLEx, Apertium, and STAMP
MIT License

[Synthesize with STAMP] Cross reference double-digit numbers get an 'a' inserted #530

Closed rmlockwood closed 8 months ago

rmlockwood commented 9 months ago

In one text, the cross-reference: \x * \xo 8.5 \xt 2Sam. 7.14; Sal. 89.32; Prov. 3.11-12; Heb. 12.5-11

turned into: \x * \xo 8.5 \xt 2Sam. 7.1a4; Sal. 89.32; Prov. 3.1a1-12; Heb. 1a2.5-11
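The before/after pair above is consistent with an 'a' being inserted after the first `1` in each space-separated token, while tokens without a `1` pass through unchanged. The Python sketch below only reproduces that observed pattern. It is not how STAMP or FLExTrans processes text, and the function name `insert_a_after_first_one` is hypothetical.

```python
def insert_a_after_first_one(line: str) -> str:
    """Mimic the observed symptom: in each space-separated token,
    the first '1' becomes '1a'. Illustration only, not STAMP's logic."""
    tokens = line.split(" ")
    # str.replace with count=1 touches only the first occurrence per token
    return " ".join(tok.replace("1", "1a", 1) for tok in tokens)


original = r"\x * \xo 8.5 \xt 2Sam. 7.14; Sal. 89.32; Prov. 3.11-12; Heb. 12.5-11"
observed = r"\x * \xo 8.5 \xt 2Sam. 7.1a4; Sal. 89.32; Prov. 3.1a1-12; Heb. 1a2.5-11"

result = insert_a_after_first_one(original)
print(result == observed)
```

Note that `89.32` and `8.5` are untouched because they contain no `1`, so the symptom is tied to the digit 1 rather than to double-digit numbers as such.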

rmlockwood commented 9 months ago

The real issue here was that the user allowed parsing of digits, and his setup mapped the lexeme 1 to 1a. But this uncovered a separate problem. The recommended way to handle numerals is to treat them as being in the analysis writing system. In this project, however, the Cleanup Synthesis setting was turned on (to remove N.N from unsynthesized words), and chapter-verse references like 89.32 were getting deleted when they were treated as analysis writing system text. The code fix https://github.com/rmlockwood/FLExTrans/commit/39d6eaf6bba903766d0c7fb1566e6c5a4cfbc713 corrects this.
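To illustrate why a cleanup step that strips N.N can swallow references like 89.32, here is a minimal Python sketch. It is not the code from the commit above, which may take a different approach. The regular expressions, the function names, and the sample lemma `casa` are all assumptions made for illustration.

```python
import re

# Naive cleanup (assumed): strip any N.N sequence anywhere in the text.
NAIVE_NN = re.compile(r"\d+\.\d+")

# Guarded cleanup (assumed): strip N.N only when it directly follows a letter,
# i.e. when it is attached to a word rather than standing alone.
GUARDED_NN = re.compile(r"(?<=[^\W\d_])\d+\.\d+")


def naive_cleanup(text: str) -> str:
    return NAIVE_NN.sub("", text)


def guarded_cleanup(text: str) -> str:
    return GUARDED_NN.sub("", text)


# 'casa1.1' stands in for an unsynthesized word carrying an N.N suffix.
sample = "casa1.1 Sal. 89.32; Prov. 3.11-12"

print(naive_cleanup(sample))    # the references are damaged
print(guarded_cleanup(sample))  # only the suffix on 'casa' is removed
```

With the naive pattern, `89.32` and `3.11` match just as `1.1` does, so the references are deleted along with the suffix. Requiring a preceding letter is one possible way to tell the two cases apart.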