rmlockwood / FLExTrans

Machine Translation using FLEx, Apertium, and STAMP
MIT License

[Build Bilingual Lexicon] Entries with multiple words messed up #739

Closed: rmlockwood closed this issue 1 month ago

rmlockwood commented 1 month ago

If I have a two-word entry in my source lexicon and link it to a word in the target lexicon, the XML in the generated bilingual lexicon is wrong: the <b/> blank between the two words gets escaped. Here's a sample line: <e w="1"><p><l>zu&lt;b/&gt;hause1.1<s n="n" /></l><r>bil1.1<s n="n" /></r></p></e>

I expected it to be: <e w="1"><p><l>zu<b/>hause1.1<s n="n" /></l><r>bil1.1<s n="n" /></r></p></e>
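A minimal sketch of how this kind of escaping arises, assuming the lexicon is written with Python's xml.etree.ElementTree (the issue does not say which XML writer FLExTrans uses). Putting the literal string <b/> into an element's text makes the serializer escape it, while adding <b/> as a child element and storing the following word in its tail produces the expected markup. The helpers build_side and build_entry are hypothetical, not FLExTrans functions.

```python
import xml.etree.ElementTree as ET

# Wrong: the <b/> blank is embedded in the element's text as a plain string,
# so the serializer escapes it to &lt;b/&gt;.
naive = ET.Element("l")
naive.text = "zu<b/>hause1.1"
print(ET.tostring(naive, encoding="unicode"))
# <l>zu&lt;b/&gt;hause1.1</l>


def build_side(tag, words, pos):
    """Hypothetical helper: build an <l> or <r> element, separating words with <b/> elements."""
    side = ET.Element(tag)
    side.text = words[0]
    for word in words[1:]:
        blank = ET.SubElement(side, "b")
        blank.tail = word  # the word after the blank is stored in <b/>'s tail
    ET.SubElement(side, "s", n=pos)  # part-of-speech symbol
    return side


def build_entry(src_words, tgt_words, pos):
    """Hypothetical helper: build a bilingual <e> entry from source and target word lists."""
    entry = ET.Element("e", w="1")
    pair = ET.SubElement(entry, "p")
    pair.append(build_side("l", src_words, pos))
    pair.append(build_side("r", tgt_words, pos))
    return entry


print(ET.tostring(build_entry(["zu", "hause1.1"], ["bil1.1"], "n"), encoding="unicode"))
# <e w="1"><p><l>zu<b />hause1.1<s n="n" /></l><r>bil1.1<s n="n" /></r></p></e>
```

ElementTree serializes empty elements as <b /> with a space, which is equivalent XML to the <b/> in the expected line.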

See example files here: https://drive.google.com/drive/folders/1TxMwc9N4MyAO9i9MKuWilvEY40WZ2mHz?usp=drive_link

mr-martian commented 1 month ago

Something went wrong when the replacement editor changes were merged. https://github.com/rmlockwood/FLExTrans/blob/master/ExtractBilingualLexicon.py#L551 still calls processSpaces, but that function should have been deleted.

rmlockwood commented 1 month ago

Fixed in PR #740.