rmlockwood closed 3 weeks ago
apertium-transfer doesn't actually care about the presence of *, but lt-proc does, and simply escaping doesn't work currently (though I could probably make it work).
With \/ in the stream, the rules can just refer to a/b without issue. The annoying part is that they only need to be escaped inside <test>. In <def-cat> and <out> the unescaped versions are fine.
Since we're mangling the files anyway, I suppose I could add a step to escape all the symbols that need escaping in the transfer file before running it. Then the user probably wouldn't need to escape anything at all.
Slash in the biling. lex. is now working, but asterisk doesn't seem to be. I updated the code in three places in the reserved-characters branch to not change to _. So now if you test with German-Swedish Reserved characters you should get lieb1.1 in the biling. lex. and Apertium isn't translating it to älska1.1.
The bilingual.dix file in that project hasn't been regenerated and still has _.
I changed it to use * in the Utils code. It still doesn't work. Please try running the Build Bilingual Lexicon module yourself with the latest code on the reserved-characters branch.
After 43c6f71 it works for me.
Apertium tools not working when another symbol follows a symbol with a slash.
I have the following in my biling. lex.:
<e><p><l>*lobwana1.1<s n="n" /><s n="1/2" /></l><r>*lopwana1.1<s n="n" /><s n="1/2" /></r></p></e>
I have this in my source text:
^\*lobwana1.1<n><1/2><x>$
I get this result (no rules applied):
^*lopwana1.1<n><1<x>$
If the source text doesn't have the <x>, it works fine.
The problem there is in lt-proc and there's a fix in https://github.com/apertium/lttoolbox/pull/185
Fixed in PR #726
Currently FLExTrans doesn't handle reserved characters in Apertium very well. Here are two examples:
Other characters may be problematic either in the data stream going into Apertium or in the bilingual lexicon. All of the characters should be identified and the appropriate quoting or converting should be done. Ideally, the user should not have to change how he/she references lemmas or affixes in the rules from what he/she sees in the FLEx lexicon. (Except, of course, the dot to underscore conversion that I don't think we can avoid.)
This work should be done off of a new branch from master.
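To make the quoting idea in the description concrete, here is a minimal sketch of escaping a lemma before it goes into the Apertium data stream. The reserved set below is an assumption based on my reading of the Apertium stream format and should be checked against the real list; the function names are hypothetical and this is not the FLExTrans implementation. Tags are written unescaped, matching the source text example earlier in the thread.

```python
# Assumption: characters treated as reserved in the Apertium stream.
STREAM_RESERVED = set("^$/<>{}\\*@#+~[]")


def escape_lemma(lemma):
    """Prefix every reserved character in the lemma with a backslash."""
    return "".join("\\" + c if c in STREAM_RESERVED else c for c in lemma)


def unescape_lemma(text):
    """Reverse escape_lemma; a lone trailing backslash is dropped."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            # Take the escaped character literally.
            out.append(next(chars, ""))
        else:
            out.append(ch)
    return "".join(out)


def lexical_unit(lemma, tags):
    """Build ^lemma<tag>...$ with the lemma escaped and tags left as-is."""
    return "^" + escape_lemma(lemma) + "".join(f"<{t}>" for t in tags) + "$"
```

For example, `lexical_unit("*lobwana1.1", ["n", "1/2", "x"])` gives `^\*lobwana1.1<n><1/2><x>$`, the same shape as the source text shown above, so the user could keep writing the lemma exactly as it appears in the FLEx lexicon.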