Decoder, aligner, and model optimizer for statistical machine translation and other structured prediction models based on (mostly) context-free formalisms
I am not sure if this problem existed in earlier versions, but in the current version cdec aborts when the translation input contains a token that looks like a non-terminal, such as "[blabla]", while the pass-through grammar is being constructed. This can happen, for example, when the tokenizer for the foreign language does not split off brackets for some reason (in my case, Japanese).
Here is an example:
echo "[blabla]" | cdec -c cdec.ini
Configured 1 rescoring pass
[num_fn=1 int_alg=FULL]
Adding glue grammar for default nonterminal X and goal nonterminal S
Reading input from STDIN
INPUT: [blabla]
id = 0
Adding pass through grammar
Grammar [X] ||| [blabla] ||| [blabla] ||| PassThrough=1
line 1: LHS and RHS arity mismatch!
Aborted
Inputs like "[ blabla]", where the opening bracket is a separate token, are fine.
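Until this is fixed in the decoder, one possible workaround is to split the brackets off such tokens before the input reaches cdec. The following is only a minimal sketch in Python, not part of cdec. The function names and the regular expression are hypothetical, and the pattern rests on my assumption that any whitespace-delimited token of the form "[...]" is read as a non-terminal. Only the "[ blabla]" form is confirmed to work above, so whether a lone "]" token is also safe is an assumption as well.

```python
import re

# Assumption: a token that starts with "[" and ends with "]" (with at least
# one character in between) is what gets mistaken for a non-terminal.
NONTERMINAL_LIKE = re.compile(r"\[(\S+)\]")


def split_token(token):
    """Split the outer brackets off a non-terminal-like token, recursively."""
    match = NONTERMINAL_LIKE.fullmatch(token)
    if match is None:
        return [token]
    # Recurse so that nested cases such as "[[x]]" are fully separated.
    return ["[", *split_token(match.group(1)), "]"]


def split_bracketed_tokens(line):
    """Return the line with every non-terminal-like token split into parts."""
    pieces = []
    for token in line.split():
        pieces.extend(split_token(token))
    return " ".join(pieces)


print(split_bracketed_tokens("[blabla]"))
```

Applied to each input line before decoding, this turns "[blabla]" into "[ blabla ]" and leaves ordinary tokens untouched. Note that it changes the tokenization the model sees, so it is a stopgap rather than a fix.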