redpony / cdec

Decoder, aligner, and model optimizer for statistical machine translation and other structured prediction models based on (mostly) context-free formalisms
http://cdec-decoder.org/
Apache License 2.0

pass through rule parsing error #60

Open fhieber opened 9 years ago

fhieber commented 9 years ago

Hi,

I am not sure whether this problem existed in earlier versions, but in the current version cdec aborts when the translation input contains a non-terminal-like symbol such as "[blabla]" and a pass-through grammar is constructed for it. This can happen, for example, when the tokenizer for a foreign language (Japanese, in my case) for some reason does not separate brackets from the adjacent text. Here is an example:

    echo "[blabla]" | cdec -c cdec.ini
    Configured 1 rescoring pass [num_fn=1 int_alg=FULL]
    Adding glue grammar for default nonterminal X and goal nonterminal S
    Reading input from STDIN
    INPUT: [blabla]
    id = 0
    Adding pass through grammar
    Grammar [X] ||| [blabla] ||| [blabla] ||| PassThrough=1
    line 1: LHS and RHS arity mismatch!
    Aborted

Inputs like "[ blabla]" are fine.
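
Until the rule parser handles this case, one possible workaround is to preprocess the input so that bracketed tokens never reach cdec in the "[blabla]" form. The Python sketch below is only an illustration, not part of cdec: the function name `split_bracket_tokens` and the regular expression are my own assumptions about what counts as "non-terminal-like" (a single whitespace-free token fully wrapped in square brackets). It rewrites such tokens into the "[ blabla]" form, which is the only variant observed above to work. Note that this changes the tokenization the model sees.

```python
import re

# Assumption: a token is "non-terminal-like" if it is one whitespace-free
# token wrapped in a single pair of square brackets, e.g. "[blabla]".
# cdec's actual rule parser may use a different definition.
NT_LIKE = re.compile(r"^\[[^\s\[\]]+\]$")


def split_bracket_tokens(line: str) -> str:
    """Insert a space after the opening bracket of non-terminal-like tokens.

    "[blabla]" becomes "[ blabla]", the form reported to parse without error.
    All other tokens are passed through unchanged.
    """
    out = []
    for tok in line.split():
        if NT_LIKE.match(tok):
            # Detach the opening bracket: "[blabla]" -> "[", "blabla]"
            out.extend(["[", tok[1:]])
        else:
            out.append(tok)
    return " ".join(out)


if __name__ == "__main__":
    import sys

    # Hypothetical usage as a filter in front of the decoder:
    #   cat input.txt | python split_brackets.py | cdec -c cdec.ini
    for raw in sys.stdin:
        print(split_bracket_tokens(raw))
```

Tokens that merely contain brackets, such as "x[blabla]y", are left alone here because the report does not say whether they trigger the error.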