Decoder, aligner, and model optimizer for statistical machine translation and other structured prediction models based on (mostly) context-free formalisms
In functions `ComputeZ` and `ComputeDLogZ`, lines 33 and 50: `floor` may be greater than `n`, which makes `num_top` reach values close to 2^32.

It happens in real life when the following conditions are met:

- fast_align is called with the flags `--favor_diagonal` and `--optimize_tension`, and
- the parallel corpus contains source sentences that are longer than their matching target sentence.

In that situation, fast_align.cc's main sometimes invokes `ComputeDLogZ` with `i` > `n`. `ComputeDLogZ` then calls `ComputeZ` with `i` > `m` (which triggers an assert error if the asserts are not commented out).

Note: This is obviously related to these two commits from the clab/fast_align repo, although I am not quite sure why the second reverted the first:

https://github.com/clab/fast_align/commit/5fe669ed08617d54f57577e75944f2e25c68d466
https://github.com/clab/fast_align/commit/adfadde4c129026790224b04a67ba5b8c0c89840
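
For anyone trying to reproduce the arithmetic, here is a minimal sketch of how `num_top` can end up near 2^32. It assumes that `floor` is the truncation of `i * n / m` and that `num_top` is computed as `n - floor` in unsigned arithmetic; the function names `NumTopUnchecked` and `NumTopClamped` are hypothetical and are not part of the code base. The clamp is only one possible guard, not necessarily what either of the commits did.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the arithmetic described in this issue.
// Assumption: floor = trunc(i * n / m) and num_top = n - floor, both unsigned.
std::uint32_t NumTopUnchecked(std::uint32_t i, std::uint32_t m, std::uint32_t n) {
  const double split = static_cast<double>(i) * n / m;
  const std::uint32_t floor = static_cast<std::uint32_t>(split);
  // When i > m, floor can exceed n and the unsigned subtraction wraps modulo 2^32.
  return n - floor;
}

// One possible guard (an assumption, not the upstream fix): clamp floor to n
// so that the subtraction can never wrap.
std::uint32_t NumTopClamped(std::uint32_t i, std::uint32_t m, std::uint32_t n) {
  const double split = static_cast<double>(i) * n / m;
  const std::uint32_t floor = std::min(static_cast<std::uint32_t>(split), n);
  return n - floor;
}
```

With `i = 12`, `m = 10`, `n = 8`, `split` is 9.6 and `floor` is 9, so `NumTopUnchecked` returns 4294967295 (2^32 - 1), while `NumTopClamped` returns 0.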