In the script ./preprocess/run.sh, we set the spm option --add_dummy_prefix 1 (also the default in spm_train). This puts '▁' at the beginning of each sentence in the resulting subword corpus, so the converted word alignments count from 1 instead of 0, which breaks the AER calculation for the subword part. To fix this, simply subtract 1 from the converted alignment indices to counteract the shift.
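
The subtraction can be sketched in a few lines of Python. This is only an illustration, not code from this repository: it assumes the converted alignments are stored in Pharaoh format (space-separated "src-tgt" pairs, one sentence per line) and that both sides are shifted by 1. The function name shift_alignment is hypothetical.

```python
def shift_alignment(line: str, offset: int = 1) -> str:
    """Subtract `offset` from both indices of every "src-tgt" pair in one line.

    Assumes Pharaoh-style alignments such as "1-1 2-3 3-2" (hypothetical
    format; adapt the parsing if your files differ).
    """
    shifted = []
    for pair in line.split():
        src, tgt = pair.split("-")
        new_src, new_tgt = int(src) - offset, int(tgt) - offset
        # A negative index means the input was already 0-based.
        if new_src < 0 or new_tgt < 0:
            raise ValueError(f"cannot shift pair {pair!r} by {offset}")
        shifted.append(f"{new_src}-{new_tgt}")
    return " ".join(shifted)


if __name__ == "__main__":
    print(shift_alignment("1-1 2-3 3-2"))
```

The check for negative indices is there so that running the shift twice, or on alignments that were already 0-based, fails loudly instead of silently producing wrong AER input.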