marian-nmt / marian

Fast Neural Machine Translation in C++
https://marian-nmt.github.io

Cost is NaN when adding guided alignment #417

Open mahmoudaymo opened 1 year ago

mahmoudaymo commented 1 year ago

Bug description

I trained a model for 5 epochs without guided alignment, then continued training for 5 more epochs with guided alignment. Training without guided alignment went fine. However, once guided alignment was added (the second 5 epochs), the training cost was nan at every update.
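To pin down the update at which the cost first stops being finite, the training log can be scanned with a small helper. This is only a sketch: the function name `first_nonfinite_cost` is hypothetical, and it assumes each update line in the log contains the word `Cost` followed by whitespace and the value, which may not hold for every Marian version.

```bash
#!/bin/bash

# Hypothetical helper: print the first log line (with its line number) whose
# cost is nan or inf. Assumes update lines look like "... Cost <value> ...".
first_nonfinite_cost() {
  local log_file=$1
  # -n prefixes the line number, -i ignores case, -E enables the alternation.
  grep -n -i -E 'Cost[[:space:]]+-?(nan|inf)' "$log_file" | head -n 1
}
```

Calling `first_nonfinite_cost train.log` prints nothing when every logged cost is finite.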

How to reproduce

I ran this script:

```bash
#!/bin/bash

set -e

exp_dir=path_to_experiment_dir

exp=$exp_dir/basemodel
config=$exp/config.yml

/marian/build/marian -c $config \
    --valid-log $exp/valid.log \
    --log $exp/train.log \
    --model $exp/model.npz \
    --after 5e

exp=$exp_dir/finetuned
# This config is similar to the above except I unset --all-caps-every and --english-title-case-every params
config=$exp/config.yml

/marian/build/marian -c $config \
    --pretrained-model $pretrained_model_path \
    --valid-log $exp/valid.log \
    --log $exp/train.log \
    --model $exp/model.npz \
    --after 10e \
    --guided-alignment /Engines/MAS/ENUSDEDE/alignment/corpus.align \
    --guided-alignment-cost ce
```

marian.logs.txt
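Before rerunning, it may be worth ruling out a problem in the alignment file itself. The sketch below is not a diagnosis of this issue. It assumes the file passed to `--guided-alignment` uses the plain fast_align-style format (space-separated `i-j` index pairs, one line per sentence pair) and that it should have as many lines as the training corpus. The function name `check_alignment` and both of its arguments are hypothetical.

```bash
#!/bin/bash

# Hypothetical helper: compare line counts and count lines that are not
# space-separated "srcIdx-trgIdx" pairs. Empty lines are counted as well,
# since they carry no alignment points.
check_alignment() {
  local align_file=$1 corpus_file=$2
  local n_align n_corpus bad
  n_align=$(awk 'END { print NR }' "$align_file")
  n_corpus=$(awk 'END { print NR }' "$corpus_file")
  # grep -c -v counts non-matching lines; it exits non-zero when the count is 0.
  bad=$(grep -c -v -E '^[0-9]+-[0-9]+( [0-9]+-[0-9]+)*$' "$align_file" || true)
  echo "alignment_lines=$n_align corpus_lines=$n_corpus malformed_or_empty=$bad"
  # Succeed only if the counts match and every line is well formed.
  [ "$n_align" -eq "$n_corpus" ] && [ "$bad" -eq 0 ]
}
```

For example, `check_alignment corpus.align corpus.en` prints one summary line and returns a non-zero status if the two files disagree in length or any alignment line fails the pattern. Here `corpus.en` stands in for one side of the training data.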


TransperfectAI commented 10 months ago

We are experiencing this issue too, even when training with alignment from the start. Could it be related to the guided-alignment-cost? We used to use mse and switched to ce when mse was no longer supported, and the issue started for us after that change. The switch also means that, to resume training in an existing directory, you need to edit the cost in model.npz.progress.yml; otherwise Marian throws an error.