
multi-lang training scripts for nnet3 #661

Closed vijayaditya closed 8 years ago

vijayaditya commented 8 years ago

We need a new training script, similar to steps/nnet3/chain/train.py for training with data from multiple languages. The changes required are as follows:

  1. Generate chain egs for multiple languages (See line of code)
  2. Write new archive index computation logic (See line of code). When the number of training jobs is less than the number of languages, we would have to cycle through the ark files of the different languages in a round-robin fashion (a minimal sketch follows this list). Appropriate decisions also need to be made for the cases where there are more training jobs than languages and where the amount of training data differs across languages. @jtrmal, who has worked extensively on the nnet2 multi-lang recipe, might be the right person to make the decisions about this logic.
  3. Create a new version of steps/nnet3/tdnn/make_configs.py to make a neural network with multiple output layers, one for each language. This script can get really complicated, as there will be final-xent-affine and final-chain-affine layers for each language. I would recommend drastically reducing the number of supported options to keep the script simple and understandable.
  4. Add an option to nnet3-am-copy that lets you modify the name of a specific output node. This would be used by steps/nnet3/chain/train_multilang.py or the multi-lang decode script to choose the language-specific output layer.
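
A minimal sketch of the round-robin assignment from item 2 (the function name and layout are just for illustration; the real logic would live in the multi-language training script and would also need to weight languages by their amount of data):

// Decide which language's egs archive each of the num_jobs parallel training
// jobs should read on a given iteration. Offsetting by the iteration number
// cycles every job through all the languages over successive iterations
// (round robin); when num_jobs >= num_langs it also spreads the languages
// across the jobs within a single iteration. int32 is Kaldi's usual typedef.
#include <vector>
std::vector<int32> AssignLanguagesToJobs(int32 num_jobs, int32 num_langs,
                                         int32 iteration) {
  std::vector<int32> lang_of_job(num_jobs);
  for (int32 job = 0; job < num_jobs; job++)
    lang_of_job[job] = (iteration * num_jobs + job) % num_langs;
  return lang_of_job;
}
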
david-ryan-snyder commented 8 years ago

Has anyone done item 4 yet? If so, I can do it.

vijayaditya commented 8 years ago

@david-ryan-snyder not yet, please go ahead.

ghost commented 8 years ago

I am interested in this task; I will work on it.

vijayaditya commented 8 years ago

@snidada1 It would be very convenient if you kept updating this issue with progress reports rather than just emailing @jtrmal and me.

vijayaditya commented 8 years ago

We once again need help with this issue; please let us know if you are interested.

amiasato-zz commented 8 years ago

I've conducted some experiments with multiple languages using chain recipes, and I'd like to point out a change I had to make in the binaries:

nnet-chain-training: The denominator graph constructor uses the 'output' layer's dimension by default. The quickest way to deal with it was to thread a default argument through the call stack up to the binary's options, letting the user replace 'output' with a custom output for a specific language. This shouldn't impact any other experiments, but it does look ugly. Is there any other way to get the graph dimension without checking a specific output?

The experiments I made were all done through manual configuration of nnet3 configs. Since I was training on a single GPU instance, I made alternating calls to the TrainOneIteration function for each language (English and Portuguese), each with its own arguments.
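
In pseudo-C++ the alternation was roughly the following (TrainOneIteration here stands in for the script-level function of the same name; all variable names are illustrative):

// Alternate between the two languages on a single GPU, one training
// iteration each, with language-specific egs and options.
const char *langs[] = {"en", "pt"};
for (int32 iter = 0; iter < num_iters; iter++) {
  const std::string lang = langs[iter % 2];
  // Each call sees only that language's examples and its own output layer.
  TrainOneIteration(opts_for_lang[lang], egs_for_lang[lang], &nnet);
}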

vijayaditya commented 8 years ago

If you are computing the cost function using a particular output throughout the training iteration, then you could just rename the output of interest to "output". You can do this by adding a binary that does the renaming when the model is passed to the chain trainer.
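
E.g. the core of such a renaming step could look like the sketch below; GetNodeIndex() and IsOutputNode() are existing Nnet accessors, but RenameNode() is a placeholder for whatever mechanism actually performs the rename:

// Before passing the model to the chain trainer, rename the
// language-specific output (e.g. "output-en") to "output" so the trainer's
// default lookup finds it. The matching cross-entropy output would need the
// same treatment ("output-en-xent" -> "output-xent").
void RenameOutputForTraining(const std::string &lang_output, Nnet *nnet) {
  int32 node_index = nnet->GetNodeIndex(lang_output);
  KALDI_ASSERT(node_index >= 0 && nnet->IsOutputNode(node_index));
  RenameNode(node_index, "output", nnet);  // hypothetical helper
}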

You might get better performance if you switch between languages more frequently, e.g. at the minibatch level. @pegahgh might be able to give you more information about this. However, this would require many more changes to the C++ code.

danpovey commented 8 years ago

I think Pegah [her handle is @pegahgh] has been working on multi-language nnet3 training, but not chain training so far, as far as I know.

I believe the approach we decided on is to mix the languages together in the archives of egs on disk, but to have the nnet3-merge-egs program batch them up in such a way that it spits out batches containing only one language at a time. That means there are only a few computations to compile, and we don't spend half the time in compilation.
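
Sketching that batching step (NnetChainExample, NnetChainExampleWriter, and MergeChainExamples() are existing names in the chain egs code, assuming the usual signatures; the bucketing loop itself is illustrative):

// Key each incoming example by the name of its first supervised output,
// e.g. "output-en" vs. "output-pt", and flush a single-language minibatch
// whenever a language's bucket fills up. This keeps the number of distinct
// computations, and hence compilations, small.
void AddExample(const NnetChainExample &eg, int32 minibatch_size,
                bool compress,
                std::unordered_map<std::string,
                                   std::vector<NnetChainExample> > *buckets,
                NnetChainExampleWriter *writer) {
  KALDI_ASSERT(!eg.outputs.empty());
  std::vector<NnetChainExample> &bucket = (*buckets)[eg.outputs[0].name];
  bucket.push_back(eg);
  if (static_cast<int32>(bucket.size()) == minibatch_size) {
    NnetChainExample merged;
    MergeChainExamples(compress, &bucket, &merged);  // merge into one minibatch
    writer->Write("merged_eg", merged);
    bucket.clear();
  }
}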

It looks like the core code in chain training doesn't assume that the output name is 'output', but does assume that the output name and its corresponding 'xent' output differ by an '-xent' suffix:

nnet-chain-training.cc: std::string xent_name = sup.name + "-xent"; // typically "output-xent".

So for chain training, it's probably only the egs generation that would have to be changed.

In general, I'd recommend changing the names of the output nodes to 'output' only for decoding. For training, it's probably better to modify the egs generation to use different output names per language, and this may require adding options to binaries. I hope that at some point we can work with Pegah to get this stuff checked in.


danpovey commented 8 years ago

Oh, I see now the specific line of code you were asking about:

NnetChainTrainer::NnetChainTrainer(const NnetChainTrainingOptions &opts,
                                   const fst::StdVectorFst &den_fst,
                                   Nnet *nnet):
    opts_(opts),
    den_graph_(den_fst, nnet->OutputDim("output")), // here
    nnet_(nnet),

Probably, to implement multi-language chain training in the same way as I mentioned above for nnet3 training, we'd need multiple of these NnetChainTrainer objects, one per language, and pass the output name in somehow. Perhaps the NnetChainTrainer objects could be kept in a map indexed by output name, and only actually initialized once we read in a minibatch [so we know the output name]. We can worry about this when we actually check in scripts and code for the rest of this, though.
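
Roughly, that could look like the following (the NnetChainTrainer constructor taking an output name is hypothetical, and in practice each language would also need its own denominator FST):

// One trainer per output name, created lazily on the first minibatch for
// that language, so the denominator graph can be sized from
// nnet->OutputDim(output_name).
std::unordered_map<std::string, std::unique_ptr<NnetChainTrainer> > trainers;

void TrainMinibatch(const NnetChainTrainingOptions &opts,
                    const fst::StdVectorFst &den_fst,
                    const std::string &output_name,  // e.g. "output-en"
                    const NnetChainExample &eg,
                    Nnet *nnet) {
  std::unique_ptr<NnetChainTrainer> &trainer = trainers[output_name];
  if (trainer == nullptr)  // first minibatch seen for this language
    trainer.reset(  // hypothetical 4-argument constructor
        new NnetChainTrainer(opts, den_fst, output_name, nnet));
  trainer->Train(eg);
}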

Dan


vijayaditya commented 8 years ago

Being addressed in #1027.