rwth-i6 / returnn-experiments

experiments with RETURNN

Questions on librispeech transformer lm #71

Closed wenjie-p closed 3 years ago

wenjie-p commented 3 years ago

Hi,

Thanks for this great toolkit you have built! I have recently been running experiments on LibriSpeech Transformer LM training, but found it hard to get the model to converge due to the dramatically increased number of training tokens and model parameters. Fortunately you have released the pretrained word-level Transformer LM here, but these models seem to be trained with TensorFlow. I am wondering whether a corresponding PyTorch-trained LM is available?

Thanks in advance!

christophmluscher commented 3 years ago

Hi,

we do not have a corresponding pytorch-trained LM.

Best

albertz commented 3 years ago

But certainly you could write yourself a converter script.
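A converter along those lines could look like the sketch below. Everything here is illustrative: the name mapping is a placeholder (the real variable names depend on the released checkpoint, which you would first inspect with `tf.train.list_variables(path)`), and `reader` stands in for the object returned by `tf.train.load_checkpoint(path)`.

```python
# Hedged sketch of a TF-checkpoint -> PyTorch converter.
# The name mapping is purely illustrative; inspect the real variable
# names with tf.train.list_variables(checkpoint_path) first.

def tf_to_torch_name(tf_name):
    """Turn a TF variable name like 'dec_00/att/W:0' into a
    PyTorch-style dotted name (illustrative mapping only)."""
    name = tf_name.split(":")[0]   # drop a trailing ':0' output index
    return name.replace("/", ".")  # TF name scopes -> PyTorch submodules

def convert_checkpoint(reader, wrap):
    """Build a PyTorch-style state dict from a checkpoint reader.

    `reader` is assumed to behave like tf.train.load_checkpoint(path),
    i.e. to offer get_variable_to_shape_map() and get_tensor(name).
    `wrap` converts a numpy array to a tensor (e.g. torch.from_numpy).
    Note: some weight matrices may also need a transpose, depending on
    how the PyTorch model defines its linear layers.
    """
    return {
        tf_to_torch_name(name): wrap(reader.get_tensor(name))
        for name in reader.get_variable_to_shape_map()
    }
```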

wenjie-p commented 3 years ago

Yes of course! Thanks so much!

wenjie-p commented 3 years ago

Hi,

I am wondering how you train the Transformer LM with 42 hidden layers. I am following the PyTorch examples, but found that training the 42-layer Transformer LM takes too long.

Thanks in advance.

wenjie-p commented 3 years ago

Hi, I think the key is the residual connection. I just want to confirm that.

ringoreality commented 3 years ago

Hello Wenjie,

The residual connection helps to stabilize convergence when the model becomes deep. If the long training time is your concern, note that we use a sampling-based softmax to speed up training.
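The role of the residual connection can be sketched in plain Python with a toy sublayer: each layer computes y = x + F(x), so the identity path carries the input signal through the whole stack (here 42 layers, as in the LM discussed above) even when F contributes little early in training.

```python
def residual(sublayer, x):
    """y = x + F(x): the identity path keeps signal and gradients
    flowing through deep stacks, which stabilizes very deep models."""
    return [xi + fi for xi, fi in zip(x, sublayer(x))]

def deep_stack(x, depth=42):
    # Toy 42-layer stack. F(x) = 0 here, so the output shows that the
    # identity path alone preserves the input across all layers.
    for _ in range(depth):
        x = residual(lambda v: [0 * vi for vi in v], x)
    return x
```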

For your information, depending on the vocabulary size, the number of samples per position (samples are normally shared across a batch), and the noise distribution, a relative training speedup of 100% is often achievable without a significant loss in PPL.
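The sampling idea itself can be sketched as follows. This is a deliberately minimal version in plain Python: it omits the correction for the noise distribution that a real sampled softmax (e.g. TF's `tf.nn.sampled_softmax_loss`) applies, and just shows that the normalization sum runs over the target plus a few sampled negatives instead of the full vocabulary.

```python
import math
import random

def full_softmax_nll(logits, target):
    """Exact negative log-likelihood: normalizes over the full vocabulary."""
    z = sum(math.exp(l) for l in logits)
    return -math.log(math.exp(logits[target]) / z)

def sampled_softmax_nll(logits, target, num_samples, rng=random):
    """Approximate NLL: normalize over the target plus `num_samples`
    sampled negatives. Illustrative only; a real implementation corrects
    for the noise distribution used to draw the negatives."""
    negatives = rng.sample(
        [i for i in range(len(logits)) if i != target], num_samples)
    chosen = [target] + negatives
    z = sum(math.exp(logits[i]) for i in chosen)
    return -math.log(math.exp(logits[target]) / z)
```

The speedup comes from the partition sum: with a 200k-word vocabulary and, say, a few thousand samples, the softmax normalization touches orders of magnitude fewer logits per step.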

For further reading, I suggest you check:

Kind regards, Ringo

wenjie-p commented 3 years ago

Thanks!

wenjie-p commented 3 years ago

Hi,

I am trying to rescore lattice results with the Transformer LM. Since the decoding part of my system is not based on RETURNN, I only want to use the config under 2019-lm-transformers/librispeech/word_200k_vocab with the pretrained model to obtain hypothesis scores for the lattice.

I set task='eval', but found the following in the log:

2021-06-15 14:32:17.473230: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-06-15 14:32:17.938904: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
ERROR:tensorflow:==================================
Object was never used (type <class 'tensorflow.python.ops.tensor_array_ops.TensorArray'>):
<tf.TensorArray 'output/rec/subnet_base/source_tensor_array/source_ta'>
If you want to mark it as used call its "mark_used()" method.
It was originally created here:
  File "/mnt/workspace/pengwenjie/returnn/returnn/tf/network.py", line 999, in _create_layer
    return layer
  File "/mnt/workspace/pengwenjie/returnn/returnn/tf/layers/rec.py", line 255, in __init__
    self.saveable_param_replace.update(self.cell.output_layers_net.get_saveable_param_replace_dict())
  File "/mnt/workspace/pengwenjie/returnn/returnn/tf/layers/rec.py", line 884, in _get_output_subnet_unit
    return output
  File "/mnt/workspace/pengwenjie/returnn/returnn/tf/layers/rec.py", line 2529, in get_output
    return output
  File "/mnt/workspace/pengwenjie/tf-gpu/lib/python3.8/site-packages/tensorflow/python/util/tf_should_use.py", line 247, in wrapped
    return _add_should_use_warning(fn(*args, **kwargs),
==================================
ERROR:tensorflow:==================================

But the program did not get interrupted. I have the following questions:

1) What is this error log and why does it occur?
2) How should I get the output results in the format: hypothesis hypothesis-score?
3) Are there any references to help me set up the config properly to get there?

Thanks!

albertz commented 3 years ago

I am trying to rescore the lattice results with transformer lm. ...

I set task='eval'

Task eval is to get the score for given individual sequences (but not a lattice). You are aware of that?

Lattice rescoring would be more involved. You find some example code here. Otherwise our RASR toolkit also can do lattice rescoring.
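For the n-best case (which task="eval" does cover, one score per sequence), the combination step is simple. A minimal sketch; the score convention (log-probabilities, higher is better) and the `lm_scale` value are assumptions you would tune on a dev set:

```python
def rescore_nbest(hyps, lm_score, lm_scale=0.5):
    """Pick the best hypothesis after interpolating in the LM score.

    hyps:     list of (text, first_pass_score) pairs, higher = better
    lm_score: maps a hypothesis string to its LM log-probability
              (e.g. looked up from the per-sequence eval output)
    lm_scale: interpolation weight (hypothetical default; tune it)
    """
    return max(hyps, key=lambda h: h[1] + lm_scale * lm_score(h[0]))
```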

  1. what is the error log and why it occurs?

It's only a warning. If everything otherwise runs fine, I would not worry about it. Although it's a bit strange; I don't know exactly why it occurs.

  1. How should I get the output results with the format: hypothesis hypothesis-score

Please refer to the documentation and the code, especially execute_main_task. There you find:

    engine.eval_model(
      output_file=config.value("eval_output_file", None),
      output_per_seq_file=config.value("eval_output_file_per_seq", None),
      loss_name=config.value("loss_name", None),
      output_per_seq_format=config.list("output_per_seq_format", ["score"]),
      output_per_seq_file_format=config.value("output_per_seq_file_format", "txt"),
      lr_control_update_scores=lr_control_update_scores)

And then follow the code to see more details. So for you, the relevant options are eval_output_file_per_seq, output_per_seq_format and output_per_seq_file_format.

I would suggest:

output_per_seq_file_format = "py"
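If it helps, a sketch of reading that file back. This assumes the "py" format writes a single Python literal, e.g. a dict mapping sequence tags to per-sequence results such as {"score": ...}; check your actual output file, as the exact layout may differ.

```python
import ast

def load_per_seq_scores(path):
    """Parse an eval_output_file_per_seq written with the "py" format.

    Assumption: the file contains one Python literal, e.g.
    {seq_tag: {"score": ...}, ...}. Verify against your real output.
    """
    with open(path) as f:
        return ast.literal_eval(f.read())
```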

wenjie-p commented 3 years ago

Task eval is to get the score for given individual sequences (but not a lattice). You are aware of that? Lattice rescoring would be more involved. You find some example code here. Otherwise our RASR toolkit also can do lattice rescoring.

Thank you for the kind reminder. Yes, the input for the NN LM is plain text rather than a lattice. RASR is a great ASR toolkit, but currently I only need the Transformer LM to give me the scores. I will investigate the relevant toolkits in the future.

Again, thanks for making the word-level Transformer LM publicly available. Without it, I believe it would have taken me quite some time to train and fine-tune one myself.