tensorflow / tensor2tensor

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Apache License 2.0

Problem training Transformer_moe #1115

Open Esaada opened 6 years ago

Esaada commented 6 years ago

Description

I followed the instructions and got this error: AttributeError: 'HParams' object has no attribute 'layer_types'

Environment information

OS: Linux Ubuntu 16.04
tensor2tensor==1.8.0
tensorboard==1.10.0
tensorflow==1.10.0
tensorflow-gpu==1.0.1
tensorpack==0.3.0

$ python -V
Python 2.7.12

Steps to reproduce:

I used the following settings and commands:

  PROBLEM=librispeech
  MODEL=transformer_moe
  HPARAMS=transformer_base_single_gpu

  DATA_DIR=./t2t_data
  TMP_DIR=/tmp/t2t_datagen
  TRAIN_DIR=./t2t_train/$PROBLEM/$MODEL-$HPARAMS

  mkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR

  t2t-datagen \
    --data_dir=$DATA_DIR \
    --tmp_dir=$TMP_DIR \
    --problem=$PROBLEM

In the end, I ran the "train" command:

  t2t-trainer \
    --data_dir=$DATA_DIR \
    --problem=$PROBLEM \
    --model=$MODEL \
    --hparams_set=$HPARAMS \
    --output_dir=$TRAIN_DIR

Error logs:

  WARNING:tensorflow:Shapes are not fully defined. Assuming batch_size means tokens.
  INFO:tensorflow:Calling model_fn.
  INFO:tensorflow:Unsetting shared_embedding_and_softmax_weights.
  INFO:tensorflow:Setting T2TModel mode to 'train'
  INFO:tensorflow:Using variable initializer: uniform_unit_scaling
  INFO:tensorflow:Transforming feature 'inputs' with speech_recognition_modality.bottom
  INFO:tensorflow:Transforming 'targets' with symbol_modality_256_512.targets_bottom
  WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/function.py:986: calling create_op (from tensorflow.python.framework.ops) with compute_shapes is deprecated and will be removed in a future version.
  Instructions for updating:
  Shapes are always computed; don't use the compute_shapes as it has no effect.
  Traceback (most recent call last):
    File "/usr/local/bin/t2t-trainer", line 32, in <module>
      tf.app.run()
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
      _sys.exit(main(argv))
    File "/usr/local/bin/t2t-trainer", line 28, in main
      t2t_trainer.main(argv)
    File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/bin/t2t_trainer.py", line 385, in main
      execute_schedule(exp)
    File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/bin/t2t_trainer.py", line 326, in execute_schedule
      getattr(exp, FLAGS.schedule)()
    File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_lib.py", line 331, in continuous_train_and_eval
      self._eval_spec)
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 451, in train_and_evaluate
      return executor.run()
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 590, in run
      return self.run_local()
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 691, in run_local
      saving_listeners=saving_listeners)
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 376, in train
      loss = self._train_model(input_fn, hooks, saving_listeners)
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 1145, in _train_model
      return self._train_model_default(input_fn, hooks, saving_listeners)
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 1170, in _train_model_default
      features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 1133, in _call_model_fn
      model_fn_results = self._model_fn(features=features, **kwargs)
    File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/t2t_model.py", line 1184, in wrapping_model_fn
      decode_hparams=decode_hparams)
    File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/t2t_model.py", line 1236, in estimator_model_fn
      logits, losses_dict = model(features)  # pylint: disable=not-callable
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/base.py", line 362, in __call__
      outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 736, in __call__
      outputs = self.call(inputs, *args, **kwargs)
    File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/t2t_model.py", line 190, in call
      sharded_logits, losses = self.model_fn_sharded(sharded_features)
    File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/t2t_model.py", line 216, in model_fn_sharded
      self._to_single_features_dict(transformed_features))
    File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/models/research/transformer_moe.py", line 103, in body_sharded
      encoder_layers, decoder_layers = self._extract_layer_types()
    File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/models/research/transformer_moe.py", line 222, in _extract_layer_types
      layer_types = hparams.layer_types
  AttributeError: 'HParams' object has no attribute 'layer_types'

Thanks!

twilightdema commented 6 years ago

I think transformer_moe uses quite a different code base from the transformer model. If you use hyper-parameters from the transformer code base, they will be missing some of the mandatory hyper-parameters needed to run transformer_moe.

From reading the source code, you will need to add at least some (unused) hyper-parameters like this:

  from tensor2tensor.models import transformer

  hparams = transformer.transformer_base_single_gpu()

  # The params below are required for transformer_moe to behave the same way as transformer.
  hparams.layer_types = "a/a/a/a/a#a/a/a/a/a"
  hparams.default_att = "a"
  hparams.default_ff = "fc"

  # The params below may not be used, but they need to exist.
  hparams.attention_loc_block_length = 256
  hparams.attention_loc_block_width = 128
  hparams.attention_red_factor = 3
  hparams.attention_red_type = "conv"
  hparams.attention_red_nonlinearity = "none"

Anyway, if you mean to use transformer_moe, you should probably use a hyper-parameter set defined for transformer_moe, such as transformer_moe_2k.
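
If it helps, here is a minimal sketch of how the snippet above could be packaged as a user-registered hparams set so it can be passed to --hparams_set. This is only an illustration; the module name my_usr_dir and the set name transformer_moe_base_single_gpu are placeholders, not anything that ships with tensor2tensor:

  # my_usr_dir/__init__.py -- hypothetical user module, loaded with --t2t_usr_dir
  from tensor2tensor.models import transformer
  from tensor2tensor.utils import registry


  @registry.register_hparams
  def transformer_moe_base_single_gpu():
    """transformer_base_single_gpu plus the extra hparams transformer_moe reads."""
    hparams = transformer.transformer_base_single_gpu()
    # Layer layout read by transformer_moe: encoder and decoder halves are
    # separated by "#", individual layers by "/".
    hparams.add_hparam("layer_types", "a/a/a/a/a#a/a/a/a/a")
    hparams.add_hparam("default_att", "a")
    hparams.add_hparam("default_ff", "fc")
    # These may never be used with the layout above, but the attributes must exist.
    hparams.add_hparam("attention_loc_block_length", 256)
    hparams.add_hparam("attention_loc_block_width", 128)
    hparams.add_hparam("attention_red_factor", 3)
    hparams.add_hparam("attention_red_type", "conv")
    hparams.add_hparam("attention_red_nonlinearity", "none")
    return hparams

Training could then pick it up with something like t2t-trainer --t2t_usr_dir=my_usr_dir --model=transformer_moe --hparams_set=transformer_moe_base_single_gpu (the --t2t_usr_dir flag is how t2t-trainer loads user-registered models and hparams sets).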

Roshanson commented 3 years ago

It seems that the moe feed-forward layer type has not been implemented. I get the error below when I use hyper-parameters from transformer_moe, such as transformer_moe_2k, with the following architecture:

  * No encoder.
    * Layer 0: a - sep  (self-attention - unmasked separable convolutions)
    * Layer 1: a - sep
    * Layer 2: a - sep
    * Layer 3: a - sep
    * Layer 4: a - sep
  * Decoder architecture:
    * Layer 0: a - a - sepm  (self-attention - enco/deco-attention - masked sep)
    * Layer 1: a - a - sepm
    * Layer 2: a - a - moe  (mixture of expert layers in the middle)
    * Layer 3: a - a - sepm
    * Layer 4: a - a - sepm

I get:

  KeyError: in converted code:
    relative to E:\workspace\nmt-train\tensor2tensor:

    utils\t2t_model.py:326 call
      sharded_logits, losses = self.model_fn_sharded(sharded_features)
    utils\t2t_model.py:374 model_fn_sharded
      self._to_single_features_dict(transformed_features))
    models\research\transformer_moe.py:172 body_sharded
      x = prepostprocess(layers[ff_type])(

    KeyError: 'moe'
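
In case it is useful until that layer type is implemented, here is a rough sketch (not a fix) of a workaround: register a variant of transformer_moe_2k whose layer_types swaps the unimplemented moe feed-forward token for one that does resolve, e.g. sepm. The set name transformer_moe_2k_no_moe is made up, and this of course gives up the mixture-of-experts layer the hparams set is named after:

  # Hypothetical user hparams set, loaded via --t2t_usr_dir like any other.
  from tensor2tensor.models.research import transformer_moe
  from tensor2tensor.utils import registry


  @registry.register_hparams
  def transformer_moe_2k_no_moe():
    """transformer_moe_2k with the 'moe' ff type replaced by 'sepm'."""
    hparams = transformer_moe.transformer_moe_2k()
    # layer_types lists the per-layer block types ("/" between layers, "#" between
    # encoder and decoder); replace the "moe" token with a type that does resolve.
    hparams.layer_types = hparams.layer_types.replace("moe", "sepm")
    return hparams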