tensorflow / tensor2tensor

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.

It seems that the fc layer of the moe type has not been implemented... #1893

Open Roshanson opened 2 years ago

Roshanson commented 2 years ago

Description

It seems that the fc layer of the moe type has not been implemented...

In tensor2tensor/layers/common_attention.py:289, I can't find a feed-forward layer for the moe type; the layer registry only defines the entries below:

  cur_layers = dict(
      # Attention layers:
      a=multihead_attention_fn,  # Multihead full attention
      loc=local_attention_fn,  # Local attention
      locm=local_attention_masked_fn,  # Local attention (masked)
      red=compressed_attention_fn,  # Memory-compressed attention
      redm=compressed_attention_masked_fn,  # Memory-compressed att (masked)
      mem=memeff_attention_fn,  # Memory efficient
      # Feed-forward layers:
      fc=conv_hidden_relu,  # Fully connected
      sep=sep_conv_relu,  # Separable convolution (unmasked)
      sepm=sep_conv_relu_masked,  # Separable convolution (masked)
  )
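For context, something like the sketch below is what a moe entry would have to provide: a feed-forward layer with roughly the same tensor-in/tensor-out contract as the fc entry. This is only a minimal top-1 mixture-of-experts sketch in plain TF 1.x ops with a hypothetical name (simple_moe_ffn); it is not the tensor2tensor implementation, which would presumably wire in expert_utils instead.

  import tensorflow as tf

  def simple_moe_ffn(x, hidden_size, output_size, num_experts=4, name="moe_ffn"):
    """Illustrative top-1 mixture-of-experts feed-forward (not the t2t API)."""
    with tf.variable_scope(name):
      # Gating network: one score per expert for every position.
      gates = tf.nn.softmax(tf.layers.dense(x, num_experts, name="gate"))  # [B, L, E]
      top_gate = tf.reduce_max(gates, axis=-1, keepdims=True)              # [B, L, 1]
      mask = tf.one_hot(tf.argmax(gates, axis=-1), num_experts)            # [B, L, E]

      # Run every expert densely and keep only the selected one. A real MoE
      # layer dispatches tokens so each expert only processes its own inputs.
      outputs = []
      for e in range(num_experts):
        h = tf.layers.dense(x, hidden_size, activation=tf.nn.relu,
                            name="expert_%d_hidden" % e)
        outputs.append(tf.layers.dense(h, output_size, name="expert_%d_out" % e))
      stacked = tf.stack(outputs, axis=-2)                                 # [B, L, E, D]
      selected = tf.reduce_sum(stacked * tf.expand_dims(mask, -1), axis=-2)
      return selected * top_gate                                           # [B, L, D]

As it stands, only fc, sep, and sepm are registered as feed-forward types, so any hparams set that asks for moe fails while the model graph is being built.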

Environment information

OS: Linux
tensor2tensor: 1.14.1
Python: 3.6.5

Error logs

KeyError in converted code (relative to tensor2tensor):

  utils\t2t_model.py:326 call
      sharded_logits, losses = self.model_fn_sharded(sharded_features)
  utils\t2t_model.py:374 model_fn_sharded
      self._to_single_features_dict(transformed_features))
  models\research\transformer_moe.py:172 body_sharded
      x = prepostprocess(layers[ff_type])(

  KeyError: 'moe'

Steps to reproduce:

Set the following flags:

  FLAGS.model = "transformer_moe"
  FLAGS.hparams_set = "transformer_moe_2k"

then start training (a hypothetical command-line equivalent is sketched below).
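For completeness, a hypothetical end-to-end reproduction through the t2t-trainer entry point; the problem name and directories are placeholders, and the dataset is assumed to have been generated already:

  # Hypothetical reproduction; problem name and paths are placeholders.
  import subprocess

  subprocess.run([
      "t2t-trainer",
      "--model=transformer_moe",
      "--hparams_set=transformer_moe_2k",
      "--problem=translate_ende_wmt32k",
      "--data_dir=/tmp/t2t_data",
      "--output_dir=/tmp/t2t_train",
  ], check=True)

The KeyError above is raised while the model graph is being built, before any training step runs.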
ferdiko commented 2 years ago

Are there any updates on this?

ferdiko commented 2 years ago

@Roshanson what did you end up doing?