tensorflow / tensor2tensor

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.

It seems that the fc layer of the moe type has not been implemented... #1893

Open Roshanson opened 2 years ago

Roshanson commented 2 years ago

Description

It seems that the fc layer of the moe type has not been implemented...

In tensor2tensor/layers/common_attention.py:289, I can't find a feed-forward layer for the moe type; the layer registry only defines the entries below:

  cur_layers = dict(
      # Attention layers:
      a=multihead_attention_fn,  # Multihead full attention
      loc=local_attention_fn,  # Local attention
      locm=local_attention_masked_fn,  # Local attention (masked)
      red=compressed_attention_fn,  # Memory-compressed attention
      redm=compressed_attention_masked_fn,  # Memory-compressed att (masked)
      mem=memeff_attention_fn,  # Memory efficient
      # Feed-forward layers:
      fc=conv_hidden_relu,  # Fully connected
      sep=sep_conv_relu,  # Separable convolution (unmasked)
      sepm=sep_conv_relu_masked,  # Separable convolution (masked)
  )
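For context, something like the sketch below is what a moe entry would have to provide: a feed-forward layer with roughly the same tensor-in/tensor-out contract as the fc entry. This is only a minimal top-1 mixture-of-experts sketch in plain TF 1.x ops with a hypothetical name (simple_moe_ffn); it is not the tensor2tensor implementation, which would presumably wire in expert_utils instead.

  import tensorflow as tf

  def simple_moe_ffn(x, hidden_size, output_size, num_experts=4, name="moe_ffn"):
    """Illustrative top-1 mixture-of-experts feed-forward (not the t2t API)."""
    with tf.variable_scope(name):
      # Gating network: one score per expert for every position.
      gates = tf.nn.softmax(tf.layers.dense(x, num_experts, name="gate"))  # [B, L, E]
      top_gate = tf.reduce_max(gates, axis=-1, keepdims=True)              # [B, L, 1]
      mask = tf.one_hot(tf.argmax(gates, axis=-1), num_experts)            # [B, L, E]

      # Run every expert densely and keep only the selected one. A real MoE
      # layer dispatches tokens so each expert only processes its own inputs.
      outputs = []
      for e in range(num_experts):
        h = tf.layers.dense(x, hidden_size, activation=tf.nn.relu,
                            name="expert_%d_hidden" % e)
        outputs.append(tf.layers.dense(h, output_size, name="expert_%d_out" % e))
      stacked = tf.stack(outputs, axis=-2)                                 # [B, L, E, D]
      selected = tf.reduce_sum(stacked * tf.expand_dims(mask, -1), axis=-2)
      return selected * top_gate                                           # [B, L, D]

As it stands, only fc, sep, and sepm are registered as feed-forward types, so any hparams set that asks for moe fails while the model graph is being built.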

Environment information

OS: Linux
tensor2tensor: 1.14.1
Python: 3.6.5

Error logs

KeyError in converted code (relative to tensor2tensor):

  utils\t2t_model.py:326 call
      sharded_logits, losses = self.model_fn_sharded(sharded_features)
  utils\t2t_model.py:374 model_fn_sharded
      self._to_single_features_dict(transformed_features))
  models\research\transformer_moe.py:172 body_sharded
      x = prepostprocess(layers[ff_type])(

  KeyError: 'moe'

Steps to reproduce:

Set the following flags:

  FLAGS.model = "transformer_moe"
  FLAGS.hparams_set = "transformer_moe_2k"

then start training (a hypothetical command-line equivalent is sketched below).
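For completeness, a hypothetical end-to-end reproduction through the t2t-trainer entry point; the problem name and directories are placeholders, and the dataset is assumed to have been generated already:

  # Hypothetical reproduction; problem name and paths are placeholders.
  import subprocess

  subprocess.run([
      "t2t-trainer",
      "--model=transformer_moe",
      "--hparams_set=transformer_moe_2k",
      "--problem=translate_ende_wmt32k",
      "--data_dir=/tmp/t2t_data",
      "--output_dir=/tmp/t2t_train",
  ], check=True)

The KeyError above is raised while the model graph is being built, before any training step runs.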
ferdiko commented 2 years ago

Are there any updates on this?

ferdiko commented 2 years ago

@Roshanson what did you end up doing?