tensorflow / tensor2tensor

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Apache License 2.0

HParams for "T-DMCA" from Liu 2018 "Generating Wikipedia by Summarizing Long Sequences" #886

Open pltrdy opened 6 years ago

pltrdy commented 6 years ago

Hey,

I'm trying to experiment with the Transformer-Decoder with Memory-Compressed Attention (T-DMCA) model from https://arxiv.org/pdf/1801.10198.pdf.

I'm wondering if the following HParams are relevant/sufficient to reproduce it:

# Imports assumed; adjust the transformer_moe path to wherever your t2t
# version keeps it (models/ or models/research/):
from tensor2tensor.models.research.transformer_moe import transformer_moe_base
from tensor2tensor.utils import registry


@registry.register_hparams
def transformer_dmca():
    """Hyperparameters for T-DMCA of Liu et al. (2018)."""
    hparams = transformer_moe_base()

    # "#" separates encoder from decoder layer types, so a leading "#" should
    # give an empty encoder; locm/redm are (as far as I can tell) masked local
    # attention and masked memory-compressed attention.
    hparams.layer_types = "#locm/redm/locm/redm/locm"
    hparams.max_input_seq_length = 5000
    hparams.max_target_seq_length = 5000

    return hparams

Thanks.


Error 1: about use_pad_remover

I had to set hparams.use_pad_remover, otherwise I hit:

  File "/[...]/tensor2tensor/tensor2tensor/models/transformer.py", line 86, in encode
    losses=losses)
  File "/[...]/tensor2tensor/tensor2tensor/models/transformer.py", line 1200, in transformer_encoder
    if hparams.use_pad_remover and not common_layers.is_on_tpu():
AttributeError: 'HParams' object has no attribute 'use_pad_remover'

Quick fix: I just set it (to True or False) to get past the error; I'm not sure what it's meant for, though.
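
In code, the workaround is just this (the value True is picked more or less arbitrarily; the hparam itself is what transformer_encoder() reads but transformer_moe_base() doesn't define):

# Register the missing hparam explicitly so transformer_encoder() can read it.
hparams.add_hparam("use_pad_remover", True)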

Error 2: about num_encoder_layers

I'm getting an exception:

  File "/[...]/tensor2tensor/tensor2tensor/models/transformer.py", line 1202, in transformer_encoder
    for layer in range(hparams.num_encoder_layers or hparams.num_hidden_layers):
AttributeError: 'HParams' object has no attribute 'num_encoder_layers'

Quick fix: set hparams.num_encoder_layers = 0.
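
Same pattern as above; both add_hparam calls go inside the transformer_dmca() definition:

# Register the missing hparam; 0 encoder layers is what I want for a
# decoder-only setup anyway.
hparams.add_hparam("num_encoder_layers", 0)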

lukaszkaiser commented 6 years ago

You're 100% right! Do you want to create a small PR with these fixes?

pltrdy commented 6 years ago

I'm currently experimenting with those HParams. I'll definitely open a PR once I have evidence that it's working as expected. Thanks for your feedback.

pltrdy commented 6 years ago

@lukaszkaiser after experimenting a bit with it, I have some remarks:

  1. Thanks to TensorBoard I've seen that the model does contain an encoder, which seems to have 5 layers, even though hparams.num_encoder_layers = 0. How 'normal' is that?
  2. I'm having trouble with output length. At prediction time I may get empty or highly repetitive output depending on alpha (values from 0.6 to 5); it seems really sensitive (see the sketch right after this list).
  3. Have you got any interesting results on CNN/DM? I'm currently training a model.
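
For reference, the sketch mentioned in point 2: as far as I can tell, alpha is just the length penalty passed through the decode hparams, so what I vary between runs is roughly this (not my exact command, and beam_size=4 is an assumption):

from tensor2tensor.utils import decoding

# alpha is the length penalty used by beam search; I tried values from 0.6 to 5.
decode_hp = decoding.decode_hparams("beam_size=4,alpha=0.6")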

Also, I'm facing a strange over-fitting case. On a custom dataset the model does converge: the evaluation loss is decreasing and the other metrics look good (i.e. ROUGE-2 and ROUGE-L are increasing). Still, at test time my output is totally irrelevant; it actually prints a target from my dataset (the whole content), which I find really weird. I get how overfitting could happen; what I don't get is how evaluation can look fine when the test outputs are this bad. Any clue?

Note: I also get an Input graph does not use tf.data.Dataset or contain a QueueRunner warning that I don't really understand.

lzzk commented 6 years ago

@pltrdy I have the same problem. Did you get any improvement?

pltrdy commented 6 years ago

@lzzk nope.

senarvi commented 6 years ago

Did you concatenate the inputs and targets yourself, or do you expect the model to do that?
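
To be clear about what I mean: since T-DMCA is decoder-only, the paper feeds the input and the target as one concatenated sequence. A minimal sketch (token IDs and the separator are made up for illustration):

# Hypothetical preprocessing: build one flat sequence from source and target.
SEP_ID = 2  # made-up separator token id

def concat_example(input_ids, target_ids):
    return input_ids + [SEP_ID] + target_ids

combined = concat_example([11, 12, 13], [21, 22])  # -> [11, 12, 13, 2, 21, 22]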

pltrdy commented 5 years ago

@senarvi Nope, but note that the model isn't really working, so that could even be the fix.

I'd be interested to hear if you investigate this.

Sorry for the late reply, I missed the notification.

pltrdy commented 5 years ago

@lukaszkaiser is there anything new on T-DMCA?