Open pltrdy opened 6 years ago
You're 100% right! Do you want to create a small PR with these fixes?
I'm currently experimenting with those HParams. I'll definitely open a PR once I get evidence that it's working as expected. Thanks for your feedback.
@lukaszkaiser after experimenting a bit with it, I have some remarks:

- I had to set `hparams.num_encoder_layers = 0`; how 'normal' is that?
- `alpha` (values from 0.6 to 5) seems really sensitive (see the sketch below).

Also, I'm facing a strange over-fitting case. On a custom dataset the model does converge: the evaluation loss is decaying and the other metrics look fine (i.e. ROUGE-2 and ROUGE-L are increasing). Still, at test time my output is totally irrelevant; it actually prints a target from my dataset (the whole content), which I find really weird. I get how overfitting could happen; what I'm wondering is how evaluation can look OK when the test outputs are this bad. Any clue?

Note: I also get an `Input graph does not use tf.data.Dataset or contain a QueueRunner` warning that I don't really understand.
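For context on why `alpha` feels so sensitive: as far as I understand, the beam search divides each hypothesis's log-probability by a GNMT-style length penalty, so large `alpha` values strongly favour longer outputs. A minimal sketch (the exact formula is my assumption about the beam-search code, not something confirmed in this thread):

```python
# Sketch of a GNMT-style length penalty as (I believe) used by t2t beam search:
#   score = log_prob / length_penalty(length, alpha)
def length_penalty(length, alpha):
    return ((5.0 + length) / 6.0) ** alpha

# The penalty grows very fast with alpha, which would explain the sensitivity.
for alpha in (0.6, 1.0, 2.0, 5.0):
    print(alpha, length_penalty(100, alpha))  # e.g. a 100-token summary
```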
@pltrdy I have the same problem as you. Did you get any improvement since?
@lzzk nope.
Did you concatenate the inputs and targets yourself, or do you expect the model to do that?
@senarvi nope, but note that the model isn't really working; that could even be the solution.
I'm interested if you investigate this.
Sorry for the late reply tho, I missed the notification.
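For reference, if the inputs and targets do need to be concatenated manually for a decoder-only model, a minimal sketch could look like the one below. The separator id and the loss mask are assumptions for illustration, not t2t's actual pipeline (t2t may handle this through its `prepend_mode` hparam, but I haven't verified that).

```python
def concat_inputs_and_targets(input_ids, target_ids, sep_id=1):
    """Builds one decoder-only training sequence: inputs, separator, targets.

    The loss mask marks the positions that should contribute to the loss
    (here, only the target tokens).
    """
    tokens = input_ids + [sep_id] + target_ids
    loss_mask = [0] * (len(input_ids) + 1) + [1] * len(target_ids)
    return tokens, loss_mask

# Example: a 3-token article with a 2-token summary.
print(concat_inputs_and_targets([11, 12, 13], [21, 22]))
# -> ([11, 12, 13, 1, 21, 22], [0, 0, 0, 0, 1, 1])
```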
@lukaszkaiser is there something new about T-DMCA?
Hey,
I'm trying to experiment with the Transformer-Decoder with Memory-Compressed Attention (T-DMCA) model from https://arxiv.org/pdf/1801.10198.pdf.
I'm wondering if the following HParams are relevant/sufficient to reproduce it:
Thanks.
Error 1: about `use_pad_remover`

I had to set `hparams.use_pad_remover`, otherwise we face an error.
Quickfix: I just set it to `True` or `False` to avoid it (I'm not sure what it's meant for, tho).

Error 2: about `num_encoder_layers`

I'm getting an exception.
Quickfix: set `hparams.num_encoder_layers = 0`.
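For reference, the two quickfixes combined into a registered HParams set would look roughly like the sketch below. This is only a sketch: `transformer_tdmca_experiment` is a made-up name, and whether `False` is the right value for `use_pad_remover` is exactly the open question above.

```python
from tensor2tensor.models import transformer
from tensor2tensor.utils import registry


@registry.register_hparams
def transformer_tdmca_experiment():
  """Hypothetical hparams set combining the two quickfixes above."""
  hparams = transformer.transformer_base()
  # Error 2 quickfix: decoder-only, no encoder layers.
  hparams.num_encoder_layers = 0
  # Error 1 quickfix: set use_pad_remover explicitly to avoid the error.
  hparams.use_pad_remover = False
  return hparams
```

It could then be selected with `--hparams_set=transformer_tdmca_experiment` when training.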