llSourcell / tensorflow_chatbot

Tensorflow chatbot demo by @Sirajology on Youtube

Bucket perplexity increasing: why? #18

Closed jprissi closed 7 years ago

jprissi commented 7 years ago

Hi, I finally managed to get a workable chatbot in Python 3. It isn't trained enough yet, but it gives quite interesting answers. What bothers me is that while the overall perplexity decreases as it should (from roughly 300 to 8.90), the bucket perplexity increases at each step, reaching approximately 2000 when it was below 100 at the start. I just wanted to know how a seq2seq model can see its bucket perplexity increase. What causes it, and why does the chatbot seem to improve despite this issue?
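For context, the perplexity values printed at each checkpoint are (as far as I understand) just the exponential of the average per-token cross-entropy loss, for example:

import math

def perplexity(avg_cross_entropy):
    # Perplexity is the exponential of the mean per-token cross-entropy (in nats).
    # Lower is better; a value near the vocabulary size means the model is basically guessing.
    return math.exp(avg_cross_entropy)

print(perplexity(2.19))  # ~8.9, roughly the overall value I'm seeing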

vgoklani commented 7 years ago

I have the same issue too. Would be cool if we could compare performance stats, as well as model architecture choices.

jprissi commented 7 years ago

What do you mean by performance stats and architecture choices ?

vgoklani commented 7 years ago

How many layers are you running in the LSTM, and how many nodes? And more importantly, how do those choices affect your performance in terms of perplexity?

jprissi commented 7 years ago

I used the settings from Seiwert's answer in #9. In terms of perplexity, this is what I have right now:

global step 99300 learning rate 0.3149 step-time 0.13 perplexity 6.28
  eval: bucket 0 perplexity 11643.76
  eval: bucket 1 perplexity 18866.27
  eval: bucket 2 perplexity 7587.15
  eval: bucket 3 perplexity 5708.92

I'd like to try increasing the vocab size and retrying with the default LSTM settings along with another dataset. I'll do that once I've solved this issue. What about your attempts?

vgoklani commented 7 years ago

I'm running a larger network now (a 4-layer LSTM with 1024 nodes per layer) and will post results soon.

I need to add beam-search...

https://github.com/Marsan-Ma/tf_chatbot_seq2seq_antilm
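For anyone curious, beam search just replaces the greedy argmax at decode time by keeping the top-k partial sequences at every step. A rough, framework-agnostic sketch (step_fn is a made-up name for whatever returns next-token log-probabilities given the tokens so far):

import numpy as np

def beam_search(step_fn, start_token, end_token, beam_width=5, max_len=20):
    # Each hypothesis is (cumulative log-probability, token sequence).
    beams = [(0.0, [start_token])]
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq[-1] == end_token:
                candidates.append((score, seq))  # finished hypotheses carry over unchanged
                continue
            log_probs = step_fn(seq)             # log-probabilities over the vocabulary
            for tok in np.argsort(log_probs)[-beam_width:]:
                candidates.append((score + log_probs[tok], seq + [int(tok)]))
        # Keep only the best beam_width hypotheses for the next step.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return beams[0][1]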

I think you need to add early stopping; the bucket perplexities are increasing.
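Concretely, early stopping here would just mean watching the eval bucket perplexities at each checkpoint and stopping once they no longer improve. A rough sketch of such a loop (not the repo's actual code; the two callables are placeholders for the existing training and eval steps):

def train_with_early_stopping(run_checkpoint, eval_perplexity, patience=3):
    # run_checkpoint: trains for steps_per_checkpoint steps and saves the model
    # eval_perplexity: returns the average eval bucket perplexity after that checkpoint
    best, bad = float("inf"), 0
    while bad < patience:
        run_checkpoint()
        ppx = eval_perplexity()
        if ppx < best:
            best, bad = ppx, 0   # improvement: reset the patience counter
        else:
            bad += 1             # no improvement at this checkpoint
    return best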

dotjrt commented 7 years ago

Your test set perplexity is increasing because of a typo in seq2seq.ini on line 7: test_dec = data/test.enc should be test_dec = data/test.dec

The bot is confused because the encoder input and decoder output for testing are the same thing.
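With the stock data layout, the four data paths in seq2seq.ini should presumably end up as:

train_enc = data/train.enc
train_dec = data/train.dec
test_enc = data/test.enc
test_dec = data/test.dec
# the last line is the one that previously pointed at data/test.enc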

vgoklani commented 7 years ago

@jrthom18

nope;

train_enc = data/cornell_movie_dialogs_corpus/processed/train_encoder.txt
train_dec = data/cornell_movie_dialogs_corpus/processed/train_decoder.txt
test_enc = data/cornell_movie_dialogs_corpus/processed/test_encoder.txt
test_dec = data/cornell_movie_dialogs_corpus/processed/test_decoder.txt

dotjrt commented 7 years ago

@vgoklani Where is that bit of code you just gave?

vgoklani commented 7 years ago

I cleaned up the code for creating the files: https://gist.github.com/vgoklani/e33973d3202639e1f021bd745b8971c2

and then updated the corresponding lines in the ini file:

[strings]
mode = train
train_enc = data/cornell_movie_dialogs_corpus/processed/train_encoder.txt
train_dec = data/cornell_movie_dialogs_corpus/processed/train_decoder.txt
test_enc = data/cornell_movie_dialogs_corpus/processed/test_encoder.txt
test_dec = data/cornell_movie_dialogs_corpus/processed/test_decoder.txt
working_directory = data/cornell_movie_dialogs_corpus/processed
[ints]
enc_vocab_size = 25000
dec_vocab_size = 25000
num_layers = 4
# typical options : 128, 256, 512, 1024
layer_size = 1024
# dataset size limit; 0 means no limit
max_train_data_size = 0
batch_size = 64
# steps per checkpoint
#   Note : At a checkpoint, the model's parameters are saved, the model is evaluated,
#           and results are printed
steps_per_checkpoint = 500
[floats]
learning_rate = 0.5
learning_rate_decay_factor = 0.99
max_gradient_norm = 5.0
##############################################################################
# Note : Edit the bucket sizes at line 47 of execute.py (_buckets)
#
#   Learn more about the configurations from this link
#       https://www.tensorflow.org/versions/r0.9/tutorials/seq2seq/index.html
##############################################################################
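If it helps anyone debugging path mix-ups like the one above, the values actually picked up from seq2seq.ini can be checked in a couple of lines with Python's standard configparser (assuming the file sits in the current directory):

from configparser import ConfigParser

parser = ConfigParser()
parser.read("seq2seq.ini")

# Print the data paths so a train/test or enc/dec mix-up is easy to spot.
for key in ("train_enc", "train_dec", "test_enc", "test_dec"):
    print(key, "=", parser.get("strings", key))
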
dotjrt commented 7 years ago

@vgoklani OK, well, assuming @HazeHub is using the code out of the box, I would recommend first checking the typo I mentioned. I saw the same behavior described above, and that fixed my issue: each bucket perplexity then decreased over time, as one would expect.

vgoklani commented 7 years ago

Mine decreases to a point, but then increases if you leave it running.

jprissi commented 7 years ago

Thank you @jrthom18, you made my day. I had actually seen this issue on suriyadeepan's original repo and forgot about it until your message. That was indeed the problem, and it is probably the solution to my issue. I will test it and let you know. Thank you for the link @vgoklani, I will try adding early stopping if my issue still isn't solved ;-)

jprissi commented 7 years ago

Okay, that did it:

global step 300 learning rate 0.5000 step-time 0.16 perplexity 324.42
  eval: bucket 0 perplexity 54.46
  eval: bucket 1 perplexity 96.12
  eval: bucket 2 perplexity 114.59
  eval: bucket 3 perplexity 175.86
global step 600 learning rate 0.5000 step-time 0.13 perplexity 88.98
  eval: bucket 0 perplexity 32.07
  eval: bucket 1 perplexity 46.06
  eval: bucket 2 perplexity 88.02
  eval: bucket 3 perplexity 112.59
global step 900 learning rate 0.5000 step-time 0.13 perplexity 54.41
  eval: bucket 0 perplexity 21.67
  eval: bucket 1 perplexity 36.44
  eval: bucket 2 perplexity 44.71
  eval: bucket 3 perplexity 62.42

As you can see, every bucket perplexity is now decreasing. I'll see where it goes :-)