llSourcell / tensorflow_chatbot

Tensorflow chatbot demo by @Sirajology on Youtube

Can't get a good enough perplexity score #8

Closed jprissi closed 7 years ago

jprissi commented 7 years ago

Hi, I have just run the script for 6 hours and noticed it didn't make any progress during the last 5 hours. The perplexity seems to be stuck at 1.35. The part of the code that reduces the learning rate didn't reduce it that much... What should I do from here? Did you train a good enough chatbot? What were your settings? Is perplexity an indicator of how good the chatbot is? Thanks.
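
For reference, the perplexity reported by the training loop is just the exponential of the average per-token cross-entropy loss, so 1.35 means the model already assigns very high probability to its training tokens; a low training perplexity does not by itself mean the bot produces coherent replies. A minimal sketch of the relationship, in the style of the seq2seq tutorial code this repo builds on (the function name and the overflow guard are illustrative, not the exact code):

```python
import math

def perplexity_from_loss(loss):
    # Perplexity = exp(average per-token cross-entropy), capped to avoid overflow.
    return math.exp(loss) if loss < 300 else float("inf")

print(perplexity_from_loss(0.3))   # ~1.35, the value this issue is stuck at
print(perplexity_from_loss(3.6))   # ~36.6, roughly the value in the log further down
```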

eldarsilver commented 7 years ago

Hi, what versions of Python and Tensorflow do you use to train the script?

jprissi commented 7 years ago

@eldarsilver I am using Python 3.5.2 (64-bit) and print(tensorflow.__version__) returns 0.12.0-rc0.

Now that you ask, this issue could be related to my Python version. Before reinstalling everything (the Python 2 I currently have installed is a 32-bit version), and to avoid a few more hours of training for nothing, I'd like to know if you had any success with another version of Python/TensorFlow. Thanks

eldarsilver commented 7 years ago

I have switched to Python 2.7 with TensorFlow v0.12.0-rc0 and the script execute.py finally started the training. But the program gets killed with this message:

    Creating 3 layers of 256 units.
    Created model with fresh parameters.
    Reading development and training data (limit: 0).
    global step 300 learning rate 0.5000 step-time 32.89 perplexity 36.44
    Terminado (killed)

I have only used CPU training mode, without a GPU, and 4 GB of RAM. Could the lack of resources be the reason for the failure?

jprissi commented 7 years ago

I don't think lack of resources would kill it. It stops just before printing the bucket results. Is there anything else printed?
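
A rough back-of-the-envelope check supports that: with 3 layers of 256 units and a vocabulary of about 20000 (the sizes suggested by the log above and by the vocab20000.enc file mentioned later in this thread), the model parameters themselves are only tens of megabytes. Everything below beyond those three numbers is an assumption for illustration:

```python
# Rough parameter-count estimate for a 3x256 seq2seq model with a ~20k vocabulary.
layer_size = 256
num_layers = 3
vocab_size = 20000

# Per layer: 4 LSTM gates, each roughly (input + hidden + bias) x hidden weights.
lstm_params = num_layers * 4 * ((layer_size + layer_size + 1) * layer_size)
# Input embeddings for encoder and decoder, plus an output projection.
embed_params = 2 * vocab_size * layer_size
proj_params = vocab_size * layer_size + vocab_size

total = lstm_params + embed_params + proj_params
print("~%.1fM parameters, ~%.0f MB as float32" % (total / 1e6, total * 4 / 1e6))
# ~17M parameters, ~68 MB -- far below 4 GB. If the process is still killed,
# check the kernel log (e.g. dmesg) for the OOM killer: reading and bucketing
# the training data can use much more memory than the model itself.
```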

eldarsilver commented 7 years ago

No, that's all. I have tried several times and always get the same results.

Niko2756 commented 7 years ago

So I have been training for the last 48 hours and my perplexity got down to 2.78 with a learning rate of 0.30 after 110000 global steps, and when I test, I still just get babble, nothing coherent. I was wondering what you guys have experienced with the bot.

jprissi commented 7 years ago

Nothing interesting for the moment; I'm trying the solution in #9.

MarkLyck commented 7 years ago

Yeah, I trained mine for about 12 hours and am still just getting gibberish and repeated words.

jekriske commented 7 years ago

I had this issue with a proper data set, and the only solution for me was to downgrade to TensorFlow 0.10. With TensorFlow 0.10, when you run a test it actually says "Reading model parameters from working_dir/seq2seq.ckpt-####" instead of "Created model with fresh parameters". You get much better results with as little as a few thousand steps.

willgroves commented 7 years ago

Thanks to @jekriske, it's clear that testing mode in 0.12 does not use the trained model at all! PR #13 fixes the code to correctly find the model file.
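
For context, the restore logic in this code base follows the usual TF translate-tutorial pattern, roughly like the sketch below (names are illustrative, and the exact change in PR #13 may differ): when the existence check on the recorded checkpoint path fails, the code silently falls through to "Created model with fresh parameters", so decoding runs with an untrained model.

```python
import tensorflow as tf

def maybe_restore(session, saver, working_dir):
    ckpt = tf.train.get_checkpoint_state(working_dir)
    # Newer TF versions split a checkpoint across several files, so the recorded
    # model_checkpoint_path may no longer name a single file on disk and a naive
    # existence check can fail even though the checkpoint is really there.
    if ckpt and ckpt.model_checkpoint_path and tf.gfile.Exists(ckpt.model_checkpoint_path):
        print("Reading model parameters from %s" % ckpt.model_checkpoint_path)
        saver.restore(session, ckpt.model_checkpoint_path)
    else:
        print("Created model with fresh parameters.")
        session.run(tf.global_variables_initializer())
```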

jprissi commented 7 years ago

I was so disappointed I just left this project aside, but after a quick check following an intuition, I saw that a train.enc.ids2000 and a train.dec.ids2000 were created while running the bot. Mine were filled with '3's. Another intuition, a quick check of the vocab2000.enc file in the working_dir folder, annnnd the fourth line corresponds to '_UNK', hence all the '_UNK' the chatbot writes. I deleted the train.dec.ids2000 (+ the other .ids2000) files and am training a new chatbot. Let's see where it goes.

TL;DR: If the #9 solution didn't work for you, it might be because you didn't correctly clear the right files when training a new model. Try removing all .ids2000 files from the data folder along with the checkpoint files.
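
For anyone hitting the same symptom: in the data_utils.py style this project inherits from the TF seq2seq tutorial, the vocabulary file starts with four special tokens, so token id 3 is _UNK, and an .ids file full of 3's means every word failed to match the vocabulary. A quick check along these lines (the file path is only an example):

```python
# The tutorial-style vocabulary begins with these special tokens (ids 0..3).
_START_VOCAB = ["_PAD", "_GO", "_EOS", "_UNK"]

def unk_ratio(ids_path, unk_id=3):
    """Return the fraction of token ids equal to unk_id in a *.ids* file."""
    total = unk = 0
    with open(ids_path) as f:
        for line in f:
            for tok in line.split():
                total += 1
                unk += (tok == str(unk_id))
    return unk / total if total else 0.0

print(unk_ratio("data/train.enc.ids20000"))  # close to 1.0 reproduces the symptom here
```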

jprissi commented 7 years ago

It didn't work. I just reinstalled everything again (no manual cleaning) and will let you know how it goes.

jprissi commented 7 years ago

Yeaaaaaaaaaaaa! Finally got it. I had to change some lines of code, as I was trying to run this on Python 3. My last try involved converting string words to bytes before using re.split, since Python 3 distinguishes between bytes and strings. While this approach wasn't fundamentally wrong, I didn't convert the result back to a string, so it didn't match any word of the vocab20000 file. This is why I got so many _UNK tokens (in fact it was the only result). I trained it again but haven't tested yet since it is too early; however, my train.enc.ids20000 files contain a lot of tokens other than _UNK, which was my main problem. I am closing this issue and will correct my answers to other questions about Python 3, as my fix was the real issue. Thanks for your help! (It might be nice to make a pull request with a Python 3 version?)
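
A minimal illustration of that failure mode (the names and values here are made up for the example): if the tokens coming out of re.split are bytes while the vocabulary dictionary is keyed by str, every lookup misses and falls back to the _UNK id.

```python
import re

UNK_ID = 3
vocab = {"hello": 4, "world": 5}             # str keys, as read from vocab20000.enc

tokens_bytes = re.split(rb"\s+", b"hello world")       # [b'hello', b'world']
print([vocab.get(t, UNK_ID) for t in tokens_bytes])    # [3, 3]  -> everything becomes _UNK

tokens_str = [t.decode("utf-8") for t in tokens_bytes]  # convert back to str
print([vocab.get(t, UNK_ID) for t in tokens_str])       # [4, 5]  -> real ids
```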

hariom-yadaw commented 7 years ago

@HazeHub "While this approach wasn't fundamentally wrong, I didn't converted the result back to a string so it didn't matched any word of the vocab20000 file. This is why I got so much _UNK token"

So where in the code did you change it back from bytes to string, which worked for you?

jprissi commented 7 years ago

@hariom-yadaw There are a few re.split calls in data_utils.py. You can search for them and then add a .encode('base64') to the string being given to re.split. You'll also have to change the part that gets the input, so it is encoded too, when you're in 'test' mode.

I have a working version on my drive and I planned to fork this repo with my version, but with the recent changes to the code of this repo I can't get a working version. If you don't have the time to wait (I'm busy these days), I would recommend adding a few debug prints to the code while trying to get it working.

hariom-yadaw commented 7 years ago

@HazeHub I have tried the above modification as well (and others); I'm still getting every sentence answered with a single sentence full of _UNK. If you get time, please share the working version. Thanks.

jprissi commented 7 years ago

@hariom-yadaw Sure, here you go: https://github.com/HazeHub/tensorflow_chatbot/blob/master/temporary_personal_working_version.zip No warranty of any kind ;-)

hariom-yadaw commented 7 years ago

@HazeHub Thanks a lot for sharing. But there seems to be a minor bug. Which Python and TensorFlow versions are you using? When I start to train it, I get the error below, which again seems to be related to encoding bytes/strings.

    python3.5/re.py", line 182, in sub
        return _compile(pattern, flags).sub(repl, string, count)
    TypeError: cannot use a bytes pattern on a string-like object

jprissi commented 7 years ago

@hariom-yadaw Hi, I'm sorry, but after trying it again I can tell you this does work on my computer. I'm using Python 3.5 and TensorFlow 0.12. Is this error message all you get? Can you give me the full output?

hariom-yadaw commented 7 years ago

@HazeHub Please find below the error I get when I try to train. Project directory: /home/hariom/ai_hariom/tfChatBot/. I'm using a virtual environment with Python 3.5.2 and TensorFlow 0.12.1.

Mode : train

    Preparing data in working_dir/
    Creating vocabulary working_dir/vocab20000.enc from data/train.enc
    Traceback (most recent call last):
      File "execute.py", line 292, in <module>
        train()
      File "execute.py", line 114, in train
        enc_train, dec_train, enc_dev, dec_dev, _, _ = data_utils.prepare_custom_data(gConfig['working_directory'],gConfig['train_enc'],gConfig['train_dec'],gConfig['test_enc'],gConfig['test_dec'],gConfig['enc_vocab_size'],gConfig['dec_vocab_size'])
      File "/home/hariom/ai_hariom/tfChatBot/data_utils.py", line 138, in prepare_custom_data
        create_vocabulary(enc_vocab_path, train_enc, enc_vocabulary_size, tokenizer)
      File "/home/hariom/ai_hariom/tfChatBot/data_utils.py", line 72, in create_vocabulary
        word = re.sub(_DIGIT_RE, b"0", w) if normalize_digits else w
      File "/home/hariom/ai_hariom/venv_chatbot/lib/python3.5/re.py", line 182, in sub
        return _compile(pattern, flags).sub(repl, string, count)
    TypeError: cannot use a bytes pattern on a string-like object
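
The TypeError above means _DIGIT_RE is compiled from a bytes pattern while the word w arrives as a str; on Python 3 the pattern, the replacement, and the subject must all be the same type. A sketch of one consistent all-str version of that line (this is not necessarily the exact change in the shared zip; the helper name is made up):

```python
import re

_DIGIT_RE = re.compile(r"\d")          # str pattern instead of a bytes pattern

def normalize_word(w, normalize_digits=True):
    if isinstance(w, bytes):           # tolerate callers that still pass bytes
        w = w.decode("utf-8")
    return re.sub(_DIGIT_RE, "0", w) if normalize_digits else w

print(normalize_word("call me at 42"))   # -> "call me at 00"
```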