Closed: jprissi closed this issue 7 years ago
Hi, what versions of Python and Tensorflow do you use to train the script?
@eldarsilver I am using Python 3.5.2 64-bit, and print(tensorflow.__version__) returns 0.12.0-rc0.
Now that you ask, this issue could be related to my Python version. My currently installed Python 2 is a 32-bit version, and I'd like to avoid a few more hours of training for nothing, so before reinstalling everything: did you have any success with another version of Python/TensorFlow? Thanks
I have switched to Python 2.7 with TensorFlow v0.12.0-rc0, and the script execute.py finally started training. But the program is killed with this message:
Creating 3 layers of 256 units.
Created model with fresh parameters.
Reading development and training data (limit: 0).
global step 300 learning rate 0.5000 step-time 32.89 perplexity 36.44
Killed
I am only training in CPU mode, without a GPU and with 4 GB of RAM. Could lack of resources be the reason for the failure?
I don't think lack of resources would kill it. It stops just before printing the bucket results. Is there anything else printed?
No, that's all. I have tried several times and always get the same results.
So I have been training for the last 48 hours and my perplexity got to 2.78 with a learning rate of 0.30 and 110,000 global steps, and when I test, I still just get babble, nothing coherent. I was wondering what you guys have experienced with the bot.
Nothing interesting for the moment; I'm trying the solution in #9.
Yeah, I trained mine for about 12 hours and I'm still just getting gibberish and repeated words.
I had this issue with a proper data set, and the only solution for me was to downgrade to TensorFlow v0.10.
With TensorFlow v0.10, when you run a test it actually says:
Reading model parameters from working_dir/seq2seq.ckpt-####
instead of
Created model with fresh parameters.
You get much better results with as little as a few thousand steps.
Thanks to @jekriske, it's clear that testing mode on 0.12 does not use the trained model at all! PR #13 fixes the code to correctly find the model file.
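For context, a minimal sketch of the restore-or-initialize pattern involved (the names restore_or_init and model.saver are my assumptions based on the seq2seq tutorial, not necessarily this repo's exact code). TF 0.12 saves checkpoints in the V2 format, as separate .index/.data files, so an existence check on the bare checkpoint path can fail and silently fall through to fresh parameters:

```python
import tensorflow as tf

def restore_or_init(session, model, working_directory):
    # Ask TF for the latest checkpoint recorded in working_directory.
    ckpt = tf.train.get_checkpoint_state(working_directory)
    # Note: checking the path string, not gfile.Exists(path); with the V2
    # checkpoint format the bare path is no longer a file on disk.
    if ckpt and ckpt.model_checkpoint_path:
        # Trained weights found: restore them so test mode uses the model.
        print("Reading model parameters from %s" % ckpt.model_checkpoint_path)
        model.saver.restore(session, ckpt.model_checkpoint_path)
    else:
        # No usable checkpoint: start from random weights.
        print("Created model with fresh parameters.")
        session.run(tf.global_variables_initializer())
```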
I was so disappointed I just left this project aside, but after a quick check following an intuition, I saw that a train.enc.ids2000 and a train.dec.ids2000 were created while running the bot. Mine were filled with '3's. Another intuition, a quick check of the vocab2000.enc file in the working_dir folder, and the fourth line corresponds to '_UNK', hence all the '_UNK' the chatbot writes. I deleted the train.dec.ids2000 (+ the other .ids2000) files and am training a new chatbot. Let's see where it goes.
TL;DR: If the #9 solution didn't work for you, it might be because you didn't correctly clear the right files when training a new model. Try removing all .ids2000 files from the data folder along with the checkpoint files.
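For context, the tutorial-style data_utils.py reserves the first four vocabulary IDs for special tokens, which is why every '3' in the .ids files decodes to the fourth vocabulary line, _UNK (a sketch of the TensorFlow seq2seq tutorial's constants; this repo's copy may differ slightly):

```python
# Reserved tokens and their fixed IDs in the tutorial-style data_utils.py.
_PAD, _GO, _EOS, _UNK = b"_PAD", b"_GO", b"_EOS", b"_UNK"
_START_VOCAB = [_PAD, _GO, _EOS, _UNK]

PAD_ID = 0  # padding
GO_ID = 1   # decoder start symbol
EOS_ID = 2  # end of sentence
UNK_ID = 3  # unknown word: a file full of 3s means nothing matched the vocab
```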
It didn't work; I just reinstalled everything again (no manual cleaning) and will let you know how it goes.
Yeaaaaaaaaaaaa! Finally got it. I had to change some lines of code as I was trying to run this on Python 3. My last try involved converting string words to bytes before using re.split, since Python 3 distinguishes between bytes and strings. While this approach wasn't fundamentally wrong, I didn't convert the result back to a string, so it didn't match any word of the vocab20000 file. This is why I got so many _UNK tokens (in fact, it was the only output). I trained it again but haven't tested yet since it is too early; however, my train.enc.ids20000 files contain a lot of tokens other than _UNK, which was my main problem. I am closing this issue and will correct my answers to other questions about Python 3, as my fix was the real issue. Thanks for your help! (Could be nice to make a pull request with a Python 3 version?)
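For reference, a minimal sketch of the kind of Python 3 fix involved, based on the tutorial-style basic_tokenizer in data_utils.py (treat the exact names, and the assumption that the vocabulary is read as bytes, as mine rather than the repo's):

```python
import re

# Bytes pattern, as in the tutorial-style data_utils.py.
_WORD_SPLIT = re.compile(b"([.,!?\"':;)(])")

def basic_tokenizer(sentence):
    """Split a sentence into tokens, handling Python 3's bytes/str split.

    Encode str input to bytes so it matches the bytes pattern; the tokens
    stay bytes, which only works if the vocabulary was also read as bytes
    (mode 'rb'). Mixing the two types is exactly what makes every word
    fail the lookup and map to _UNK.
    """
    if isinstance(sentence, str):
        sentence = sentence.encode("utf-8")
    words = []
    for fragment in sentence.strip().split():
        words.extend(re.split(_WORD_SPLIT, fragment))
    return [w for w in words if w]
```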
@HazeHub "While this approach wasn't fundamentally wrong, I didn't converted the result back to a string so it didn't matched any word of the vocab20000 file. This is why I got so much _UNK token"
So where in the code did you change it back from BYTES to STRING to make it work for you?
@hariom-yadaw There are a few re.split calls in data_utils.py. You can search for them and then add a .encode('base64') to the string being given to re.split. You'll also have to change the part that gets the input, to encode it too when you're in 'test' mode (see the sketch after this comment).
I have a working version on my drive and planned to fork this repo with my version, but with the recent changes to this repo's code I can't get a working version. If you don't have time to wait (I'm busy these days), I'd recommend adding a few debug prints to the code while trying to get it working.
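A tiny sketch of the test-mode side of that change (this helper is hypothetical; the idea is just to normalize stdin input to bytes before it reaches the bytes-based tokenizer above):

```python
import sys

def read_test_sentence():
    """Read one line from stdin and normalize it to bytes so that test-mode
    input goes through the same bytes-based pipeline as the training data."""
    sentence = sys.stdin.readline().strip()
    return sentence.encode("utf-8") if isinstance(sentence, str) else sentence
```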
@HazeHub I have tried the above modification (and others); I'm still getting every sentence answered with a single sentence filled with _UNK. If you get time, please share the working version. Thanks.
@hariom-yadaw Sure, here you go: https://github.com/HazeHub/tensorflow_chatbot/blob/master/temporary_personal_working_version.zip No warranty of any kind ;-)
@HazeHub Thanks a lot for sharing. But there seems to be a minor bug. Which Python and TensorFlow versions are you using? When I start to train, I get the error below, which again seems to be related to bytes/strings encoding.
python3.5/re.py", line 182, in sub return _compile(pattern, flags).sub(repl, string, count) TypeError: cannot use a bytes pattern on a string-like object
@hariom-yadaw Hi, I'm sorry, but after trying it again I can tell you this does work on my computer. I'm using Python 3.5 and TensorFlow 0.12. Is this error message all you get? Can you give me the full output?
Mode : train
Preparing data in working_dir/
Creating vocabulary working_dir/vocab20000.enc from data/train.enc
Traceback (most recent call last):
File "execute.py", line 292, in
Hi, I have just run the script for 6 hours and noticed it didn't make any progress during the last 5 hours. The perplexity seems to be stuck at 1.35. The part of the code that reduces the learning rate hasn't reduced it much... What should I do from here? Did you train a good enough chatbot? What were your settings? Is perplexity an indicator of how good the chatbot is? Thanks.
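For what it's worth, the tutorial-style training loop only decays the learning rate when the loss stops improving across recent checkpoints, which would explain why it barely moves while perplexity sits flat. A sketch following the seq2seq tutorial (the names are my assumptions about this repo's execute.py):

```python
def maybe_decay_learning_rate(session, model, loss, previous_losses):
    """Decay the learning rate only when the loss stops improving.

    Mirrors the seq2seq tutorial: if the current loss is worse than the
    max of the last three checkpoint losses, run the decay op. When the
    loss is flat or still creeping down, the condition never fires, so
    the learning rate stays where it is.
    """
    if len(previous_losses) > 2 and loss > max(previous_losses[-3:]):
        session.run(model.learning_rate_decay_op)
    previous_losses.append(loss)
```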