SinclairHudson opened this issue 6 years ago
I am also facing the same issue. It's searching for train.dec, train.enc, test.enc etc.
@SinclairHudson @vivek9237 The .gitignore left some things out. Step by step:
1. In the C:\Users\Sinclair\Desktop\tensorflow_chatbot-master\ folder (or wherever your tensorflow_chatbot folder is), make a new folder named "data".
2. Take all the *.txt files from the Cornell training material and dump them into the new "data" folder.
3. Grab the prepare_data.py file from https://github.com/suriyadeepan/datasets.git (it lives in the datasets/seq2seq/cornell_movie_corpus/scripts directory), add it to the "data" folder as well, and run it. That should automagically create the files you need, in the folder where they're expected.
If you leave the files in separate folders, you'll keep getting errors, because prepare_data.py won't be able to locate the *.txt files (I experienced this myself...at great length...until I wanted to strangle a defenseless puppy).
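As a quick sanity check before running it (a minimal sketch of my own, not part of the repo; the filenames are the standard Cornell corpus ones), you can confirm the corpus files really are sitting next to prepare_data.py:

```python
import os

# Run this from inside the "data" folder. prepare_data.py expects the
# Cornell *.txt files to be right next to it.
for name in ['movie_lines.txt', 'movie_conversations.txt']:
    print(name, 'found' if os.path.exists(name) else 'MISSING')
```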
I'm also afraid to say this is as far as I have gotten. As soon as I corrected this issue, a new one arrived:

```
>> Mode : train
Preparing data in working_dir/
Tokenizing data in data/train.enc
Tokenizing data in data/train.dec
Tokenizing data in data/test.enc
Tokenizing data in data/test.dec
2017-08-07 02:30:44.988705: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.
2017-08-07 02:30:44.989205: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-07 02:30:44.989705: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-07 02:30:44.990705: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-07 02:30:44.992706: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-07 02:30:44.994206: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
Creating 3 layers of 256 units.
Traceback (most recent call last):
  File "execute.py", line 319, in
```
Replace the following line:

```python
self.outputs, self.losses = tf.nn.seq2seq.model_with_buckets(
```

with

```python
self.outputs, self.losses = tf.contrib.legacy_seq2seq.model_with_buckets(
```
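If you want the same file to keep working across TensorFlow releases, a small compatibility shim does the job too (a sketch of my own, not from the repo; the functions kept their signatures when they moved into tf.contrib.legacy_seq2seq):

```python
import tensorflow as tf

# Resolve the seq2seq module once, then call model_with_buckets through it.
try:
    seq2seq_lib = tf.contrib.legacy_seq2seq  # TF >= 1.0
except AttributeError:
    seq2seq_lib = tf.nn.seq2seq              # TF <= 0.12

# In seq2seq_model.py this becomes:
# self.outputs, self.losses = seq2seq_lib.model_with_buckets(...)
```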
Did that. It still leads directly into another brick wall:

```
Traceback (most recent call last):
  File "execute.py", line 319, in
```
Given the number of issues this project hits when you try to run it with an up-to-date version of TensorFlow, I feel it would be better for an amateur (like myself) to treat it as a broken example and rebuild it, using it as a map of sorts: see what needs to be called, then look up the TensorFlow documentation and write the correct code. I don't foresee myself continuing to slam my head against a wall trying to figure out how to proverbially ride a bicycle backwards. The TensorFlow library is going to keep progressing, so rather than regressing, we might as well use this repo as a template for an updated model.
@Crakkerjakked I feel the same way. Have you made much progress on updating this repo's dependencies?
Has anyone gotten this program to run without errors?
I'm trying to get this working as well, and in addition trying to understand what is actually going on here...
In my attempt to understand what that prepare_data.py file is supposed to be doing, I've tried to clean it up and add more comments so that the steps are clearer. (Still not sure exactly what's going on...)
https://github.com/monkut/cornell-movie-corpus-processor/blob/master/process.py
Any comments are welcome!
Finally got everything to seemingly run, but the resulting test output was crap...

```
python execute.py
>> Mode : test
WARNING:tensorflow:From tensorflow_chatbot/seq2seq_model.py:174 in __init__.: all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Please use tf.global_variables instead.
Reading model parameters from working_dir/seq2seq.ckpt-10200
> Where are you from?
_UNK _UNK _UNK _UNK _UNK _UNK _UNK _UNK _UNK _UNK _UNK _UNK _UNK _UNK _UNK
> Are you trained?
_UNK _UNK _UNK !
> I like pizza, how about you?
_UNK _UNK _UNK _UNK _UNK _UNK _UNK _UNK _UNK _UNK _UNK _UNK _UNK _UNK _UNK
```
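Incidentally, the deprecation warning at the top of that run names its own fix. Assuming seq2seq_model.py:174 is the usual Saver line from the TensorFlow seq2seq tutorial this repo was built on (an assumption; check your copy), it's a one-name change:

```python
# Before (deprecated, per the warning):
# self.saver = tf.train.Saver(tf.all_variables())
# After, following the warning's instructions:
self.saver = tf.train.Saver(tf.global_variables())
```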
Gotchas:
I'm running this with Python 3.6, so I needed to update some of the text-handling imports.
If I can get this working I'd issue a PR... but I'm not quite there yet.
Anyway, my cleanup is the process.py linked above.
I am not able to run your code, @monkut.
My output is:

```
>> Mode : train
Preparing data in working_dir/
Creating vocabulary working_dir/vocab20000.enc from data/train.enc
  processing line 100000
Full Vocabulary Size : 45604
Vocab Truncated to: 20000
Creating vocabulary working_dir/vocab20000.dec from data/train.dec
  processing line 100000
Full Vocabulary Size : 44343
Vocab Truncated to: 20000
Tokenizing data in data/train.enc
Traceback (most recent call last):
  File "execute.py", line 352, in
    train()
  File "execute.py", line 138, in train
    gConfig['dec_vocab_size'])
  File "C:\Users\aamis\Desktop\tensorflow_chatbot\data_utils.py", line 141, in prepare_custom_data
    data_to_token_ids(train_enc, enc_train_ids_path, enc_vocab_path, tokenizer)
  File "C:\Users\aamis\Desktop\tensorflow_chatbot\data_utils.py", line 126, in data_to_token_ids
    normalize_digits)
  File "C:\Users\aamis\Desktop\tensorflow_chatbot\data_utils.py", line 108, in sentence_to_token_ids
    words = tokenizer(sentence)
  File "C:\Users\aamis\Desktop\tensorflow_chatbot\data_utils.py", line 51, in basic_tokenizer
    words = re.split(_WORD_SPLIT, space_separated_fragment)
  File "C:\Users\aamis\AppData\Local\Programs\Python\Python35\lib\re.py", line 203, in split
    return _compile(pattern, flags).split(string, maxsplit)
TypeError: cannot use a string pattern on a bytes-like object
```
Simple. Try this first instead of the above. If it doesn't work, run prepare_data.py with the Python 2 interpreter instead of Python 3. That fixed the problem I was facing.
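If you'd rather stay on Python 3, the TypeError above usually means data_utils.py reads the files in binary mode while its regexes are text patterns. A possible fix (my own sketch based on the traceback, not a tested patch for this exact repo) is to make the patterns byte patterns:

```python
import re

# Byte patterns, because each `sentence` read in binary mode is bytes in Python 3.
_WORD_SPLIT = re.compile(rb"([.,!?\"':;)(])")
_DIGIT_RE = re.compile(rb"\d")

def basic_tokenizer(sentence):
    """Split a bytes sentence into a list of byte tokens."""
    words = []
    for space_separated_fragment in sentence.strip().split():
        words.extend(re.split(_WORD_SPLIT, space_separated_fragment))
    return [w for w in words if w]

print(basic_tokenizer(b"Where are you from?"))  # [b'Where', b'are', b'you', b'from', b'?']
```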
Which code, @lohith-emplay?
This code:

```python
import random

'''
1. Read from 'movie_lines.txt'
2. Create a dictionary with ( key = line_id, value = text )
'''
def get_id2line():
    lines = open('movie_lines.txt').read().split('\n')
    id2line = {}
    for line in lines:
        _line = line.split(' +++$+++ ')
        if len(_line) == 5:
            id2line[_line[0]] = _line[4]
    return id2line

'''
1. Read from 'movie_conversations.txt'
2. Create a list of [list of line_id's]
'''
def get_conversations():
    conv_lines = open('movie_conversations.txt').read().split('\n')
    convs = []
    for line in conv_lines[:-1]:
        _line = line.split(' +++$+++ ')[-1][1:-1].replace("'", "").replace(" ", "")
        convs.append(_line.split(','))
    return convs

'''
1. Get each conversation
2. Get each line from conversation
3. Save each conversation to file
'''
def extract_conversations(convs, id2line, path=''):
    idx = 0
    for conv in convs:
        f_conv = open(path + str(idx) + '.txt', 'w')
        for line_id in conv:
            f_conv.write(id2line[line_id])
            f_conv.write('\n')
        f_conv.close()
        idx += 1

'''
Get lists of all conversations as Questions and Answers
1. [questions]
2. [answers]
'''
def gather_dataset(convs, id2line):
    questions = []
    answers = []
    for conv in convs:
        # drop the last line of odd-length conversations so lines pair up
        if len(conv) % 2 != 0:
            conv = conv[:-1]
        for i in range(len(conv)):
            if i % 2 == 0:
                questions.append(id2line[conv[i]])
            else:
                answers.append(id2line[conv[i]])
    return questions, answers

'''
We need 4 files
1. train.enc : Encoder input for training
2. train.dec : Decoder input for training
3. test.enc  : Encoder input for testing
4. test.dec  : Decoder input for testing
'''
def prepare_seq2seq_files(questions, answers, path='', TESTSET_SIZE=30000):
    # open files
    train_enc = open(path + 'train.enc', 'w')
    train_dec = open(path + 'train.dec', 'w')
    test_enc = open(path + 'test.enc', 'w')
    test_dec = open(path + 'test.dec', 'w')

    # choose 30,000 (TESTSET_SIZE) items to put into the test set
    test_ids = random.sample(range(len(questions)), TESTSET_SIZE)

    for i in range(len(questions)):
        if i in test_ids:
            test_enc.write(questions[i] + '\n')
            test_dec.write(answers[i] + '\n')
        else:
            train_enc.write(questions[i] + '\n')
            train_dec.write(answers[i] + '\n')
        if i % 10000 == 0:
            print '\n>> written %d lines' % (i)

    # close files
    train_enc.close()
    train_dec.close()
    test_enc.close()
    test_dec.close()

####
# main()
####
id2line = get_id2line()
print '>> gathered id2line dictionary.\n'
convs = get_conversations()
print '>> gathered conversations.\n'
questions, answers = gather_dataset(convs, id2line)
print questions[:2]
print '>> gathered questions and answers.\n'
prepare_seq2seq_files(questions, answers)
```
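Note the script above is Python 2 (print statements, default text I/O), which is why the earlier advice was to run it with the Python 2 interpreter. If you want it under Python 3 instead, the changes are small (my own untested sketch; the corpus files are ISO-8859-1 encoded, per the Cornell corpus README):

```python
# Python 3 variants of the two kinds of lines that differ:
lines = open('movie_lines.txt', encoding='iso-8859-1').read().split('\n')
print('>> gathered id2line dictionary.\n')
```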
@Crakkerjakked OK. I am getting this. What now?

```
(tensorflow_env) C:\Users\DELL\Desktop\cbb>python execute.py
>> Mode : train
Preparing data in working_dir/
Tokenizing data in data/train.enc
Traceback (most recent call last):
  File "execute.py", line 319, in
```
Has anyone found a way to get this to run? I'm getting this error:

```
Traceback (most recent call last):
  File "execute.py", line 352, in
```
This is the error message I get, and from what I can tell I'm missing a "data" folder that contains a training set. Is there a specific way I have to create it? I tried creating my own data folder and plopping the vocab20000 files into it, renamed to train.dec and train.enc, but that just gave me a different error.
Any advice would be much appreciated!