oswaldoludwig / Seq2seq-Chatbot-for-Keras

This repository contains a new generative model of chatbot based on seq2seq modeling.
Apache License 2.0

File vocabulary_movie #32

Closed Maniac-tv closed 4 years ago

Maniac-tv commented 4 years ago

Hello, how do I generate a new vocabulary_movie file for my own data?

oswaldoludwig commented 4 years ago

Check lines 51-60 of get_train_data.py. Just be careful with the indexes of the special symbols BOS and EOS: their positions in the dictionary must stay the same.
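To illustrate the point about BOS and EOS, here is a minimal sketch of building a frequency-ordered vocabulary while pinning the two special symbols to fixed indexes. It is not the code from get_train_data.py: the function name build_vocabulary, its parameters, and the example index values are all hypothetical, and the real indexes have to be read from the original dictionary. Saving the result in the format the repository's scripts expect is left out.

```python
from collections import Counter


def build_vocabulary(sentences, max_size, bos_index, eos_index,
                     bos="BOS", eos="EOS"):
    """Return a list of words in which the list position is the word index.

    Hypothetical helper: bos_index and eos_index are assumptions and must be
    copied from the dictionary the model was originally built with.
    """
    if bos_index == eos_index:
        raise ValueError("BOS and EOS need different indexes")

    # Count every ordinary word; the special symbols are placed separately.
    counts = Counter(word for s in sentences for word in s.split())
    counts.pop(bos, None)
    counts.pop(eos, None)

    # Most frequent words first; ties are broken alphabetically so that the
    # result is reproducible.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    vocab = [word for word, _ in ordered[:max_size - 2]]

    # Insert the special symbols in ascending index order so that each one
    # ends up exactly at the requested position.
    for index, symbol in sorted([(bos_index, bos), (eos_index, eos)]):
        if index > len(vocab):
            raise ValueError("index %d is outside the vocabulary" % index)
        vocab.insert(index, symbol)
    return vocab


example = build_vocabulary(
    ["BOS hello world EOS", "BOS hello there EOS"],
    max_size=5, bos_index=0, eos_index=1,  # example indexes only
)
```

With the example indexes above, example.index("BOS") is 0 and example.index("EOS") is 1, and the remaining words follow in order of frequency.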

Maniac-tv commented 4 years ago

I already figured it out. There was a misunderstanding about the code in split_qa.py:

for i, raw_word in enumerate(text):
    pos = raw_word.find('+++$+++')
    if pos > -1:
        person = raw_word[pos+7:pos+10]
        raw_word = raw_word[pos+8:]
    while pos > -1:
        pos = raw_word.find('+++$+++')
        raw_word = raw_word[pos+2:]
    raw_word = raw_word.replace('$+++','')
    previous_person = person

What are the markers '+++$+++' and '$+++' used for? There are no such markers in my text, so the variable 'person' stays empty.
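For context, ' +++$+++ ' is the field separator used in the Cornell Movie-Dialogs Corpus file movie_lines.txt, and the slicing in the quoted loop is consistent with that layout. The sketch below assumes that format (line ID, speaker ID, movie ID, character name, utterance) and shows why plain text without the separator yields no speaker. The function name parse_line and the dictionary keys are hypothetical and do not come from split_qa.py.

```python
SEPARATOR = " +++$+++ "  # field separator assumed from movie_lines.txt


def parse_line(line):
    """Split one corpus line into its fields.

    Returns None when the line does not have the assumed five-field layout,
    which is what happens with plain text that has no separators.
    """
    fields = line.rstrip("\n").split(SEPARATOR)
    if len(fields) != 5:
        return None
    line_id, person, movie_id, character, text = fields
    return {
        "line_id": line_id,
        "person": person,        # speaker ID, e.g. "u0"
        "movie_id": movie_id,
        "character": character,
        "text": text,
    }


corpus_line = "L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!"
parsed = parse_line(corpus_line)
own_data = parse_line("They do not!")  # no separators in custom text
```

Here parsed["person"] is "u0" and parsed["text"] is "They do not!", while own_data is None. Data that is not in this format would need its own way of marking who is speaking before a similar question/answer split can work.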

oswaldoludwig commented 4 years ago

Okay, next time, please close the issue after finding the solution.