Taking only the system responses as Previous utterances

sumanbanerjee1 commented 7 years ago

In the QRN implementation for the bAbI dialog tasks, the examples seem to include only the bot (system) responses as the previous utterances (x_1, x_2, ..., x_T) in the dialog. Shouldn't the previous utterances be the sequence of both user utterances and system utterances?

Thanks in advance.

shmsw25 commented 7 years ago

Hi! Thank you for your interest. Actually, user utterances are also included in the set of utterances.

You can see this in the _get_data function in prepro-dialog.py. There, paragraphs is the set of all utterances that are treated as context in the paper.

User utterances are added to paragraph here, and paragraph is later added to paragraphs here.

Also, system responses are added to paragraphs here, where a_ is the tokenized system response, such as "Here it is".

Thanks again for your question, and let me know if you have any further questions!

sumanbanerjee1 commented 7 years ago

Hello, thanks for responding so fast. But the line here adds the utterance to paragraphs only if it is a KB triple (dialog=False); if it is a user utterance (dialog=True), it does not get appended to paragraphs.

shmsw25 commented 7 years ago

Just to clarify, do you mean the following?

system utterance: questions (such as "What is the phone number of A?")
user utterance: answers (such as "Here it is")

Is that right?

#L270~#L272 creates a_, which is the list of tokens of the answer sentence. You can see that each word in the answer is appended to a_ if it is not None. Then, in #L273, this a_ is appended to paragraph. It will then be included in paragraphs for the next question-answer pair.

For example,

S1: What is the phone number of A?
U1: Here it is.
=> paragraphs = [S1]

S2: Thank you.
U2: You are welcome.
=> paragraphs = [S1, U1, S2]

Hope this answers your question!

sumanbanerjee1 commented 7 years ago

> Just for a clarification, you mean
> system utterance: questions (such as "What is the phone number of A?")
> user utterance: answers (such as "Here it is")
> is it right?

No, I mean the other way round:

user utterance (questions): "What is the phone number of A?"
system response (answers): "Here it is: A_phone_number"

> For example,
> S1: What is the phone number of A?
> U1: Here it is.
> => paragraphs = [S1]
> S2: Thank you.
> U2: You are welcome.
> => paragraphs = [S1, U1, S2]

It would actually be the other way round:

U1: What is the phone number of A?
=> paragraphs = [start_symbol]
S1: Here it is.
=> paragraphs = [start_symbol, S1]

U2: Thank you.
S2: You are welcome.
=> paragraphs = [start_symbol, S1, S2]

U1 and U2 (the user utterances) are not being added to paragraphs. The list paragraph only gets a_ appended, and a_ is only ever S1 or S2, never U1 or U2. Words also get appended to the list paragraph, but those are only KB triples, since dialog=False there.

You can visually inspect the paragraph element of the list data before it is encoded to dictionary ids. I did so, and that is where the confusion arose.

shmsw25 commented 7 years ago

Please see the scenario below.

U1: What is the phone number of A?
=> dialog=False. id_ = '1'.
paragraph = [START, U1]
paragraphs = []

S1: Here it is.
=> dialog=True.
paragraph (before appending a) = [START, U1]
paragraphs = [[START, U1]]
paragraph (after appending a) = [START, U1, S1]

U2: Thank you.
=> dialog=False.
paragraph = [START, U1, S1, U2]

S2: You are welcome.
=> dialog=True.
paragraph (before appending a) = [START, U1, S1, U2]
paragraphs = [[START, U1], [START, U1, S1, U2]]
paragraph (after appending a) = [START, U1, S1, U2, S2]

Please note that paragraph is reset to [START] only when id_ is '1'. Otherwise, the previous paragraph is kept. Therefore, even when dialog=True, we keep using the same paragraph that was built while dialog=False.

sumanbanerjee1 commented 7 years ago

> Please see the scenario below.
> U1: What is the phone number of A?
> => dialog=False. id_ = '1'.
> paragraph = [START, U1]
> paragraphs = []

> U2: Thank you.
> => dialog=False, paragraph = [START, U1, S1, U2]

In these two cases dialog is not False, because sents contains two elements: the user utterance (sents[0]) and the system utterance (sents[1]), separated by a tab (\t). Therefore len(sents) != 1.

dialog=False only if the line is a KB triple.
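
To make the check concrete, here is a minimal sketch of how such a line could be classified. It is not the code from prepro-dialog.py: parse_line is a hypothetical helper, the two sample lines are made up, and the line layout (an id, a space, then either a KB triple or a user utterance and a system response separated by a tab) is assumed from the description in this thread.

```python
def parse_line(line):
    """Split one line into (id_, sents, dialog).

    Assumed layout: "<id> <user utterance>\t<system response>" for a
    dialog turn, or "<id> <KB triple>" (no tab) for a KB line.
    """
    # Everything before the first space is taken as the line id.
    id_, _, rest = line.rstrip("\n").partition(" ")
    # A dialog turn yields two elements here; a KB triple yields one.
    sents = rest.split("\t")
    dialog = len(sents) != 1
    return id_, sents, dialog


# Made-up sample lines, for illustration only.
turn = parse_line("1 hi\thello what can i help you with today\n")
kb = parse_line("2 resto_a R_phone resto_a_phone\n")

print(turn)  # dialog flag is True: the line holds a user/system pair
print(kb)    # dialog flag is False: the line is a KB triple
```

With this reading, every line that carries a user utterance also carries the system response, so dialog is True for it.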

shmsw25 commented 7 years ago

Oh okay, now I see what you mean. It seems the problem is solved once the question is also appended to paragraph. I found that this line is in my local code, but for some reason the GitHub code was not updated. I have updated it now, so please take a look. Thanks for catching it.
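
For readers following along, the difference can be illustrated with a small sketch. This is not the repository code: build_contexts, include_user, and the START token are hypothetical names, and the ordering (append the question, snapshot the context, then append the answer) is an assumption taken from the scenario described earlier in this thread.

```python
START = "<START>"  # hypothetical start symbol


def build_contexts(lines, include_user=True):
    """Return one context snapshot per dialog turn.

    lines: iterable of (id_, text, answer) tuples, where answer is None
    for a KB triple and the system response for a dialog turn.
    include_user=False mimics the behavior reported in this issue;
    include_user=True also appends the user utterance (the question).
    """
    paragraphs = []
    paragraph = []
    for id_, text, answer in lines:
        if id_ == "1":
            # A new dialog starts: reset the running context.
            paragraph = [START]
        if answer is None:
            # KB triple: it is always part of the context.
            paragraph.append(text)
            continue
        if include_user:
            paragraph.append(text)
        # Snapshot the context used to predict this turn's answer.
        paragraphs.append(list(paragraph))
        paragraph.append(answer)
    return paragraphs


turns = [("1", "U1", "S1"), ("2", "U2", "S2")]
print(build_contexts(turns, include_user=False))
# [['<START>'], ['<START>', 'S1']]
print(build_contexts(turns, include_user=True))
# [['<START>', 'U1'], ['<START>', 'U1', 'S1', 'U2']]
```

With include_user=False the snapshots contain only the start symbol and system responses, which matches what was observed when inspecting the data; with include_user=True they match the intended scenario.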

seominjoon / qrn
