Closed sumanbanerjee1 closed 7 years ago
Hi! Thank you for your interest. Actually, user utterances are also included in the set of utterances.
You can see those parts in the _get_data function of prepro-dialog.py.
Here, paragraphs is the set of all utterances that are considered as context in the paper.
User utterances are added to paragraph here, and it is later added to paragraphs here.
Also, system responses are added to paragraphs here, where a_ is the tokenized system response, such as "Here it is".
Thanks again for your question, and let me know if you have further questions!
Hello, thanks for responding so fast. But the line here adds the utterance to paragraphs only if it is a KB triple (dialog=False); if it is a user utterance (dialog=True), it does not get appended to paragraphs.
Just for clarification: you mean that system utterances are questions (such as "What is the phone number of A?") and user utterances are answers (such as "Here it is"), is that right?
#L270~#L272 create a_, which is the list of tokens of the answer sentence. You can see that each word in answer is appended to a_ if it is not None.
Then, in #L273, this a_ is appended to paragraph. This will be appended to paragraphs for the next question-answer pair.
For example:
S1: What is the phone number of A? U1: Here it is. => paragraphs = [S1]
S2: Thank you. U2: You are welcome. => paragraphs = [S1, U1, S2]
Hope this answers your question!
Just for clarification: you mean that system utterances are questions (such as "What is the phone number of A?") and user utterances are answers (such as "Here it is"), is that right?
No, I mean the other way round: user utterances are the questions ("What is the phone number of A?") and system responses are the answers ("Here it is: A_phone_number").
For example:
S1: What is the phone number of A? U1: Here it is. => paragraphs = [S1]
S2: Thank you. U2: You are welcome. => paragraphs = [S1, U1, S2]
This will actually be the other way round:
U1: What is the phone number of A? => paragraphs = [start_symbol]
S1: Here it is. => paragraphs = [start_symbol, S1]
U2: Thank you. S2: You are welcome. => paragraphs = [start_symbol, S1, S2]
U1 and U2 (user utterances) are not being added to paragraphs. The list paragraph only gets a_ appended, and a_ is only ever S1 or S2, never U1 or U2. words also gets appended to the list paragraph, but only for KB triples, since dialog=False there.
You can visually inspect the paragraph element of the list data before the tokens are encoded to dictionary ids. I did so, and that is how the confusion arose.
Please see the scenario below.
U1: What is the phone number of A?
=> dialog=False, id_ = '1', paragraph = [START, U1]
paragraphs = []
S1: Here it is.
=> dialog=True, paragraph (before appending a) = [START, U1]
paragraphs = [[START, U1]]
paragraph (after appending a) = [START, U1, S1]
U2: Thank you.
=> dialog=False, paragraph = [START, U1, S1, U2]
S2: You are welcome.
=> dialog=True, paragraph (before appending a) = [START, U1, S1, U2]
paragraphs = [[START, U1], [START, U1, S1, U2]]
paragraph (after appending a) = [START, U1, S1, U2, S2]
Please note that paragraph becomes [START] only when id_ is '1'. Otherwise, the previous paragraph is kept.
Therefore, even when dialog=True, we keep using the same paragraph made from dialog=False.
Please see the scenario below. U1: What is the phone number of A? => dialog=False. id_ = '1'. paragraph = [ START, U1 ] paragraphs = []
U2: Thank you. => dialog=False, paragraph = [START, U1, S1, U2]
For these 2 cases dialog != False, because sents contains 2 elements: the user utterance (sents[0]) and the system utterance (sents[1]), separated by a tab (\t). Therefore len(sents) != 1.
dialog=False only if the line is a KB triple.
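The distinction above can be sketched roughly as follows. This is only an illustration, assuming the bAbI dialog line format (a leading turn id, then either a KB triple or a tab-separated user/system pair); the function name classify_line is hypothetical and is not taken from prepro-dialog.py.

```python
def classify_line(line):
    """Split one bAbI-dialog-style line into (id_, dialog, sents).

    Hypothetical helper for illustration; not the repository's code.
    """
    # The turn id is the first space-separated field.
    id_, _, rest = line.rstrip("\n").partition(" ")
    # A user/system pair is separated by a tab; a KB triple has no tab.
    sents = rest.split("\t")
    dialog = len(sents) != 1
    return id_, dialog, sents


kb = classify_line("1 resto_A R_phone A_phone_number")
turn = classify_line("2 What is the phone number of A?\tHere it is: A_phone_number")
print(kb)    # dialog is False for the KB triple
print(turn)  # dialog is True, sents[0] is the user, sents[1] the system
```

Under this reading, a line carrying a user utterance always has dialog=True, because it also carries the system response after the tab.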
Oh okay, now I get what you mean.
It seems that the problem is solved once question is also appended to paragraph.
I found that this line is in my local code, but for some reason the GitHub code was not updated.
I have now updated it, so please take a look. Thanks for catching it.
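For anyone following along, here is a minimal sketch of the behaviour discussed in this thread, with and without the question being appended. It is an assumption-laden simplification, not the code in prepro-dialog.py: build_contexts, include_user and the START placeholder are hypothetical names, utterances are kept as whole strings instead of token lists, and the order (append question, snapshot the context, append answer) follows the scenario given earlier in the thread.

```python
START = "<START>"  # placeholder start symbol (assumed, for illustration)


def build_contexts(lines, include_user=True):
    """Return the context snapshot taken at each user/system turn."""
    paragraph, contexts = [], []
    for line in lines:
        id_, _, rest = line.partition(" ")
        if id_ == "1":
            # A new dialog starts: reset the running context.
            paragraph = [START]
        sents = rest.split("\t")
        if len(sents) == 1:
            # KB triple (dialog=False): goes straight into the context.
            paragraph.append(sents[0])
            continue
        question, answer = sents[0], sents[1]
        if include_user:
            # The fix discussed above: keep the user utterance too.
            paragraph.append(question)
        contexts.append(list(paragraph))  # snapshot for this turn
        paragraph.append(answer)          # visible to later turns
    return contexts


dialog_lines = [
    "1 What is the phone number of A?\tHere it is.",
    "2 Thank you.\tYou are welcome.",
]
without_fix = build_contexts(dialog_lines, include_user=False)
with_fix = build_contexts(dialog_lines, include_user=True)
print(without_fix)
print(with_fix)
```

With include_user=False the snapshots contain only the start symbol and system responses, which is the behaviour reported in this issue; with include_user=True they also contain the user utterances.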
In the implementation of QRN for bAbI dialog, it seems that the examples only include the bot (system) responses as the previous utterances (x_1, x_2, ..., x_T) in the dialog. Shouldn't it take the sequence of both user utterances and system utterances as the previous set of utterances?
Thanks in advance.