Hi,
I tried to run the DST training script in VS Code debug mode and found that the <|context|> marker in train.history_belief was encoded as a list of tokens rather than a single token: ['Ġ<', '|', 'context', '|', '>'], with corresponding ids [1279, 91, 22866, 91, 29].
I tried to trace through the tokenizer, but it seems the token "<|context|>" was never intentionally added to the GPT-2 vocabulary.
I'm wondering where I went wrong, or whether this result is expected.