Minor changes. Data cleaning of the human datasets has been streamlined overall. DailyDialog has been modified.
DailyDialog
The [EOT] tokens that were previously removed from the DailyDialog dataset have been reintroduced as alternating "speaker 1" and "speaker 2" labels, as such:
"source": "i hope the teacher decides to curve our test grades. speaker 2: i wouldn't count on it. speaker 1: she did last time."
This change will be incorporated into new prompts for the dataset. The prompts will instruct models to follow the speaker labels and therefore write only one response, instead of generating a whole conversation (which has been a problem with this dataset).
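The relabeling described above can be sketched roughly as follows. This is a minimal illustration, not the cleaning code actually used: it assumes the raw text separates turns with a literal "[EOT]" marker, and the function name `relabel_turns` is hypothetical. As in the example above, the first turn is left unlabeled and later turns alternate between "speaker 2" and "speaker 1".

```python
def relabel_turns(text: str, sep: str = "[EOT]") -> str:
    """Replace turn separators with alternating speaker labels.

    Assumption: turns are delimited by a literal `sep` marker in the raw text.
    """
    # Split into turns, dropping surrounding whitespace and empty pieces
    # (for example, the empty piece after a trailing separator).
    turns = [t.strip() for t in text.split(sep) if t.strip()]
    if not turns:
        return ""

    # The first turn stays unlabeled, matching the example in the text.
    parts = [turns[0]]
    for i, turn in enumerate(turns[1:], start=1):
        # Odd positions belong to speaker 2, even positions to speaker 1.
        speaker = 2 if i % 2 == 1 else 1
        parts.append(f"speaker {speaker}: {turn}")
    return " ".join(parts)


raw = (
    "i hope the teacher decides to curve our test grades. [EOT] "
    "i wouldn't count on it. [EOT] she did last time. [EOT]"
)
print(relabel_turns(raw))
```

With the assumed raw input above, the printed string matches the "source" example shown earlier.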