Dailydialog: Re-introduce [EOT] tokens as alternating speaker 1 and speaker 2 (+ general streamlining of data cleaning)

Data Cleaning

Minor changes. Data cleaning of the human datasets has been streamlined overall, and DailyDialog has been modified.

DailyDialog

The [EOT] tokens that were previously removed from the DailyDialog dataset have been reintroduced as alternating speaker 1 and speaker 2 labels, for example:

"source": "i hope the teacher decides to curve our test grades. speaker 2: i wouldn't count on it. speaker 1: she did last time."

This change will be incorporated into new prompts for the dataset. The prompts will instruct models to follow the speaker labels and therefore write a single response instead of generating a whole conversation, which has been a recurring problem with this dataset.
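The relabeling can be illustrated with a minimal sketch. It is not the code from this PR: the function name `label_speakers` is hypothetical, and it assumes that turns in the raw text are separated by a literal `[EOT]` token, that the first turn is left unlabeled (as in the example above), and that empty segments such as one after a trailing `[EOT]` should be dropped.

```python
EOT = "[EOT]"  # assumed turn separator in the raw DailyDialog text


def label_speakers(text: str, eot: str = EOT) -> str:
    """Replace each turn separator with an alternating speaker label.

    Hypothetical helper: the first turn stays unlabeled, the second turn
    is prefixed with "speaker 2:", the third with "speaker 1:", and so on.
    """
    # Split on the separator, trim whitespace, and drop empty segments
    # (for example the one produced by a trailing separator).
    turns = [t.strip() for t in text.split(eot)]
    turns = [t for t in turns if t]
    if not turns:
        return ""

    labeled = [turns[0]]
    for i, turn in enumerate(turns[1:], start=1):
        # Odd positions belong to speaker 2, even positions to speaker 1.
        speaker = 2 if i % 2 == 1 else 1
        labeled.append(f"speaker {speaker}: {turn}")
    return " ".join(labeled)


raw = (
    "i hope the teacher decides to curve our test grades. [EOT] "
    "i wouldn't count on it. [EOT] she did last time."
)
print(label_speakers(raw))
```

Under these assumptions, the printed string matches the "source" value shown above. Because the labels are derived from turn position only, the sketch would mislabel any dialogue in which the same speaker holds two consecutive turns.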

rbroc / echo

Dailydialog: Re-introduce [EOT] tokens as alternating speaker 1 and speaker 2 (+ general streamlining of data cleaning) #39
