I'm also looking for a decent training set for casual conversations, specifically for a language-learning chatbot.
But it seems this project only has ~200k logs. It's a start, but...
What other sources do you know? I'm sharing some info here, and I hope others can also suggest where to look.
Cornell's ConvoKit provides an API for some really good datasets, such as the famous movie dialogue corpus, and also a structured API for some subreddits: https://convokit.cornell.edu/
Facebook's ParlAI has a standardized API to lots of datasets: https://parl.ai/about/ (e.g. https://arxiv.org/pdf/1801.07243.pdf)
Tatoeba has a good sentence database, but no conversation turns: https://tatoeba.org/eng/
I'm keeping archives of a few things I find. Here is a set of logs from teaching English conversation: https://github.com/dcsan/corpus/blob/master/convo/esl-china/esl06.csv
Some of these could be converted for use here.
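For anyone who wants to try converting logs like these, here is a minimal sketch that turns a CSV of ordered utterances into (prompt, response) pairs. It assumes, hypothetically, one utterance per row with `speaker` and `text` columns. I haven't checked that this matches the actual layout of esl06.csv, so adjust the column names to whatever the file really uses. Consecutive lines from the same speaker are merged into a single turn before pairing.

```python
import csv
import io
from typing import Iterable, List, Tuple


def rows_to_pairs(
    rows: Iterable[dict],
    speaker_col: str = "speaker",  # hypothetical column name
    text_col: str = "text",        # hypothetical column name
) -> List[Tuple[str, str]]:
    """Turn ordered utterance rows into (prompt, response) pairs."""
    turns: List[Tuple[str, str]] = []  # (speaker, merged text) per turn
    for row in rows:
        speaker = row[speaker_col].strip()
        text = row[text_col].strip()
        if not text:
            continue  # skip empty lines
        if turns and turns[-1][0] == speaker:
            # Same speaker again: merge into the current turn
            turns[-1] = (speaker, turns[-1][1] + " " + text)
        else:
            turns.append((speaker, text))
    # Pair each turn with the turn that follows it
    return [(turns[i][1], turns[i + 1][1]) for i in range(len(turns) - 1)]


def csv_to_pairs(csv_text: str, **kwargs) -> List[Tuple[str, str]]:
    """Parse CSV text with a header row and build (prompt, response) pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return rows_to_pairs(reader, **kwargs)


if __name__ == "__main__":
    sample = (
        "speaker,text\n"
        "teacher,Hi! How are you?\n"
        "student,I'm fine.\n"
        "student,And you?\n"
        "teacher,Very well.\n"
    )
    for prompt, response in csv_to_pairs(sample):
        print(prompt, "->", response)
```

Pairing every turn with the next one produces overlapping pairs, so each middle turn appears once as a response and once as a prompt. That is a common way to get more training examples out of short logs. If you only want the student's replies as targets, filter the turns by speaker before pairing.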
What other sources have people found for conversations?