gunthercox / chatterbot-corpus

A multilingual dialog corpus
http://chatterbot-corpus.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Huge datasets #233

Open atl333 opened 2 years ago

atl333 commented 2 years ago

I have datasets of 100k+ messages from Discord that I'd like to use. Is there a specific format I should use for them?

For example, I'm using a script and could export them as a two-column CSV with name and message,

or a txt file with lines of the form name: message,

or just the messages in sequence: 1 message, 2 message, 1 message, 3 message,

etc.

Would that work, or would it treat everything as one huge conversation and not learn properly?

I could maybe make it so that if more than 1-2 hours pass between messages, the conversation is considered ended and the next message starts a new convo.
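
The time-gap idea above can be sketched roughly as follows. This is only an illustration, not something from the project: it assumes the export also carries an ISO-format timestamp column (the two-column CSV described above would need one added), and the names `read_rows`, `split_conversations`, and `max_gap` are made up for the example.

```python
import csv
import io
from datetime import datetime, timedelta


def read_rows(csv_file):
    """Yield (timestamp, name, message) tuples from a 3-column CSV.

    Assumption: column order is ISO timestamp, name, message.
    """
    for timestamp, name, message in csv.reader(csv_file):
        yield datetime.fromisoformat(timestamp), name, message


def split_conversations(rows, max_gap=timedelta(hours=2)):
    """Group messages into conversations, starting a new one whenever
    the pause since the previous message is longer than max_gap.

    Rows are assumed to be sorted by timestamp already.
    """
    conversations = []
    current = []
    last_time = None
    for timestamp, _name, message in rows:
        if last_time is not None and timestamp - last_time > max_gap:
            # Long silence: close the current conversation.
            if current:
                conversations.append(current)
            current = []
        current.append(message)
        last_time = timestamp
    if current:
        conversations.append(current)
    return conversations


# Tiny in-memory sample standing in for the real Discord export.
sample = io.StringIO(
    "2021-03-01T10:00:00,alice,hi there\n"
    "2021-03-01T10:01:00,bob,hello\n"
    "2021-03-01T10:02:30,alice,how are you?\n"
    "2021-03-01T15:00:00,carol,anyone around?\n"
    "2021-03-01T15:05:00,bob,yep\n"
)

conversations = split_conversations(read_rows(sample))
for convo in conversations:
    print(convo)
```

Each inner list is one conversation as an ordered list of messages. That shape should be close to what the corpus files use (a `conversations:` key holding a list of lists), and to what ChatterBot's `ListTrainer.train()` takes for a single conversation, but treat the exact mapping as something to check against the docs. Speaker names are dropped here on the assumption that only the message order matters for training.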