pender / chatbot-rnn

A toy chatbot powered by deep learning and trained on data from Reddit
MIT License

Questions on Training Dataset #40

Closed. BFMarks closed this issue 6 years ago.

BFMarks commented 6 years ago

Thanks for the great repo! A few quick questions, though:

pender commented 6 years ago

What minimum number of data rows would you recommend for a new dataset?

I'd say a few megabytes compressed is probably the minimum. Bigger models will need greater amounts of data to avoid overfitting.
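The rule of thumb above is stated in terms of compressed size. Here is a minimal sketch for checking a corpus against it. It assumes bz2 as the compressor and reads "a few megabytes" as 3 MB. Both are assumptions, not values given in the thread, and the helper names are hypothetical.

```python
import bz2

# Assumption: "a few megabytes" interpreted as 3 MB; adjust to taste.
MIN_COMPRESSED_BYTES = 3 * 1024 * 1024


def compressed_size(text: str) -> int:
    """Size in bytes of `text` after UTF-8 encoding and bz2 compression."""
    return len(bz2.compress(text.encode("utf-8")))


def is_probably_large_enough(text: str, threshold: int = MIN_COMPRESSED_BYTES) -> bool:
    """Rough check against the rule of thumb; it does not guarantee the
    model will avoid overfitting, since bigger models need more data."""
    return compressed_size(text) >= threshold


if __name__ == "__main__":
    sample = "> hello\n> hi there\n\n" * 10
    print(compressed_size(sample), is_probably_large_enough(sample))
```

Compressed size is used instead of raw size because repetitive text compresses well. It therefore gives a better sense of how much distinct material a corpus actually contains.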

What exactly is the ideal format for a new dataset?

Multiple lines of dialogue, each starting with "> " (note the space), with a blank line between dialogues.
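Here is a minimal sketch of converting conversations into the format described above and reading them back. The function names are hypothetical, and this is not the repository's own preprocessing code. Collapsing whitespace inside each utterance and dropping empty utterances are also assumptions, made so that no turn spills onto a line without the "> " prefix.

```python
from typing import List


def format_dialogues(dialogues: List[List[str]]) -> str:
    """Each utterance becomes one line starting with "> ";
    dialogues are separated by a blank line."""
    blocks = []
    for dialogue in dialogues:
        # Collapse internal whitespace (including newlines) so each turn
        # stays on a single line; skip empty turns (assumption).
        lines = ["> " + " ".join(turn.split()) for turn in dialogue if turn.strip()]
        if lines:
            blocks.append("\n".join(lines))
    # Join dialogues with an empty line and end the text with a newline.
    return ("\n\n".join(blocks) + "\n") if blocks else ""


def parse_dialogues(text: str) -> List[List[str]]:
    """Inverse of format_dialogues: split on blank lines, strip the "> " prefix.
    Non-empty lines without the prefix are ignored."""
    dialogues: List[List[str]] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.startswith("> "):
            current.append(line[2:])
        elif not line.strip():
            # A blank line ends the current dialogue.
            if current:
                dialogues.append(current)
                current = []
    if current:
        dialogues.append(current)
    return dialogues


if __name__ == "__main__":
    data = [["hello", "hi there, how are you?"], ["what's up?", "not much"]]
    print(format_dialogues(data), end="")
```

Running the usage block prints the two dialogues as four "> " lines, with one blank line between them.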

Could the bot be less negative (and lighten up a little) if trained on a different dataset?

Yes, a dataset with less negative content will probably result in a bot with a less negative tone.