gunthercox / ChatterBot

ChatterBot is a machine learning, conversational dialog engine for creating chat bots
https://chatterbot.readthedocs.io
BSD 3-Clause "New" or "Revised" License

suggestion for implementing deep learning in chatterbot #1634

Open brightening-eyes opened 5 years ago

brightening-eyes commented 5 years ago

hi, since ChatterBot doesn't support deep learning, I'll propose my suggestions to make it better (since my deep learning library is MXNet with the Gluon API, I will focus on giving examples based on that):

  1. instead of a database storage adapter, we can make a class like MxnetAdapter which gets a model class in its kwargs, plus a file to load the model from: if the file exists it should load it, otherwise it should initialize the model using self.model.initialize(). It can also have a ctx in the kwargs indicating whether it should be initialized on the CPU or GPU. This storage adapter doesn't need drop(), count(), etc. methods, since it wraps the trained model.
  2. it can accept a loss in its constructor's **kwargs.
  3. alternatively, the loss can live in the trainer class (then learning from conversations won't be possible and the bot should be in read_only mode). In the storage adapter's update() method, we should do the forward pass and backpropagation (if we have the loss), otherwise we shouldn't do anything; and, at last, we should pass the statement.text as one-hot input to the model.
  4. for the trainer: train() should accept x, y, batch_size, epochs, etc.
  5. the trainer should use the storage adapter's model and the loss to do the forward and backward passes. I think by using this approach, everything (logic adapters, comparison functions, etc.) can have a nice time.
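The adapter described in the steps above might look roughly like the sketch below. This is a framework-free, hedged sketch of the proposed structure only: `TinyModel`, the JSON file format, and the dictionary "training step" are hypothetical stand-ins for a real mxnet.gluon model, its `initialize()`/parameter-loading calls, and autograd-based backpropagation.

```python
import json
import os

class TinyModel:
    """Hypothetical stand-in for an mxnet.gluon model."""
    def __init__(self):
        self.weights = {}

    def initialize(self):
        # A real gluon model would be initialized with net.initialize(ctx=...)
        self.weights = {}

class NeuralStorageAdapter:
    """Sketch of the proposed model-backed storage adapter."""
    def __init__(self, **kwargs):
        self.model = kwargs.get('model', TinyModel())
        self.loss = kwargs.get('loss')       # no loss -> the bot is read-only
        self.ctx = kwargs.get('ctx', 'cpu')  # 'cpu' or 'gpu'
        model_file = kwargs.get('model_file')
        if model_file and os.path.exists(model_file):
            with open(model_file) as f:
                self.model.weights = json.load(f)  # load pretrained parameters
        else:
            self.model.initialize()                # fresh parameters otherwise

    @property
    def read_only(self):
        return self.loss is None

    def update(self, statement_text, response_text):
        # Stand-in for the forward pass and backpropagation on one
        # (statement, response) pair; does nothing without a loss.
        if self.read_only:
            return
        self.model.weights[statement_text] = response_text

adapter = NeuralStorageAdapter(loss='softmax_cross_entropy')
adapter.update('hello', 'hi, how are you?')
```

Here `update()` simply memorizes the pair; in the actual proposal it would record the forward pass, compute the loss, and backpropagate.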
vkosuri commented 5 years ago

Exciting, curious to see it. If possible, could you please share an example?

brightening-eyes commented 5 years ago

hi, since database adapters return the next response from different SQL statements, this is a bit different with neural networks. In neural networks, when we want to train, we have an x and a y: the x is what the user says to the bot, while in y we store the answer. When trained, the model learns to return different responses based on what the user said (based on the training data, of course, and far more flexibly than SQL statements). So, when the user says "hello", the robot will return "hi, how are you?" based on the training data. But when trained with larger data like the Ubuntu corpus, it can learn to interact better; if more parameters (like context and previous statements) are added, it will be better in terms of accuracy, although it will require lots of data to train.

About the example, I don't know when I'm going to write it. But we would have a self.model and a self.loss in the adapter, an mxnet.gluon.Trainer in the trainer, and maybe a preprocessor to transform the statements into one-hot representation (or they can be transformed into one-hot representation in the adapter). Also, mxnet.metric.Accuracy can be used in the trainer class to determine how the model performs during the training and, possibly, validation process.
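A minimal sketch of the one-hot preprocessing step mentioned above, assuming a simple whitespace tokenizer. The function names and vocabulary scheme here are illustrative, not ChatterBot or MXNet API:

```python
def build_vocabulary(statements):
    """Map every distinct token to an index."""
    vocab = {}
    for text in statements:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab

def one_hot(text, vocab):
    """Encode a statement as a list of one-hot vectors, one per token."""
    vectors = []
    for token in text.lower().split():
        vec = [0] * len(vocab)
        if token in vocab:
            vec[vocab[token]] = 1
        vectors.append(vec)
    return vectors

vocab = build_vocabulary(["hello", "hi how are you"])
print(one_hot("hello", vocab))  # → [[1, 0, 0, 0, 0]]
```

A real pipeline would feed these vectors (or, more likely, token indices into an embedding layer) to the model instead of raw text.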

gunthercox commented 5 years ago

Hi @brightening-eyes, I think this is a great suggestion and this would be an awesome addition to ChatterBot. A decent amount of research and testing would be required, but I think it would be worth the effort.

ignertic commented 5 years ago

This is a great idea, but since deep learning is now involved, what will this mean for low-power IoT devices currently running ChatterBot?

Orfeous commented 5 years ago

Hi @ignertic , With the above-mentioned approach you can train your model on a powerful machine, then ship the software with the pretrained model (which is a flat file) to your IoT device, and it will work as before. :)
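The train-then-ship workflow could look like the sketch below. The file name and the dictionary standing in for trained weights are hypothetical; a real MXNet model would serialize its parameters with `save_parameters`/`load_parameters` instead:

```python
import json

# On the powerful training machine: train, then export the parameters
# to a flat file that ships alongside the bot.
trained_parameters = {"hello": "hi, how are you?"}  # placeholder for real weights
with open("model.params", "w") as f:
    json.dump(trained_parameters, f)

# On the IoT device: no training code runs; just load the file and
# use the model for inference.
with open("model.params") as f:
    parameters = json.load(f)
print(parameters["hello"])  # → hi, how are you?
```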

/Gabor

ignertic commented 5 years ago

@Orfeous , yes of course :) Just curious though, any progress with this ?

brightening-eyes commented 5 years ago

About deep learning: first, it depends on your model (basically a seq2seq model), which is slow even on CPU. Regarding the training process, it depends on your data, again on the model, on the loss function, and on whether it will learn from new conversations or not (if yes, it will be slower, since we need more training).

Orfeous commented 5 years ago

I think it would be more beneficial to implement it as a logic adapter, so it steps in when it can provide a better answer than the other adapters. This way we can keep our database to log the conversations, and later we can reuse these logs as training data for our model.

The learn-by-chatting functionality can be achieved either by running the training procedure periodically in the background, or manually as a maintenance task for bot operators.
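A sketch of the logic-adapter idea, assuming ChatterBot's usual pattern of adapters that return a response together with a confidence score. The class below only mirrors that interface with a stand-in model rather than subclassing ChatterBot's actual `LogicAdapter`:

```python
class NeuralLogicAdapter:
    """Mirrors the shape of a ChatterBot logic adapter (can_process/process
    returning a response with a confidence) without importing ChatterBot.
    The dict-based model and its scores are hypothetical stand-ins."""

    def __init__(self, model):
        self.model = model  # learned mapping: input -> (response, confidence)

    def can_process(self, statement):
        return True

    def process(self, statement):
        # The adapter whose answer carries the highest confidence wins
        # the response selection, so other adapters still step in when
        # the model is unsure.
        return self.model.get(statement, ("", 0.0))

adapter = NeuralLogicAdapter({"hello": ("hi, how are you?", 0.9)})
print(adapter.process("hello"))  # → ('hi, how are you?', 0.9)
```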

brightening-eyes commented 5 years ago

Regarding logic adapters, this approach has some pros and also some cons. Pros:

also, check this out

Mohammad699 commented 2 years ago

Any progress on that?

brightening-eyes commented 2 years ago

It seems that ChatterBot is not maintained anymore. Another, better idea that came to my mind is to have a model containing an embedding layer and some sort of text similarity detection (to make the BestMatch adapter better without something like spaCy), plus a class for transforming textual data into features for the model. Passing a custom model could also be supported, to keep training that model separate from training ChatterBot. This way, the framework used to train the model (TensorFlow, PyTorch, MXNet) can be whichever the user prefers.
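One hedged sketch of similarity-based matching without spaCy: cosine similarity over bag-of-words counts as a stand-in for the proposed embedding layer (a real version would average learned embedding vectors instead of raw counts):

```python
import math
from collections import Counter

def bag_of_words(text):
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity over word counts; a placeholder for
    similarity between learned embedding vectors."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_match(statement, known_statements):
    """Pick the stored statement most similar to the input."""
    return max(known_statements, key=lambda s: cosine_similarity(statement, s))

print(best_match("hello there", ["hello there friend", "goodbye"]))
```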

Orfeous commented 2 years ago

it seems that chatterbot is not maintained anymore

I agree. However, as I think about it, implementing an ML model isn't that big of a deal (you just throw in a custom logic adapter which implements a Hugging Face transformer or even a GPT-3 API client).

The real trick is the evaluation to decide which logic adapter's answer should be returned, while also remaining consistent (as much as possible).

brightening-eyes commented 2 years ago

The thing is, we can make the BestMatch adapter use deep learning (via a custom comparison function) and get rid of things like spaCy. To generate the response, another logic adapter can be implemented, which should take an encoder and a decoder (a seq2seq model) with an attention mechanism added, in order to generate the responses. But for bots that do custom things like getting the weather or reserving hotels, it is somewhat different with these models: the model should, for example, extract the location and the time of the reservation and generate the appropriate response (these things require a lot of data).
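The custom comparison function could be sketched like this, assuming ChatterBot's convention of a comparator exposing `compare(statement_a, statement_b)` that returns a similarity in [0, 1]. The token-overlap measure below is a hypothetical placeholder for a learned model:

```python
class EmbeddingComparator:
    """Sketch of a custom comparison function in the shape ChatterBot
    expects: compare(statement_a, statement_b) returning a value in
    [0, 1]. A real version would encode both statements with a trained
    embedding model instead of counting shared tokens."""

    def compare(self, statement_a, statement_b):
        tokens_a = set(statement_a.lower().split())
        tokens_b = set(statement_b.lower().split())
        if not tokens_a or not tokens_b:
            return 0.0
        # Jaccard overlap as a placeholder for learned similarity
        return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

comparator = EmbeddingComparator()
print(comparator.compare("hello there", "hello there friend"))  # → 0.6666666666666666
```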