mysamai / natural-brain

A natural language classifier using Node Natural with a BrainJS neural network
MIT License
320 stars, 29 forks

Huge amount of data #36

Open IonicaBizau opened 6 years ago

IonicaBizau commented 6 years ago

How would this work with a huge amount of data (e.g. thousands or millions of pairs) without freezing?

Nice project, btw!

IonicaBizau commented 6 years ago

For instance, my MacBook's CPU goes to 100% and gets stuck at the training step.

[Screenshot: 2018-01-05 at 12:00:11]

Here's my code:

```js
var BrainJSClassifier = require('natural-brain');
var classifier = new BrainJSClassifier();
var lorem = require("lorem-ipsum");

// Helpers: one random word, 42 random category labels, and a random label picker
const word = () => lorem({ count: 1, units: "words" });
const cats = new Array(42).fill(0).map(word);
const ran = () => cats[Math.floor(Math.random() * cats.length)];

// Add 1000 random documents, each tagged with a random category
console.log("Generating");
for (var i = 0; i < 1000; ++i) {
    classifier.addDocument(lorem({ count: 3 }), ran());
}

// This is the step where the CPU goes to 100%
console.log("Training");
classifier.train();

console.log("Running");
console.log(classifier.classify('hi'));
```

robertleeplummerjr commented 6 years ago

Let's work together to make this faster!

daffl commented 6 years ago

Training with larger datasets can take a while. On top of that, the lorem ipsum generator might produce conflicting classifications (similar text assigned to different categories), in which case the neural network will run up to 10000 iterations trying to get the error rate as low as possible, and training might still fail if it never gets there.

There are two options I can see for improving performance:

1) Train in a separate process so it at least doesn't lock up the main Node process
2) Store and load the trained Neural Network
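As a rough illustration of the first option (training in a separate process), here is a minimal sketch that uses only Node's built-in `child_process` module. The busy loop inside `childScript` is a hypothetical stand-in for the CPU-heavy `classifier.train()` call, and `trainInChildProcess` is a made-up helper name, not part of natural-brain. In a real setup the child script would require `natural-brain`, train the classifier, and hand the result back (for example by writing it to a file).

```javascript
const { execFile } = require('child_process');

// Script executed in the child process. The busy loop is a hypothetical
// stand-in for the CPU-heavy classifier.train() call.
const childScript = `
  let acc = 0;
  for (let i = 0; i < 1e6; i++) acc += i % 7;
  process.stdout.write(JSON.stringify({ trained: true, acc }));
`;

// Runs the script in a separate Node process and resolves with its parsed output.
function trainInChildProcess() {
  return new Promise((resolve, reject) => {
    execFile(process.execPath, ['-e', childScript], (err, stdout) => {
      if (err) return reject(err);
      resolve(JSON.parse(stdout));
    });
  });
}

// The parent's event loop stays free while the child is busy.
const ticker = setInterval(() => console.log('main process still responsive'), 50);

trainInChildProcess()
  .then((result) => console.log('training finished:', result))
  .finally(() => clearInterval(ticker));
```

This keeps the main process responsive, but it does not make training itself any faster. On a machine with a single CPU the two processes share that core, so the operating system interleaves them rather than running them in parallel.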

IonicaBizau commented 6 years ago

> Train in a separate process so it at least doesn't lock up the main Node process

Will that work on Heroku, assuming that there is just one CPU? I guess the processes will share the same CPU. What would this example look like with multiple processes?

> Store and load the trained Neural Network

That sounds good. Some kind of caching is needed anyway, because RAM is limited as well (e.g. 500MB).
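The caching idea could look roughly like the sketch below: train once, write the result to disk, and read it back on later runs instead of retraining. `loadOrTrain` and `fakeTrain` are hypothetical names used only for illustration, and the "model" here is a plain JSON-serializable object rather than a real classifier. With natural-brain you would swap the JSON calls for whatever persistence helpers your version of the library provides (brain.js networks, for instance, expose `toJSON()` and `fromJSON()`).

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Returns the cached model if the cache file exists; otherwise calls `train`,
// saves the result, and returns it. `train` is a hypothetical callback that
// must return a JSON-serializable object.
function loadOrTrain(cachePath, train) {
  if (fs.existsSync(cachePath)) {
    const model = JSON.parse(fs.readFileSync(cachePath, 'utf8'));
    return { model, fromCache: true };
  }
  const model = train();
  fs.writeFileSync(cachePath, JSON.stringify(model));
  return { model, fromCache: false };
}

// Usage with a stand-in model in a fresh temporary directory.
const cacheDir = fs.mkdtempSync(path.join(os.tmpdir(), 'classifier-cache-'));
const cachePath = path.join(cacheDir, 'model.json');
const fakeTrain = () => ({ weights: [0.1, 0.2, 0.3] }); // stand-in for real training

const first = loadOrTrain(cachePath, fakeTrain);  // trains and writes the file
const second = loadOrTrain(cachePath, fakeTrain); // reads the file, skips training
console.log(first.fromCache, second.fromCache); // false true

// Clean up the temporary cache.
fs.unlinkSync(cachePath);
fs.rmdirSync(cacheDir);
```

Note that a disk cache only avoids repeated training. The first training run still needs enough memory to hold the full dataset and the network.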