harthur / classifier

Bayesian classifier with Redis backend
MIT License

Classifier Failing with medium dataset #3

Closed: tistaharahap closed this issue 10 years ago

tistaharahap commented 12 years ago

Hi,

First of all, thank you for a really helpful Naive Bayes classifier library. I tried implementing one myself in Node.js and ran into a problem, and I'm hitting the same problem with your library.

My dataset has 1,212 sets, from which your library generated 294,419 word sets. I use socket.io to handle WebSocket connections from clients. The first classification attempt always returns a result, but every attempt after the first one hangs.

With a smaller dataset, such as the one in your example, classification works.

To make this clearer, here is the code I'm using:

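```javascript
// Setup not shown in the original post; assumed here. Uses the socket.io 0.9-style
// API and the classifier module's Redis backend (option values are placeholders).
var classifier = require('classifier');
var io = require('socket.io').listen(8080);
var bayes = new classifier.Bayesian({
    backend: {
        type: 'Redis',
        options: { hostname: 'localhost', port: 6379, name: 'mydataset' } // hypothetical namespace
    }
});

io.sockets.on('connection', function(socket) {
    // Classify the keywords sent by the client and reply with the category
    socket.on('classify', function(data) {
        if(data && typeof data.namespace === 'string' && typeof data.keywords === 'string') {
            console.log('Classifying: ' + data.keywords);
            var start = new Date().getTime();

            bayes.classify(data.keywords, function(cat) {
                console.log('Classified: ' + cat);
                var elapsed = new Date().getTime() - start;

                if(typeof cat === 'string' && cat !== '' && cat !== 'unclassified') {
                    var result = {
                        classifyStatus: {
                            code: 200,
                            msg: 'Success',
                            timing: elapsed
                        },
                        text: data.keywords,
                        category: cat
                    };
                    console.log(result);
                    socket.emit('classifyCategory', result);
                }
                else {
                    console.log('Unclassified');
                    socket.emit('classifyCategory', {
                        classifyStatus: {
                            code: 200,
                            msg: 'Cannot classify into any categories',
                            timing: elapsed
                        },
                        text: data.keywords,
                        category: 'unclassified'
                    });
                }
            });
        }
        else {
            // `data` may be missing entirely, so guard before reading from it
            console.log('Unclassified - no data from client');
            socket.emit('classifyCategory', {
                classifyStatus: {   // was `trainStatus`; renamed to match the other replies
                    code: 500,
                    msg: 'Failed',
                    timing: 0
                },
                text: data && data.keywords,
                category: ''
            });
        }
    });
});
```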
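Whatever the root cause turns out to be, the handler above leaves the client waiting forever whenever `bayes.classify` never calls back. A minimal sketch, assuming only that `classify(text, callback)` passes the category to its callback as in the code above: the hypothetical helper `classifyWithTimeout` (not part of the classifier library) guarantees that its own callback fires exactly once, either with the category or with a timeout error.

```javascript
// Hypothetical helper (not part of the classifier library): wraps a
// callback-style classify(text, callback) so that `callback` fires exactly
// once, with (null, category) on success or (Error) on timeout.
function classifyWithTimeout(classifier, text, ms, callback) {
  var finished = false;

  // If classify() has not answered within `ms`, report a timeout instead.
  var timer = setTimeout(function () {
    finished = true;
    callback(new Error('classify timed out after ' + ms + ' ms'));
  }, ms);

  classifier.classify(text, function (category) {
    if (finished) return; // the timeout already answered; drop the late result
    finished = true;
    clearTimeout(timer);
    callback(null, category);
  });
}
```

In the handler, `bayes.classify(data.keywords, function(cat) { ... })` would become `classifyWithTimeout(bayes, data.keywords, 5000, function(err, cat) { ... })`, emitting an error status when `err` is set. The 5000 ms limit is an arbitrary placeholder. This only makes the hang visible to the client; it does not fix whatever stalls the second call.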
harthur commented 11 years ago

Sorry for the late response! Any updates on this? If you take out the socket.io part, does it still fail?

harthur commented 10 years ago

Closing for lack of a reduced case (without the socket.io bits). Reopen if this is still an issue.
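A reduced case of the kind requested here would drive the classifier directly, with socket.io out of the picture. A minimal sketch, assuming only an object with a `classify(text, callback)` method (such as the Redis-backed `bayes` instance from the original post): the hypothetical `runSequentially` function classifies the inputs one after another, records each category and its duration, and stops with a `hung` entry as soon as an attempt exceeds the time limit.

```javascript
// Hypothetical reduced test case (not part of the classifier library).
// `classifier` is any object with a classify(text, callback) method whose
// callback receives the category, as in the original post.
function runSequentially(classifier, inputs, timeoutMs, done) {
  var report = [];

  function next(i) {
    if (i >= inputs.length) return done(report); // every attempt answered

    var start = Date.now();
    var settled = false;

    // Watchdog: if this attempt never calls back, record it and stop.
    var timer = setTimeout(function () {
      settled = true;
      report.push({ attempt: i + 1, text: inputs[i], hung: true });
      done(report);
    }, timeoutMs);

    classifier.classify(inputs[i], function (category) {
      if (settled) return; // ignore an answer that arrives after the watchdog fired
      settled = true;
      clearTimeout(timer);
      report.push({
        attempt: i + 1,
        text: inputs[i],
        category: category,
        ms: Date.now() - start
      });
      next(i + 1); // start the next attempt only after this one has answered
    });
  }

  next(0);
}
```

With the real library, a call such as `runSequentially(bayes, ['first query', 'second query', 'third query'], 10000, console.log)` should show whether the second attempt hangs without socket.io involved. The queries and the 10-second limit are placeholders. Because an open Redis connection keeps the Node.js process alive, the `done` callback will likely need to end the process explicitly, for example with `process.exit()`.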