winkjs / wink-naive-bayes-text-classifier

Naive Bayes Text Classifier
http://winkjs.org/wink-naive-bayes-text-classifier/
MIT License
39 stars 9 forks source link
chatbot classifier machine-learning naive-bayes natural-language-processing nlp sentiment-analysis text-classification winkjs winknlp

wink-naive-bayes-text-classifier

Naive Bayes Text Classifier

Build Status Coverage Status Gitter

Classify text, analyse sentiments, recognize user intents for chatbot using wink-naive-bayes-text-classifier. Its API offers a rich set of features including cross validation to compute confusion matrix, precision, and recall. It delivers impressive accuracy levels with right text pre-processing using wink-nlp:

Dataset Accuracy
Sentiment Analysis
Amazon Product Review Sentiment Labelled Sentences Data Set at UCI Machine Learning Repository
90%
800 training examples & 200 validation reviews

Refer to sentiment-analysis-example directory for the reference code
Intent Classification
Chatbot corpus from NLU Evaluation Corpora as mentioned in paper titled Evaluating Natural Language Understanding Services for Conversational Question Answering Systems
99%
100 training examples & 106 validation

Refer to chatbot-example directory for the reference code

Text Pre-processing

A winkNLP based helper function for general purpose text pre-processing is available that (a) tokenizes, (b) removes punctuations, symbols, numerals, URLs, stop words, (c) stems each token and (d) handles negations. It can be required from wink-naive-bayes-text-classifier/src/prep-text.js. WinkNLP's Named Entity Recognition may be used to further enhance the pre-processing.

Hyperparameters

These include smoothing factor to control additive smoothing and a consider presence only flag to choose from Multinomial/Binarized naive bayes.

The trained model can be exported as JSON and can be reloaded later for predictions.

Installation

Use npm to install:

npm install wink-naive-bayes-text-classifier --save

It requires Node.js version 16.x or 18.x.

Example


// Load Naive Bayes Text Classifier
var Classifier = require( 'wink-naive-bayes-text-classifier' );
// Instantiate
var nbc = Classifier();
// Load wink nlp and its model
const winkNLP = require( 'wink-nlp' );
// Load language model
const model = require( 'wink-eng-lite-web-model' );
const nlp = winkNLP( model );
const its = nlp.its;

const prepTask = function ( text ) {
  const tokens = [];
  nlp.readDoc(text)
      .tokens()
      // Use only words ignoring punctuations etc and from them remove stop words
      .filter( (t) => ( t.out(its.type) === 'word' && !t.out(its.stopWordFlag) ) )
      // Handle negation and extract stem of the word
      .each( (t) => tokens.push( (t.out(its.negationFlag)) ? '!' + t.out(its.stem) : t.out(its.stem) ) );

  return tokens;
};
nbc.definePrepTasks( [ prepTask ] );
// Configure behavior
nbc.defineConfig( { considerOnlyPresence: true, smoothingFactor: 0.5 } );
// Train!
nbc.learn( 'I want to prepay my loan', 'prepay' );
nbc.learn( 'I want to close my loan', 'prepay' );
nbc.learn( 'I want to foreclose my loan', 'prepay' );
nbc.learn( 'I would like to pay the loan balance', 'prepay' );

nbc.learn( 'I would like to borrow money to buy a vehicle', 'autoloan' );
nbc.learn( 'I need loan for car', 'autoloan' );
nbc.learn( 'I need loan for a new vehicle', 'autoloan' );
nbc.learn( 'I need loan for a new mobike', 'autoloan' );
nbc.learn( 'I need money for a new car', 'autoloan' );
// Consolidate all the training!!
nbc.consolidate();
// Start predicting...
console.log( nbc.predict( 'I would like to borrow 50000 to buy a new Audi R8 in New York' ) );
// -> autoloan
console.log( nbc.predict( 'I want to pay my car loan early' ) );
// -> prepay

Try experimenting with this example on Runkit in the browser.

Documentation

Check out the Naive Bayes Text Classifier API documentation to learn more.

Need Help?

If you spot a bug and the same has not yet been reported, raise a new issue or consider fixing it and sending a pull request.

About wink

Wink is a family of open source packages for Natural Language Processing, Statistical Analysis and Machine Learning in NodeJS. The code is thoroughly documented for easy human comprehension and has a test coverage of ~100% for reliability to build production grade solutions.

Copyright & License

wink-naive-bayes-text-classifier is copyright 2017-24 GRAYPE Systems Private Limited.

It is licensed under the terms of the MIT License.