olddaos / node-ewah

Fast and concise bitvectors ( either sparse or dense ), based on Lemur Project."
5 stars 0 forks source link

node-ewah

Simple, fast bitmap vectors ( sparse and dense ) and indices implementation, based on Lemur project: (http://code.google.com/p/lemurbitmapindex/)

Features:

Getting Started

Install it in node.js :

npm install node-ewah

Then, start to play with:

var BitVector = require('node-ewah');

Then create some vectors ...


var vGoods    = new BitVector();
var vSpecies  = new BitVector();

// This is the bitvector of docids, that contain the word "goods". Bit number #xxx is on, if and only if document number #xxx contains word "goods".
vGoods.push(10);
vGoods.push(1234);
vGoods.push(101234);

// This is the bitvector of docids, that contain the word "species". Bit is the for the same reason.
vSpecies.push(1234);
vSpecies.push(1239);
vSpecies.push(10000);

// Now lets know, what documents contains first or second term
// This operation is performed using native machine words, that contain bitmap fragments, and so is extremely fast

var vMergedVector = vGoods.rawor( vSpecies );

// Check, that things are retrieved fine...
var checkVector = vGoods.map( function(x) { console.log( "From bitvector vGoods: " + x ); } );

// Check merging result...
var checkMerged = vMergedVector.map( function(x) { console.log( "From bitvector vMerged: " + x ); } );

API

push( bit_id )

Appends bit_id to the bitmap. Sorry, no random access today, but development still going hard...

map( ... )

This thing is the same as in plain vanilla Array

join( delimiter )

Returns catenated values of nonzero bits, interspersed with specified delimiter. Catenation is performed using C++ methods, i.e. new strings are not spawned

toString( delimiter )

Same as above, but no delimiter is allowed :-) Pretty standard behavior...

rawor( vectorRight )

Applies logical OR operation on This and vectorRight vectors. Result is new vector or undef if things were wrong. Does not account for sparsity ( see. sparseor for that ), thus it could be slower, when applied to sparse vectors

rawand( vectorRight )

Applies logical AND operation on This and vectorRight vectors. Result is new vector or undef if things were wrong

population( )

Returns the number of nonzero bits ( population count ). Such a long name is to avoid confusion with pop ( which is Array-specific method )

length( )

Returns compressed size ( IN BITS! ).

Testing

In node.js

> node ewah.js

Benchmarking

Requires node.js

Release notes

v.1.0 was released