mischasan / aho-corasick

A-C implementation in "C". Tight-packed (interleaved) state-transition matrix -- as fast as it gets, as small as it gets.
GNU Lesser General Public License v3.0

aho-corasick

Aho-Corasick parallel string search, using interleaved arrays.

Mischa Sandberg mischasan@gmail.com

ACISM is an implementation of Aho-Corasick parallel string search, using an Interleaved State-transition Matrix. It combines the fastest possible Aho-Corasick implementation with the smallest possible data structure (!).

FEATURES

DOCUMENTATION

The Google Docs description is at http://goo.gl/lE6zG. I originally called it "psearch", but found that name was overused by other authors.

LICENSE

LGPL v3

GETTING STARTED

Download the source and type "gmake". "gmake install" exports lib/libacism.a, include/acism.h and bin/acism_x. The source file acism_x.c is a good example of calling acism_create and acism_scan/acism_more.

(If you're interested in the GNUmakefile and rules.mk, check my blog posts on non-recursive make, at mischasan.wordpress.com.)

HISTORY

The interleaved-array approach was tried and discarded in the late 1970s, because the compile time was O(n^2). acism_create beats the problem with a "hint" array that tracks the restart points for searches. That, plus discarding the original idea of how to get maximal density, resulted in the tiny-fast win-win.

ACKNOWLEDGEMENTS

I'd like to thank Mike Shannon, who wanted to see a machine built to make best use of L1/L2 cache. The change to do that doubled performance on hardware with a much larger cache than the matrix. Go figure.