logdna / hyperscan

This node module provides C bindings for Intel's Hyperscan library
BSD 3-Clause "New" or "Revised" License

Weird behavior on RegEx matching #2

Closed dpnishant closed 5 years ago

dpnishant commented 5 years ago

While I was just starting to play with Hyperscan for Node.js, I noticed some strange behavior.

When I run the following code (slightly changed from the original index.js):

const addon = require('./build/Release/hyperscan');
module.exports.HyperscanDatabase = addon.HyperscanDatabase;
let db = new addon.HyperscanDatabase(['stuf.?'], [{
    HS_FLAG_SOM_LEFTMOST: true,
    singleMatch: true
}]);
console.log(db.scan('some stuff over here test anything goes here and stuf'));

I got the following output:

[ { patternId: 0, offsetStart: 0, offsetEnd: 9 },
  { patternId: 0, offsetStart: 0, offsetEnd: 10 },
  { patternId: 0, offsetStart: 0, offsetEnd: 53 } ]

However, when I tested the same regex pattern against the same test string on RegEx101, I got a different result, as shown in the screenshot below:

RegEx101 Screenshot

Can anyone explain what's going on?

jakedipity commented 5 years ago

This has to do with the way Hyperscan matches: it reports a match at every end offset where the pattern can match, not just the longest match. That is why both "stuf" (ending at offset 9) and "stuff" (ending at offset 10) show up in your output. Hyperscan works a little differently from what you'd expect from most common flavors of regular expression engines, and if you want to learn more about it you should check out their repo: https://github.com/intel/hyperscan
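To make the difference concrete, here is a minimal sketch in plain JavaScript that emulates the "report every end offset" behavior and compares it with a conventional global search. It does not call Hyperscan or this module; the function names `allEndOffsetMatches` and `conventionalMatches` are made up for illustration, and the brute-force loop is only meant to show the semantics, not how Hyperscan works internally. The sketch reports the leftmost start for each end offset. Hyperscan itself reports a start offset of 0 when start-of-match reporting is not enabled for a pattern, which is consistent with the output in the question; whether this binding actually applied the flags passed in is an assumption that the thread does not confirm.

```javascript
// Illustrative only: emulates "one match per end offset" semantics with
// the built-in RegExp engine. Quadratic in the input length.
function allEndOffsetMatches(pattern, text) {
  // Anchor the pattern so it must cover the whole candidate substring.
  const anchored = new RegExp('^(?:' + pattern + ')$');
  const matches = [];
  for (let end = 1; end <= text.length; end++) {
    // Scan starts left to right so the first hit is the leftmost start.
    for (let start = 0; start < end; start++) {
      if (anchored.test(text.slice(start, end))) {
        matches.push({ offsetStart: start, offsetEnd: end });
        break; // one report per end offset
      }
    }
  }
  return matches;
}

// What most regex engines (and RegEx101) show: non-overlapping greedy matches.
function conventionalMatches(pattern, text) {
  return Array.from(text.matchAll(new RegExp(pattern, 'g')), (m) => ({
    offsetStart: m.index,
    offsetEnd: m.index + m[0].length,
  }));
}

const text = 'some stuff over here test anything goes here and stuf';

console.log(allEndOffsetMatches('stuf.?', text));
// [ { offsetStart: 5, offsetEnd: 9 },
//   { offsetStart: 5, offsetEnd: 10 },
//   { offsetStart: 49, offsetEnd: 53 } ]

console.log(conventionalMatches('stuf.?', text));
// [ { offsetStart: 5, offsetEnd: 10 },
//   { offsetStart: 49, offsetEnd: 53 } ]
```

The end offsets 9, 10 and 53 line up with the output in the question. The conventional search only reports "stuff" and the final "stuf", because the greedy `.?` consumes the second "f" and the engine then moves past that match.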

Also, this project isn't going to be supported by us and even the current code was haphazardly put together. You are more than welcome to fork your own project, but we may end up deleting this repo in the near future.

dpnishant commented 5 years ago

@jakedipity Thanks for the heads up. I was wondering if you could archive/deprecate the project instead of deleting the repo. Many people might still benefit from the great work that has been put into this project. :-)