intel / hyperscan

High-performance regular expression matching library
https://www.hyperscan.io

MULTILINE doesn't match CRLF #274

Open BiBi-Abc opened 4 years ago

BiBi-Abc commented 4 years ago

I was testing hyperscan and chimera to match some text. With this regex:

^hello$

and this text (WITH CRLF):

test
hello
testing

no matches are found. This only occurs when the text uses CRLF line endings.

The pattern is compiled with the HS_FLAG_MULTILINE flag.

Here is fully reproducible code:

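#include <cstring>
#include <iostream>
#include "hs.h"

// Called by Hyperscan for every match; returning 0 continues scanning.
int matchHandler(unsigned int id, unsigned long long from, unsigned long long to, unsigned int flags, void* context)
{
    std::cout << "Matched to " << to << "\n";
    return 0;
}

int main()
{
    hs_database_t* database = nullptr;
    hs_compile_error_t* compileError = nullptr;
    if (hs_compile("^hello$", HS_FLAG_MULTILINE, HS_MODE_BLOCK, nullptr, &database, &compileError) != HS_SUCCESS) {
        std::cerr << "Compile failed: " << compileError->message << "\n";
        hs_free_compile_error(compileError);
        return 1;
    }

    hs_scratch_t* scratch = nullptr;
    if (hs_alloc_scratch(database, &scratch) != HS_SUCCESS) {
        hs_free_database(database);
        return 1;
    }

    const char* data = "test\r\nhello\r\ntesting"; // Works switching \r to \n
    hs_scan(database, data, std::strlen(data), 0, scratch, matchHandler, nullptr);

    hs_free_scratch(scratch);
    hs_free_database(database);
    return 0;
}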

This also happens on Chimera.

I would greatly appreciate a fix for either of the two.

xiangwang1 commented 4 years ago

I doubt this is an issue with Hyperscan.

What's your testing environment? I think \r\n is regarded as a newline only on Windows systems.

BiBi-Abc commented 4 years ago

I'm using Windows.

rationa1 commented 1 year ago

I have this problem too, on CentOS 7. I hope it gets dealt with as soon as possible.
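
For anyone hitting the same behaviour, one possible workaround is to normalize the line endings before scanning. This is only a sketch, and it assumes (as the report above suggests, but the thread does not confirm) that `$` in multiline mode matches only immediately before `\n`, so the `\r` left in front of it blocks the match. `normalizeNewlines` is a hypothetical helper, not part of Hyperscan or Chimera:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Hyperscan API): returns a copy of `input`
// with every "\r\n" pair collapsed to a single "\n".
// A lone '\r' that is not followed by '\n' is kept unchanged.
std::string normalizeNewlines(const std::string& input)
{
    std::string out;
    out.reserve(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (input[i] == '\r' && i + 1 < input.size() && input[i + 1] == '\n') {
            continue; // drop the '\r'; the '\n' is copied on the next iteration
        }
        out.push_back(input[i]);
    }
    return out;
}
```

The normalized string's `c_str()` and `size()` can then be passed to `hs_scan` in place of `data` from the reproduction above. Note that the reported match offsets then refer to the normalized buffer, not the original one. If the original offsets matter, an alternative under the same assumption may be to allow an optional carriage return in the pattern itself, e.g. `^hello\r?$`.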