uhop / node-re2

Node.js bindings for RE2: a fast, safe alternative to backtracking regular expression engines.

Support incremental evaluation #124

Closed (conartist6 closed this 2 years ago)

conartist6 commented 2 years ago

One of the powerful things about a streaming regex engine is that it can operate over streamed input. I know re2 itself provides APIs for incremental evaluation, but they are not exposed by node-re2 (or if they are, they are not documented; I haven't tried using them yet).

I don't really know what would be involved, as I've never written node bindings.

uhop commented 2 years ago

The goal of this project is to emulate JavaScript's RegExp as closely as possible, with compatible enhancements, not to recreate a C++ project in JS. Having said that, what API do you envision for this feature?

conartist6 commented 2 years ago

I've already built this in userland actually. My API is:

```javascript
import { test, exec, execGlobal } from '@iter-tools/regex';

const bool = test(regex, str);
const match = exec(regex, str);
const [...matches] = execGlobal(regex, str);
```

I provide two extra variations of that API, each supporting those three methods. They are: @iter-tools/regex/async (for consuming async iterators of characters), and @iter-tools/regex/chunked (for async iterators of sync iterators of characters, e.g. file streams). Their methods return a promise or async iterable with results.
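To make "chunked" evaluation concrete, here is a minimal sketch of a matcher that carries its state from one chunk to the next, so a match can span chunk boundaries without buffering the whole input. It is a hand-written state machine for the single pattern `/ab+c/` and is purely illustrative: `IncrementalMatcher`, `push`, and `step` are hypothetical names, not part of `@iter-tools/regex`, node-re2, or RE2.

```javascript
// Hypothetical illustration only: not the @iter-tools/regex implementation
// and not an RE2 or node-re2 API.

// Transition function of a hand-written DFA for an unanchored search of /ab+c/.
// States: 0 = nothing useful seen, 1 = saw "a", 2 = saw "ab+", 3 = matched.
function step(state, ch) {
  switch (state) {
    case 0: return ch === 'a' ? 1 : 0;
    case 1: return ch === 'b' ? 2 : ch === 'a' ? 1 : 0;
    case 2: return ch === 'b' ? 2 : ch === 'c' ? 3 : ch === 'a' ? 1 : 0;
    default: return state; // state 3 is final
  }
}

class IncrementalMatcher {
  constructor() {
    this.state = 0;       // current DFA state, preserved between chunks
    this.matched = false; // becomes true once a match has been seen
  }

  // Feed one chunk (any sync iterable of characters, e.g. a string).
  // Returns true as soon as the pattern has matched somewhere in the input so far.
  push(chunk) {
    for (const ch of chunk) {
      if (this.matched) break;
      this.state = step(this.state, ch);
      if (this.state === 3) this.matched = true;
    }
    return this.matched;
  }
}

// The match "abbc" is split across three chunks.
const matcher = new IncrementalMatcher();
const afterFirst = matcher.push('xxa');  // false: only "a" seen so far
const afterSecond = matcher.push('bb');  // false: "abb" seen, still waiting for "c"
const afterThird = matcher.push('cyy');  // true: "abbc" completed in this chunk
```

An async variant would presumably wrap the same `push` loop in `for await (const chunk of stream)`; that part is an assumption about how such a wrapper could look, not a description of the library.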

conartist6 commented 2 years ago

My design is definitely in conflict with JS, though. I really wanted a functional API, and JS's implementation does weird things like storing bits of the evaluation state on the RegExp instance. Perhaps something like this would make the most sense as a second bindings package. Both styles have their uses.
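For readers unfamiliar with the state in question: with the `g` (or `y`) flag, the built-in `RegExp` records where the last match ended in `lastIndex` on the instance, so repeated calls with the same input can return different results. The sketch below shows this with the standard `RegExp` only; `testStateless` is a hypothetical helper added to contrast it with a functional style and is not taken from any of the libraries discussed here.

```javascript
// Standard JavaScript behaviour: a global regex keeps state in lastIndex.
const re = /a/g;
const results = [];
for (let i = 0; i < 3; i++) {
  // Each call resumes from re.lastIndex, which was left by the previous call.
  results.push({ matched: re.test('aa'), lastIndex: re.lastIndex });
}
// results:
//   { matched: true,  lastIndex: 1 }
//   { matched: true,  lastIndex: 2 }
//   { matched: false, lastIndex: 0 }  (no match left, so lastIndex is reset)

// Hypothetical helper: a functional-style test that never touches shared state,
// because it evaluates a fresh copy of the pattern on every call.
function testStateless(pattern, str) {
  const fresh = new RegExp(pattern.source, pattern.flags);
  return fresh.test(str);
}

const stable = [testStateless(re, 'aa'), testStateless(re, 'aa'), testStateless(re, 'aa')];
// stable: [true, true, true]
```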

uhop commented 2 years ago

TBH, it looks like a separate project: not everyone needs such an exotic feature.