uhop / node-re2

node.js bindings for RE2: fast, safe alternative to backtracking regular expression engines.

Add `RE2.test(input, offset, length)` #216

Open ronag opened 1 month ago

ronag commented 1 month ago

This could avoid the need to create subarrays when matching.

i.e.

`re2.test(buffer.subarray(1, 3))` vs `re2.test(buffer, 1, 2)`
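For reference, both calls in the example address the same bytes: `subarray(start, end)` takes an exclusive end index, while the proposed signature takes an offset and a length. The sketch below only illustrates that argument mapping. `testRange` is a hypothetical helper and the built-in `RegExp` stands in for RE2, since a three-argument `RE2.test` is what this issue is asking for, not something that exists.

```javascript
// Hypothetical helper: mimics the proposed (input, offset, length) call shape.
// The built-in RegExp stands in for RE2; this is NOT node-re2's API.
function testRange(re, buffer, offset, length) {
  // Buffer#toString accepts start/end, so no intermediate Buffer view is created.
  return re.test(buffer.toString('latin1', offset, offset + length));
}

const buffer = Buffer.from('xabz', 'latin1');
const re = /^ab$/;

// Existing style: create a view over bytes [1, 3), then match.
const viaSubarray = re.test(buffer.subarray(1, 3).toString('latin1'));

// Proposed style: offset 1, length 2 covers the same two bytes.
const viaRange = testRange(re, buffer, 1, 2);

console.log(viaSubarray, viaRange);
```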

uhop commented 1 month ago

I am not sure we have a use case here: buf.subarray([start[, end]]) "returns a new Buffer that references the same memory as the original, but offset and cropped by the start and end indexes". It is my understanding that subarray() creates a view, no byte copying is involved. I assume it is a very fast operation.

Thoughts?
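The view semantics quoted from the Node.js docs can be checked with a few lines. This is a minimal sketch that only confirms no bytes are copied; it says nothing about how expensive it is to allocate the view object itself.

```javascript
const buf = Buffer.from('hello world');
const view = buf.subarray(0, 5);

// Writing through the view changes the original: both reference the same memory.
view[0] = 0x48; // ASCII 'H'
const afterWrite = buf.toString();

// Both objects are backed by the same underlying ArrayBuffer.
const sameBacking = view.buffer === buf.buffer;

console.log(afterWrite, sameBacking);
```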

ronag commented 1 month ago

`subarray` is quite slow. I've spent a lot of time removing subarray creation in Node core because of this.
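One rough way to compare the two styles locally is a micro-benchmark along these lines. It is only a sketch: `sumView`, `sumRange`, and `time` are hypothetical helpers, the iteration count is arbitrary, and timings depend heavily on the Node.js version, JIT warm-up, and hardware, so no expected numbers are given.

```javascript
// Rough micro-benchmark sketch; results vary by Node.js version and machine.
const buf = Buffer.alloc(1024, 0x61);

// Hypothetical consumer that takes a view (requires a subarray per call).
function sumView(view) {
  let s = 0;
  for (let i = 0; i < view.length; i++) s += view[i];
  return s;
}

// Hypothetical consumer that takes (buffer, offset, length) instead.
function sumRange(buffer, offset, length) {
  let s = 0;
  for (let i = offset; i < offset + length; i++) s += buffer[i];
  return s;
}

// Runs fn repeatedly and returns elapsed milliseconds plus an accumulator,
// which keeps the work from being optimized away entirely.
function time(fn, iterations) {
  const start = process.hrtime.bigint();
  let acc = 0;
  for (let i = 0; i < iterations; i++) acc += fn(i % 512);
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { elapsedMs, acc };
}

const N = 200000;
const withSubarray = time((o) => sumView(buf.subarray(o, o + 2)), N);
const withOffsets = time((o) => sumRange(buf, o, 2), N);

console.log('subarray ms:', withSubarray.elapsedMs);
console.log('offsets  ms:', withOffsets.elapsedMs);
```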

uhop commented 1 month ago

That is strange. Conceptually it should be a pointer and two numbers, like in your proposal. Would it help if I created such an object in the library, so it can be used consistently in all methods in place of a string or buffer, or do you expect that to be too expensive performance-wise?

ronag commented 1 month ago

It is super expensive to create typed arrays, since doing so performs a lot of checks. Creating a different abstraction is also an option. Internally we use something like this for creating "slices" of buffers:

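class Slice {
  buffer: Buffer;      // backing buffer, shared rather than copied
  byteOffset: number;  // start of the slice within buffer
  byteLength: number;  // number of bytes in the slice
}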

Possibly relevant: https://issues.chromium.org/issues/42210394
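To make the idea concrete, here is a minimal sketch of how a plain-object slice like the one above might be consumed without creating a typed array. The constructor and the `sliceStartsWith` helper are hypothetical, written only for illustration; they are not part of node-re2 or Node core.

```javascript
// Hypothetical plain-object slice mirroring the shape quoted above.
// Constructing it only stores three fields; no typed-array view is allocated.
class Slice {
  constructor(buffer, byteOffset, byteLength) {
    this.buffer = buffer;
    this.byteOffset = byteOffset;
    this.byteLength = byteLength;
  }
}

// Hypothetical consumer: checks whether the slice begins with the given bytes,
// indexing straight into the backing buffer.
function sliceStartsWith(slice, prefix) {
  if (prefix.length > slice.byteLength) return false;
  for (let i = 0; i < prefix.length; i++) {
    if (slice.buffer[slice.byteOffset + i] !== prefix[i]) return false;
  }
  return true;
}

const data = Buffer.from('GET /index.html');
const path = new Slice(data, 4, 11); // covers "/index.html"

const matches = sliceStartsWith(path, Buffer.from('/index'));
console.log(matches);
```

A library method accepting such an object would read `buffer`, `byteOffset`, and `byteLength` directly, which is the same information as the `(input, offset, length)` arguments proposed in the issue title.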

uhop commented 1 month ago

Your Slice was exactly what I was thinking about.