uhop / node-re2

node.js bindings for RE2: fast, safe alternative to backtracking regular expression engines.

exec() on a Buffer returns incorrect index #201

Closed (matthewvalentine closed this 2 months ago)

matthewvalentine commented 7 months ago

Calling exec() multiple times with a Buffer input produces increasingly incorrect values for .index. (Tested on 1.20.8 / commit 4e985f9d13b2820b181df75a912b581a61eab006)

This code:

const RE2 = require('./re2');
const r = new RE2('.', 'g');
const b = Buffer.from('test1test2');
let m;
while ((m = r.exec(b))) {
    // match, reported index, lastIndex, and the character actually at that index
    console.log(m[0].toString(), m.index, r.lastIndex, String.fromCharCode(b[m.index]));
}

produces this output:

t 0 1 t
e 2 2 s
s 4 3 1
t 6 4 e
1 8 5 t
t 10 6
e 12 7
s 14 8
t 16 9
2 18 10

For comparison, the equivalent loop with a string input produces:

t 0 1 t
e 1 2 e
s 2 3 s
t 3 4 t
1 4 5 1
t 5 6 t
e 6 7 e
s 7 8 s
t 8 9 t
2 9 10 2

In the Buffer case, .index doesn't point at the match that was actually returned, and it doesn't line up with the resulting .lastIndex either. It looks like it is exactly double the expected offset (0, 2, 4, ...), so from the sixth match on it even falls outside the 10-byte buffer.
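The two expectations above (the input at .index should be the returned match, and .index plus the match length should equal .lastIndex) can be written as a small check. This is only a sketch: findIndexMismatches is a hypothetical helper, not part of node-re2, and it assumes an exec()-compatible object with a global flag. The baseline run uses the built-in RegExp with a string so that it works without the bindings.

```javascript
// Hypothetical helper (not part of node-re2): collects every match whose
// reported offsets disagree with the input it was found in.
function findIndexMismatches(re, input) {
  const mismatches = [];
  re.lastIndex = 0;
  let m;
  while ((m = re.exec(input))) {
    const matched = m[0];
    const end = m.index + matched.length;
    // Slice the input at the reported offsets
    // (bytes for a Buffer, UTF-16 code units for a string).
    const sliceMatches = Buffer.isBuffer(input)
      ? Buffer.isBuffer(matched) && input.subarray(m.index, end).equals(matched)
      : input.slice(m.index, end) === matched;
    if (!sliceMatches || end !== re.lastIndex) {
      mismatches.push({
        match: matched.toString(),
        index: m.index,
        lastIndex: re.lastIndex,
      });
    }
    // Avoid an endless loop on zero-length matches.
    if (matched.length === 0) re.lastIndex++;
  }
  return mismatches;
}

// Baseline with the built-in RegExp and a string input.
const baseline = findIndexMismatches(/./g, 'test1test2');
console.log(baseline.length); // 0
```

Passing new RE2('.', 'g') and Buffer.from('test1test2') instead should, going by the output reported above, flag every match after the first on 1.20.8.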