paulbartrum / jurassic

A .NET library to parse and execute JavaScript code.
MIT License

Regex bug with line endings handling #162

Closed: RusKnyaz closed this issue 5 years ago

RusKnyaz commented 5 years ago

Execute the code:

var rheaders = /^(.*?):[ \t]*([^\r\n]*)$/mg;
var headersString = 'X-AspNetMvc-Version: 4.0\r\nX-Powered-By: ASP.NET\r\n\r\n';
var arr = [];
while ( match = rheaders.exec( headersString ) ) {
    arr.push(match[1].toLowerCase());
    arr.push(match[ 2 ]);
}

Expected: arr is the array ["x-aspnetmvc-version", "4.0", "x-powered-by", "ASP.NET"]
Observed: arr is empty.

paulbartrum commented 5 years ago

Looks like the issue is that in .NET the $ character (in multiline mode) only matches before \n, so a preceding \r ends up inside the match:

new Regex("^.*$", RegexOptions.Multiline).Matches("one\r\ntwo\r\n")[0].Value
// returns "one\r"

Whereas in JavaScript it matches before either \r or \n:

'one\r\ntwo\r\n'.match(/^.*$/m)[0]
// returns "one"
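The two behaviours can be compared side by side in a standard JavaScript engine. This is only a sketch: the second pattern is a hand-written approximation of the .NET result shown above (it assumes that "." excludes only \n and that the end anchor fires only before \n or at the end of input), not anything taken from Jurassic.

```javascript
const text = 'one\r\ntwo\r\n';

// ECMAScript semantics: "." stops at \r and "$" matches before it.
const ecma = text.match(/^.*$/m)[0];

// Approximation of the .NET result (an assumption for illustration):
// take everything up to \n, so the \r stays inside the match.
const dotnetLike = text.match(/^[^\n]*(?![^\n])/)[0];

console.log(JSON.stringify(ecma));       // "one"
console.log(JSON.stringify(dotnetLike)); // "one\r"
```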

paulbartrum commented 5 years ago

The workaround is to change your regular expression:

var rheaders = /^(.*?):[ \t]*([^\r\n]*)\r?$/mg;
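As a quick check, the workaround pattern can be run against the original input in a standard JavaScript engine such as Node. This sketch only shows that the extra \r? is harmless under ECMAScript semantics; it does not exercise Jurassic itself.

```javascript
// Workaround pattern from the comment above: an optional \r before $.
const rheaders = /^(.*?):[ \t]*([^\r\n]*)\r?$/mg;
const headersString = 'X-AspNetMvc-Version: 4.0\r\nX-Powered-By: ASP.NET\r\n\r\n';

const arr = [];
let match;
while ((match = rheaders.exec(headersString)) !== null) {
    arr.push(match[1].toLowerCase()); // header name
    arr.push(match[2]);               // header value, without the trailing \r
}

console.log(JSON.stringify(arr));
// ["x-aspnetmvc-version","4.0","x-powered-by","ASP.NET"]
```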

RusKnyaz commented 5 years ago

Unfortunately, I cannot use a workaround because this code comes from jQuery, and there is probably a lot of code on web pages that uses similar regular expressions. Please look at the similar issue in Jint.

Taritsyn commented 5 years ago

@RusKnyaz The Jurassic, Jint and NiL.JS engines all implement regular expressions with the System.Text.RegularExpressions.Regex class, which is not fully compatible with ECMAScript (see the “Regular expression parsing error” issue).

RusKnyaz commented 5 years ago

@Taritsyn I know. It is possible to fix, though. Please look at the link I provided in my previous comment.

paulbartrum commented 5 years ago

I've checked in a fix, let me know if it works for you :-)

paulbartrum commented 5 years ago

@RusKnyaz You'll notice that my fix is more complicated than the Jint one. It seems the fix in Jint is not correct. 'one\r\ntwo'.match(/^.*$/mg).toString() should return "one,,two" but in Jint it returns "one\r,two".

kpreisser commented 5 years ago

> @RusKnyaz You'll notice that my fix is more complicated than the Jint one. It seems the fix in Jint is not correct. 'one\r\ntwo'.match(/^.*$/mg).toString() should return "one,,two" but in Jint it returns "one\r,two".

It seems the fix in Jurassic is also not 100% correct ;-) E.g. ('one\\\r'.match(/^.*\\$/mg) || []).toString() should return "one\\" but returns "".
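
The escaping case matters for any approach that rewrites anchors in the pattern text before handing it to another regex engine: \\$ is an escaped backslash followed by a real anchor, while \$ is a literal dollar sign. Below is a minimal sketch of such a rewrite, written in JavaScript purely for illustration. The helper expandMultilineAnchors is hypothetical and is not Jurassic's actual fix; it assumes an engine with lookbehind support and handles only escapes and character classes.

```javascript
// Hypothetical helper (not Jurassic's code): rewrites unescaped ^ and $
// into lookarounds that name every ECMAScript line terminator, so the
// result behaves like multiline mode even without the m flag.
function expandMultilineAnchors(source) {
    const notTerminator = '[^\\n\\r\\u2028\\u2029]';
    let out = '';
    let inClass = false;
    for (let i = 0; i < source.length; i++) {
        const ch = source[i];
        if (ch === '\\') {
            // Copy the escape pair verbatim, so "\\$" leaves a real anchor
            // behind it and "\$" stays a literal dollar sign.
            out += ch + (source[i + 1] || '');
            i++;
        } else if (inClass) {
            // Inside [...] both ^ and $ are ordinary characters.
            if (ch === ']') inClass = false;
            out += ch;
        } else if (ch === '[') {
            inClass = true;
            out += ch;
        } else if (ch === '^') {
            out += '(?<!' + notTerminator + ')'; // start of input or after a terminator
        } else if (ch === '$') {
            out += '(?!' + notTerminator + ')';  // end of input or before a terminator
        } else {
            out += ch;
        }
    }
    return out;
}

// Both cases from this thread, matched without the m flag:
const plain = new RegExp(expandMultilineAnchors('^.*$'), 'g');
console.log('one\r\ntwo'.match(plain).toString()); // one,,two

const escaped = new RegExp(expandMultilineAnchors('^.*\\\\$'), 'g');
console.log(JSON.stringify(('one\\\r'.match(escaped) || []).toString())); // "one\\"
```

Walking the pattern one token at a time, rather than using a search-and-replace on "$", is what keeps the escaped-backslash case from being misread.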

paulbartrum commented 5 years ago

Ha, you're right, darn it.

paulbartrum commented 5 years ago

I checked in a fix for the escaping issue.