svaarala / duktape

Duktape - embeddable Javascript engine with a focus on portability and compact footprint
MIT License

Duktape fails to parse a regexp accepted by Chrome #425

Open kinnison opened 8 years ago

kinnison commented 8 years ago

Embedded in some JS which deals with CSS, we have encountered the following regular expression:

/:((?:[\w\u00c0-\uFFFF_-]|\\.)+)(?:\((['"]*)((?:\([^\)]+\)|[^\2\(\)]*)+)\2\))?/

Chrome accepts this happily, but Duktape complains of an invalid decimal escape.

This prevents some sites from loading their javascript frameworks.

Any help would be gratefully received.

svaarala commented 8 years ago

This is one of those real-world regexp leniency issues: the culprit is probably [^\2], which is not allowed in standard Ecmascript:

Longer term, I don't think it's feasible for Duktape's regexp engine to be both low-footprint and support non-standard regexp idioms, so the two alternatives are:

I'm planning to start with the second approach, as soon as I get some time to work on it :-)
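A minimal sketch of a standard-conforming rewrite of the reported pattern, assuming the goal is to keep what browsers actually do with it. Under the ES2015 Annex B grammar that Chrome follows, \2 inside a character class is not a backreference but the legacy octal escape for U+0002, so [^\2\(\)] means "anything except U+0002 and parentheses". (That is probably not what the pattern's author intended, i.e. "anything except the quote captured by group 2", but no JavaScript engine supports backreferences inside classes.) Spelling the character as \x02 keeps the browser meaning without a decimal escape inside a class. The comparison below runs in a browser-style engine such as Node; that Duktape accepts the rest of the rewritten pattern is an assumption, not something established in this thread.

```javascript
// Pattern as reported: "\2" inside [^...] relies on Annex B leniency,
// where it is read as the octal escape U+0002 (not a backreference).
const lenient = /:((?:[\w\u00c0-\uFFFF_-]|\\.)+)(?:\((['"]*)((?:\([^\)]+\)|[^\2\(\)]*)+)\2\))?/;

// Rewrite: spell U+0002 as \x02 so no decimal escape appears inside a
// character class. The \2 after the group is a real backreference and stays.
const portable = /:((?:[\w\u00c0-\uFFFF_-]|\\.)+)(?:\((['"]*)((?:\([^\)]+\)|[^\x02\(\)]*)+)\2\))?/;

// In a lenient engine both patterns agree on typical pseudo-selector input.
for (const input of [":hover", ":not(.foo)", ":contains('bar')"]) {
  console.log(input, JSON.stringify(lenient.exec(input)), JSON.stringify(portable.exec(input)));
}

// Inside a class, \2 really is U+0002 in a lenient engine.
console.log(/[^\2]/.test("\x02")); // false
```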

fatcerberus commented 8 years ago

With the number of real-world regexps I've seen Duktape reject, it actually amazes me that something like NetSurf is feasible. It makes me think the leniency issue isn't as big a deal as it seems at first.

danielsilverstone-ct commented 8 years ago

@fatcerberus As it stands, our efforts are not yet producing something compatible with what's out there, but we're working hard to get there. As @svaarala builds more capability into Duktape, our efforts go further :-)

jfahrenkrug commented 8 years ago

I was about to open a new issue, but I think it's better to just comment on this one.

I have a project built with webpack that uses the XRegExp npm package. Duktape had problems with these two regular expressions:

/\[(\^?)]/

and

/(\()(?!\?)|\\([1-9]\d*)|\\[\s\S]|\[(?:[^\\\]]|\\[\s\S])*]/g

In both cases the issue is that the regex wants to match a literal ] but doesn't escape it, so technically the regex is invalid under the standard ES5 grammar; browsers accept it through the legacy (Annex B) extensions.

When I change the first one to

/\[(\^?)\]/

and the second one to

/(\()(?!\?)|\\([1-9]\d*)|\\[\s\S]|\[(?:[^\\\]]|\\[\s\S])*\]/g

Duktape is happy :)
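For reference, a browser-style engine (V8/Node, following ES2015 Annex B) treats the unescaped and escaped spellings identically, so the change is behaviour-preserving there. A small check; the sample strings are arbitrary:

```javascript
// Unescaped "]" outside a character class (as in the XRegExp source):
// accepted by Annex B engines, a syntax error under the strict ES5 grammar.
const lenientA = /\[(\^?)]/;
const lenientB = /(\()(?!\?)|\\([1-9]\d*)|\\[\s\S]|\[(?:[^\\\]]|\\[\s\S])*]/g;

// Escaped versions: valid under both grammars.
const portableA = /\[(\^?)\]/;
const portableB = /(\()(?!\?)|\\([1-9]\d*)|\\[\s\S]|\[(?:[^\\\]]|\\[\s\S])*\]/g;

// The pair should produce the same matches in a lenient engine.
const sample = "(a)\\2[^x][b\\]c](?:d)";
console.log(sample.match(lenientB));
console.log(sample.match(portableB));
console.log(lenientA.exec("[^]"), portableA.exec("[^]"));
```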

svaarala commented 8 years ago

@jfahrenkrug Support for literal curlies was added a while back - I could look into adding support for literal brackets too. They don't appear in unescaped form as commonly as the curly braces but still pop up from time to time :)
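For comparison, this is how a browser-style engine (V8/Node, following ES2015 Annex B) reads unescaped braces: a { that does not start a valid {n}, {n,} or {n,m} quantifier is an ordinary character, and a lone } is literal too. Duktape's exact rules for literal curlies are not spelled out in this thread, so take this as the browser behaviour being emulated, not as a description of Duktape.

```javascript
// Valid quantifier: "x" exactly twice.
console.log(/^x{2}$/.test("xx"));       // true
// Not a valid quantifier, so the braces are literal characters.
console.log(/^x{$/.test("x{"));         // true
console.log(/^x{,2}$/.test("x{,2}"));   // true
console.log(/^a}$/.test("a}"));         // true
```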

svaarala commented 8 years ago

@jfahrenkrug #871; the [ matching is more complicated because it needs actual lookahead, but accepting a literal closing bracket ] should be trivial.
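To illustrate why the closing bracket is the easy case: deciding whether a ] is literal only needs one bit of state (are we inside a character class?), with no lookahead. Below is a hypothetical preprocessing helper, escapeStrayBrackets, which is not part of Duktape or XRegExp, that rewrites such patterns into their escaped form. It assumes non-unicode patterns and JavaScript class syntax, where the first unescaped ] always closes a class (so [] and [^] are complete classes).

```javascript
// Hypothetical helper (not part of Duktape or XRegExp): escape every "]"
// that appears outside a character class. Only one bit of state is needed.
function escapeStrayBrackets(source) {
  let out = "";
  let inClass = false;
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === "\\") {
      // Copy an escape (backslash plus the next character) unchanged.
      out += ch + (i + 1 < source.length ? source[i + 1] : "");
      i++;
    } else if (inClass) {
      // In JS syntax the first unescaped "]" always closes the class.
      if (ch === "]") inClass = false;
      out += ch;
    } else if (ch === "[") {
      inClass = true;
      out += ch;
    } else if (ch === "]") {
      out += "\\]"; // stray closing bracket: make it an explicit literal
    } else {
      out += ch;
    }
  }
  return out;
}

// Usage: rewrite the first XRegExp pattern into its escaped form.
console.log(escapeStrayBrackets(/\[(\^?)]/.source)); // \[(\^?)\]
```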

jfahrenkrug commented 8 years ago

@svaarala That's great news, thank you! Meanwhile I've opened a PR for XRegExp: https://github.com/slevithan/xregexp/pull/141

livrish commented 6 years ago

Hi,

When we use the selfDefending option of javascript-obfuscator, the regexp engine chokes as well. Input code, obfuscator options and obfuscated output:

function outputMyNumber(showThis) {
  console.log(showThis);
}

function hello() {
  var getNumber = function (b) {
    return b + 3;
  };
  var thenumber = getNumber(2);
  outputMyNumber(thenumber);
}

hello();

obfuscator: {
  options: {
    compact: true,
    controlFlowFlattening: false,
    deadCodeInjection: false,
    debugProtection: false,
    debugProtectionInterval: false,
    disableConsoleOutput: false,
    identifierNamesGenerator: 'hexadecimal',
    log: true,
    renameGlobals: false,
    rotateStringArray: true,
    selfDefending: true,
    stringArray: true,
    stringArrayEncoding: 'base64',
    stringArrayThreshold: 0.75,
    unicodeEscapeSequence: false,
    target: 'browser'
  },

function outputMyNumber(_0x393fc5){console'log';}function hello(){var _0x246e30=function(){var _0xb2ee37=!![];return function(_0x2a1c07,_0x39a818){var _0x47081b=_0xb2ee37?function(){if(_0x39a818){var _0x461e24=_0x39a818'apply';_0x39a818=null;return _0x461e24;}}:function(){};_0xb2ee37=![];return _0x47081b;};}();var _0x78fdcf=function(_0x408fe3){var _0x1a1f63=_0x246e30(this,function(){var _0x49075f=function(){return'\x64\x65\x76';},_0x57969d=function(){return'\x77\x69\x6e\x64\x6f\x77';};var _0x23f862=function(){var _0x45fb57=new RegExp('\w+\s(){\w+\s['|"].+['|"];?\s}');return!_0x45fb57'\x74\x65\x73\x74';};var _0x532ab5=function(){var _0x130be7=new RegExp('\x28\x5c\x5c\x5b\x78\x7c\x75\x5d\x28\x5c\x77\x29\x7b\x32\x2c\x34\x7d\x29\x2b');return _0x130be7'\x74\x65\x73\x74';};var _0x4e7fde=function(_0x47bb06){var _0x35aeea=-0x1>>0x1+0xff%0x0;if(_0x47bb06'\x69\x6e\x64\x65\x78\x4f\x66'){_0x54673c(_0x47bb06);}};var _0x54673c=function(_0x192be0){var _0x2eb20c=-0x4>>0x1+0xff%0x0;if(_0x192be0'\x69\x6e\x64\x65\x78\x4f\x66'!==_0x2eb20c){_0x4e7fde(_0x192be0);}};if(!_0x23f862()){if(!_0x532ab5()){_0x4e7fde('\x69\x6e\x64\u0435\x78\x4f\x66');}else{_0x4e7fde('\x69\x6e\x64\x65\x78\x4f\x66');}}else{_0x4e7fde('\x69\x6e\x64\u0435\x78\x4f\x66');}});_0x1a1f63();return _0x408fe3+0x3;};var _0x333e8b=_0x78fdcf(0x2);outputMyNumber(_0x333e8b);}hello();
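When narrowing down which pattern in obfuscated output trips the parser, it helps to decode hex-escaped constructor arguments. The second new RegExp(...) call in the output above is built from a string of \xNN escapes; the JavaScript string literal itself decodes them, showing the ordinary pattern (\\[x|u](\w){2,4})+, i.e. a backslash followed by x, | or u, then two to four word characters, repeated. Which of the two RegExp constructions Duktape actually rejects is not established in this thread; the sketch only shows the decoding.

```javascript
// Argument of the second RegExp constructor in the obfuscated output.
// The \xNN escapes are decoded when the string literal is parsed.
const decoded = '\x28\x5c\x5c\x5b\x78\x7c\x75\x5d\x28\x5c\x77\x29\x7b\x32\x2c\x34\x7d\x29\x2b';
console.log(decoded); // (\\[x|u](\w){2,4})+

// It compiles to an ordinary pattern: a "\x" or "\u" sequence followed
// by 2-4 word characters, repeated one or more times.
const re = new RegExp(decoded);
console.log(re.test(String.raw`\x41`)); // true
console.log(re.test("plain text"));     // false
```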