Open chocolateboy opened 9 years ago
hmm, seems like a reasonable expectation. also seems like comment matching or stripping would be beyond the scope of this though. not sure what others think. perhaps we can add something to the readme that suggests that users strip comments first?
also seems like comment matching or stripping would be beyond the scope of this though ... perhaps we can add something to the readme that suggests that users strip comments first?
:+1:
I think the only thing that we should change is first \s*
to \s+
because the
var match2 = re.exec('functionfoo(){}');
use case
I think the only thing that we should change is first
\s*
to\s+
haha, but it's intentional.. lol because the function() {}
case.
@tunnckoCore Off the top of my head, use: (?:\s+[\w$][\w\d$]*)?\s*
E.g. /^function(?:\s+[\w$][\w\d$]*)?\s*\(([\w\s,$]*)\)\s*\{([\w\W\s\S]*)\}$/
@blakeembrey Hmm.. nope exactly. @chocolateboy http://regexr.com/3albo what you think?
More use cases?
Always.. regexes are not enough.
Always.. regexes not enough.
I can't think of any reason why a regex shouldn't work (and correctly handle comments) for ES5 functions, but it won't work for ES6 functions:
A regex will never be able to handle destructured parameters in the general case, since regular languages don't support balanced delimiters.
I was wrong about the "nit". I'd forgotten that JavaScript doesn't provide a way to make .
match newline. Still, [\s\S]
matches everything and [\w\W]
matches everything, so only one is needed.
Something like this should work (though note the parameter list includes comments):
^ function (?: space+ (ident) )? space* \( space* (params)? space* \) space* \{ (body) \} $
e.g.:
function /* 1 */ test /* 2 */ ( /* 3 */ foo /* 4 */ , /* 5 */ bar /* 6 */ ) /* 7 */ {
return foo;
}
function str (re) {
return re.toString().slice(1, -1);
}
var fn = str(/^ function (?: space+ (ident) )? space* \( space* (params)? space* \) space* \{ (body) \} $ /);
var params = str(/ ident (?: space* , space* ident )* /);
var space = str(/ (?: (?: \s+ ) | (?: \/ \* [\s\S]*? \* \/ ) | (?: \/\/ [^\r\n]* ) ) /);
var ident = str(/ (?: [a-zA-Z_$] [\w$]* ) /);
var body = str(/ (?: [\s\S]*? ) /);
fn = fn.replace(/params/g, params)
.replace(/space/g, space)
.replace(/ident/g, ident)
.replace(/body/g, body)
.replace(/\s+/g, '');
var re = new RegExp(fn);
console.log("test: %j", test.toString());
console.log(re);
console.log(re.exec(test.toString()))
test: "function test( /* 3 */ foo /* 4 */ , /* 5 */ bar /* 6 */ ) /* 7 */ {\n return foo;\n}"
/^function(?:(?:(?:\s+)|(?:\/\*[\s\S]*?\*\/)|(?:\/\/[^\r\n]*))+((?:[a-zA-Z_$][\w$]*)))?(?:(?:\s+)|(?:\/\*[\s\S]*?\*\/)|(?:\/\/[^\r\n]*))*\((?:(?:\s+)|(?:\/\*[\s\S]*?\*\/)|(?:\/\/[^\r\n]*))*((?:[a-zA-Z_$][\w$]*)(?:(?:(?:\s+)|(?:\/\*[\s\S]*?\*\/)|(?:\/\/[^\r\n]*))*,(?:(?:\s+)|(?:\/\*[\s\S]*?\*\/)|(?:\/\/[^\r\n]*))*(?:[a-zA-Z_$][\w$]*))*)?(?:(?:\s+)|(?:\/\*[\s\S]*?\*\/)|(?:\/\/[^\r\n]*))*\)(?:(?:\s+)|(?:\/\*[\s\S]*?\*\/)|(?:\/\/[^\r\n]*))*\{((?:[\s\S]*?))\}$/
[ 'function test( /* 3 */ foo /* 4 */ , /* 5 */ bar /* 6 */ ) /* 7 */ {\n return foo;\n}',
'test',
'foo /* 4 */ , /* 5 */ bar',
'\n return foo;\n',
index: 0,
input: 'function test( /* 3 */ foo /* 4 */ , /* 5 */ bar /* 6 */ ) /* 7 */ {\n return foo;\n}' ]
Hm. Looks good, but what about /* 3 */
and /* 6 */
comments? If we will accept comments and include them in response why only these in the middle?
what about / 3 / and / 6 / comments?
The regex originally included them. Put them back in if you want them :-)
var fn = str(/^ function (?: space+ (ident) )? space* \( (params)? \) space* \{ (body) \} $ /);
var params = str(/ space* ident (?: space* , space* ident )* space* /);
[ 'function test( /* 3 */ foo /* 4 */ , /* 5 */ bar /* 6 */ ) /* 7 */ {\n return foo;\n}',
'test',
' /* 3 */ foo /* 4 */ , /* 5 */ bar /* 6 */ ',
'\n return foo;\n',
index: 0,
input: 'function test( /* 3 */ foo /* 4 */ , /* 5 */ bar /* 6 */ ) /* 7 */ {\n return foo;\n}' ]
Hm, good but I keep thinking of that
also seems like comment matching or stripping would be beyond the scope of this though
we already have libs for stripping comments, so its normal first to strip them after that using this regex.
I dont know, others @regexhq/owners ?
but we should only handle this case 'functionfoo(){}'
but we should only handle this case 'functionfoo(){}'
That's already handled by this regex, as well as other fixes e.g.:
// invalid identifiers: shouldn't match, but does
require('function-regex')().exec("function 1 (2, 3, 4) { }")
[ 'function 1 (2, 3, 4) { }',
'1',
'2, 3, 4',
' ',
index: 0,
input: 'function 1 (2, 3, 4) { }' ]
// invalid identifiers: doesn't match
re.exec("function 1 (2, 3, 4) { }")
null
At any rate, this approach is designed to be easy to read, understand and modify, which is more than can be said for hacking on the regex directly. If you're happy for this regex to break on comments, you can just remove them from the space
pattern:
var space = str(/ (?: \s+ ) /);
Still,
[\s\S]
matches everything and[\w\W]
matches everything, so only one is needed.
I noticed that as well.
Before we go to far down this path, what are the use cases we're discussing? IMHO the regex should do what's necessary to meet the requirements of the lib. is this going to be used in esprima or acorn? is the penalty of stripping comments first too much for this too be useful? or should it be used for fast parsing for code context?
I'm not really nitpicking on this, I think the dialog is great. The bigger point is that the lib should have a well-defined purpose. that will answer your questions
is this going to be used in esprima or acorn?
? Why would it be used in a parser? Parsers are tools for doing this correctly in all cases. This is a tool for doing it quickly in cases where a) it's sufficient (i.e. ES5 functions) and b) the documented exceptions (e.g. not supporting comments) don't matter.
right, that's my point
I use it in parse-function and I think we should not handle cases with comments and add notice in the readme to strip them before use the regex. Because if we handle these cases.. the regex comes too big for me.
btw function-to-string
is awesome name.. when it transform string (function as string) to object LOL!
I use it in parse-function
There's no harm in the name, but that's not the kind of parser I was referring to. I wouldn't expect Acorn or Esprima to silently choke on such a fundamental feature of valid/real-world JavaScript as comments :-)
btw
function-to-string
is awesome name.. when it transform string (function as string) to object LOL!
There are several issues with the regex:
functionfoo () { }
to be a valid function declarationNit:\{([\w\W\s\S]*)\}
is a roundabout way of saying\{(.*)\}
(though I'd go with\{(.*?)\}
)test
output: