Closed haukex closed 2 years ago
This is a tough one. Yes, we can allow eval STRING
to be parsed. And yes, we can then enforce it after parsing using a policy. And it is also true that we are making a judgment call here.
However, while we can argue that Guacamole (as a parser) can support more than Standard Perl (which can enforce using policies), I think the two are also intertwined in the sense that Guacamole is a parser for the Standard Perl syntax definition.
To view this from each component's perspective:
Standard Perl's syntax definition intends to be both clearer and simpler, which is why eval STRING
is not allowed. Other than supporting code which currently has it, do you have a case in which you need eval STRING
.
Guacamole's specification is the syntax for Standard Perl. At the moment, I don't wish to change this.
Now, if the assumption that all eval STRING
s can be easily replaced is no longer valid, we should revisit this issue. Can you suggest such a situation?
I avoid eval STRING in production code as much as my imagination allows, and here are the reasons I use it in Marpa::R2.
1.) Paranoia about $VERSION conversions:
use vars qw($VERSION $STRING_VERSION);
$VERSION = '8.000000';
$STRING_VERSION = $VERSION;
## no critic(BuiltinFunctions::ProhibitStringyEval)
$VERSION = eval $VERSION;
## use critic
2.) Getting the result of a run-time "require":
if ( not eval "require $package" ) {
3.) A "developer-safe" coding hack:
$escape_by_ord[ ord eval qq{"$_"} ] = $_
for "\\t", "\\r", "\\f", "\\b", "\\a", "\\e";
The forgoing is developer safe because each of the characters \t, etc. need only be specified once, making it unlikely that yours truly will specify them inconsistently. :-)
I avoid eval STRING in production code as much as my imagination allows, and here are the reasons I use it in Marpa::R2.
1.) Paranoia about $VERSION conversions:
use vars qw($VERSION $STRING_VERSION); $VERSION = '8.000000'; $STRING_VERSION = $VERSION; ## no critic(BuiltinFunctions::ProhibitStringyEval) $VERSION = eval $VERSION; ## use critic
Gaucamole has very strict version numbers, but I don't think it helps considering people use these versions in strings because then they need to be eval
'ed.
The only way to deal with this, I guess, is requiring module versions not be set in strings and then it can be parsed by both Perl and Guacamole natively.
I think this does expose an interesting perspective of people using code (and values) which are not being natively parsed by Perl and run another Perl instead to then parse them. It's hard to see this as a good practice, though.
2.) Getting the result of a run-time "require":
if ( not eval "require $package" ) {
When I can't avoid runtime loading of a module name in a string, I use Module::Runtime
. Maybe we shouldn't assume on people being able to use modules for runtime loading of variable module names?
3.) A "developer-safe" coding hack:
$escape_by_ord[ ord eval qq{"$_"} ] = $_ for "\\t", "\\r", "\\f", "\\b", "\\a", "\\e";
The forgoing is developer safe because each of the characters \t, etc. need only be specified once, making it unlikely that yours truly will specify them inconsistently. :-)
I can understand this use, but it's not a big concern for me.
It's definitely a helpful trick. I hope the following is readable enough to avoid doing it with eval STRING
:
$escape2[ ord $_->[0] ] = $_->[1]
for
[ "\t" => '\t' ],
[ "\r" => '\r' ],
[ "\f" => '\f' ],
[ "\b" => '\b' ],
[ "\a" => '\a' ],
[ "\e" => '\e' ];
I'm still undecided if to allow eval STRING
, to be honest.
However, while we can argue that Guacamole (as a parser) can support more than Standard Perl (which can enforce using policies), I think the two are also intertwined in the sense that Guacamole is a parser for the Standard Perl syntax definition.
Sure, that makes sense. I am happy this project exists, because I think a Perl that is parseable by more than just perl
opens a lot of doors, so thank you for your efforts! And I don't want to seem like I'm complaining too much ;-)
I have a largeish (for one author) codebase that I'm considering porting to standard.pm, and one of the things I really miss in standard.pm is hash key autoquoting. I could of course fork Guacamole, but that's probably the least favorable option this early on, cooler would be if there were an extension mechanism built in (so I could make my own use standard::HAUKEX;
), or, even cooler would be if Guacamole already supported these things but standard.pm disallowed it based on policy, and that policy is configurable to make people like me happy :-) That would of course require the line between Guacamole and standard.pm to be drawn more clearly. I am curious what solution you would lean towards in this situation? I may even have some time to volunteer to look into implementing it. (I'm currently guessing that an extension mechanism might be easiest in the short term. Edit: Actually, looks like that's #23?)
Standard Perl's syntax definition intends to be both clearer and simpler, which is why
eval STRING
is not allowed.
That would seem like an argument for disallowing s///ee
as well? Otherwise, I imagine an example where someone is "forced" to use standard.pm by company coding standards, but they want to use it, they'll just start writing $_="";s// ... /ee;
instead...
Other than supporting code which currently has it, do you have a case in which you need
eval STRING
.
I did a quick grep of my codebase and I use it a bit in testing, I have a few instances of eval "use $modulename; 1"
and similar, and one interesting case of eval($Config{nv_overflows_integers_at})
.
Also, I think templating engines make heavy use of eval STRING
.
I personally don't need eval STRING
, for those cases I do I could probably work around its absence, but it's more the thought that do
, require
, and s///ee
are allowed but eval STRING
is not that confused me. IMHO, either one disallows all kinds of dynamic code compliation (which would be helpful for security-senstive situations, though of course use
would still need to be allowed, and that's really just a BEGIN { eval STRING }
), or one allows them all.
However, while we can argue that Guacamole (as a parser) can support more than Standard Perl (which can enforce using policies), I think the two are also intertwined in the sense that Guacamole is a parser for the Standard Perl syntax definition.
Sure, that makes sense. I am happy this project exists, because I think a Perl that is parseable by more than just
perl
opens a lot of doors, so thank you for your efforts! And I don't want to seem like I'm complaining too much ;-)
Don't dare worry that you're "complaining too much." :)
Your feedback is very valuable for making the project both better and more production-ready.
I have a largeish (for one author) codebase that I'm considering porting to standard.pm, and one of the things I really miss in standard.pm is hash key autoquoting.
That is one of the hard-stops in Guacamole because the quoting is a bit subtle and a thorny problem for beginners. Like many things, it's great until you hit the edge-cases:
$foo{bar} = "foo"; # auto-quoted
$foo{
bar
} = "foo"; # not auto-quoted
$foo{
caller} = "foo"; # also, not auto-quoted and runs caller()
$foo{+bar} = -foo; # left-side isn't auto-quoted, right-side is - what a mess
(For the record, I've hit those cases and it was not fun.)
I started writing a tool for adding quotes to code so I wouldn't have to do it myself. This is what I got so far.
I could of course fork Guacamole, but that's probably the least favorable option this early on, cooler would be if there were an extension mechanism built in (so I could make my own
use standard::HAUKEX;
), or, even cooler would be if Guacamole already supported these things but standard.pm disallowed it based on policy, and that policy is configurable to make people like me happy :-) That would of course require the line between Guacamole and standard.pm to be drawn more clearly. I am curious what solution you would lean towards in this situation? I may even have some time to volunteer to look into implementing it. (I'm currently guessing that an extension mechanism might be easiest in the short term. Edit: Actually, looks like that's #23?)
Standard Perl's syntax definition intends to be both clearer and simpler, which is why
eval STRING
is not allowed.That would seem like an argument for disallowing
s///ee
as well? Otherwise, I imagine an example where someone is "forced" to use standard.pm by company coding standards, but they want to use it, they'll just start writing$_="";s// ... /ee;
instead...
I don't fully understand this one, sorry.
Other than supporting code which currently has it, do you have a case in which you need
eval STRING
.I did a quick grep of my codebase and I use it a bit in testing, I have a few instances of
eval "use $modulename; 1"
and similar, and one interesting case ofeval($Config{nv_overflows_integers_at})
.
Why would you have to eval
that one? What is interesting about it? (Sorry, I'm not familiar with that key.)
Also, I think templating engines make heavy use of
eval STRING
.I personally don't need
eval STRING
, for those cases I do I could probably work around its absence, but it's more the thought thatdo
,require
, ands///ee
are allowed buteval STRING
is not that confused me. IMHO, either one disallows all kinds of dynamic code compliation (which would be helpful for security-senstive situations, though of courseuse
would still need to be allowed, and that's really just aBEGIN { eval STRING }
), or one allows them all.
That's fair. What I'm worried about is adding to the syntax definition something that - from the start - has a label of "please don't even do this. You really, really, really shouldn't use this unless you're doing some very low-level stuff." But I guess it might be impossible to entirely avoid it in the parser.
In my list of uses of eval STRING in Marpa::R2, I omitted the uses in my test suite. Actually, testing is essential, and eval STRING does prove very handy in testing.
the quoting is a bit subtle and a thorny problem for beginners.
Yes, I get that (I even wrote about it myself on PerlMonks). To separate the two topics:
Hash subscripts: My understanding is that Standard Perl doesn't accept barewords anywhere (except the STD*
filehandles), is that correct? In that case, wouldn't it be technically fairly easy to restrict hash keys to either an Expression
, or to a VarIdent
bareword with no whitespace between the braces? I totally understand helping newcomers, but as you can tell I'm also very interested in the technical aspects of this grammar, since it's a possibility to finally support things that couldn't be truly supported before: proper syntax highlighting, safe source filtering, safe refactoring, proper source code analysis, even source-to-source translation, and so on. Lowering the cost of porting existing codebases (and brains that aren't used to quoting everything) to Standard Perl would be a great advantage, IMHO, and help to newcomers could then be provided using policies similar to what Perl::Critic does.
Fat comma: This one is actually more important to me personally than the hash subscripts, since it's basically "the" way to write named parameters in Perl, and those names usually match the VarIdent
rule (or they begin with a dash, hence #109). For example, IMHO the quotes in has "logger" => ( "is"=>'ro', "required"=>0, "isa"=>InstanceOf['MyLogger'], "lazy"=>1, "builder"=>sub {...} )
add a lot of noise. However, I assume it's technically more tricky to avoid the pitfalls here, since there are no braces to delimit the LHS of the fat comma.
Anyway, getting a little off topic :-)
23 is indeed the idea of doing it. Unfortunately, I couldn't yet come up with the mechanism. If you have any thoughts or would like to help with this, I'd be more than happy to go into it together and see what's possible.
Sure! Currently, I'm approaching this from the viewpoint of my use case, i.e. how I could port my codebase to a Perl that is statically parseable, while still allowing me to keep some of my personal style (like e.g. hash key autoquoting, and I would love to get certain sub calls without parens back, if I can think of a safe way to implement that). I will think about this some more, and I'll look at the suggested extension mechanism, but since I'm currently working on a bunch of different things, I'm currently operating on somewhat longer time scales (as evidenced by my delayed reply) - I'll of course report back once I have something interesting.
Why would you have to eval that one? What is interesting about it? (Sorry, I'm not familiar with that key.)
Its value looks like "256.0*256.0*256.0*256.0*256.0*256.0*2.0*2.0*2.0*2.0*2.0"
, so it needs the eval
to get the value (9007199254740992). As per this: All integer values from 0 to this number can be stored without loss as an NV; some larger integers can be stored without loss in an NV, but not the one 1 higher than this.
That would seem like an argument for disallowing
s///ee
as well? Otherwise, I imagine an example where someone is "forced" to use standard.pm by company coding standards, but they want to use it, they'll just start writing$_="";s// ... /ee;
instead...I don't fully understand this one, sorry.
Yes, sorry, the second sentence is just a scenario in my imagination :-) The main question is: You said "Standard Perl's syntax definition intends to be both clearer and simpler", then why is s///ee
still allowed when eval STRING
isn't? eval("perl code")
looks like a simple function call while s///ee
s are in my experience hard for many coders to even wrap their head around.
So I'm basically just saying: IMHO, either disallow s///ee
, do FILE
, and eval STRING
, or allow all of them. (I omitted require FILENAME
from that list since that's probably still used too often, but if one were being really strict about this, then require
should be disallowed entirely too, leaving only use
.)
I've just pushed support for eval STRING
as well as eval $var
. (Actually, it's Eval Value, so a few different stuff is supported.)
the quoting is a bit subtle and a thorny problem for beginners.
Yes, I get that (I even wrote about it myself on PerlMonks). To separate the two topics:
Hash subscripts: My understanding is that Standard Perl doesn't accept barewords anywhere (except the
STD*
filehandles), is that correct?
This is correct.
In that case, wouldn't it be technically fairly easy to restrict hash keys to either an
Expression
, or to aVarIdent
bareword with no whitespace between the braces?
I think that might be possible (currently, we're managing a lot of whitespace rules through Marpa, but there's a ticket for manually handling all whitespace. I'm not sure it's worth the effort and initial picking at it failed miserably.
I think we're now entering user behavior: If $foo{bar}
works, why shouldn't $foo{ bar }
? Perl certainly sees it the same way. I'm cautious about creating rules that say "it's like this, except for this, unless that." There are also issues with unary operations that cause a lack of quoting or remain part of quoted strings: If $foo{bar}
is autoquoted, why not $foo{-bar}
? If this is allowed, why not $foo{+bar}
? And we just entered breakage because +
is a unary op.
By having strings always be strings, the entire issue is avoided. If you come from another language, strings being quoted is not an unreasonable thing.
I totally understand helping newcomers, but as you can tell I'm also very interested in the technical aspects of this grammar, since it's a possibility to finally support things that couldn't be truly supported before: proper syntax highlighting, safe source filtering, safe refactoring, proper source code analysis, even source-to-source translation, and so on. Lowering the cost of porting existing codebases (and brains that aren't used to quoting everything) to Standard Perl would be a great advantage, IMHO, and help to newcomers could then be provided using policies similar to what Perl::Critic does.
I think you hit an edge case where we could probably (phrasing it cautiously here) but we would create a situation that I think adds a burden of difference between Guacamole and Perl (and other parts of Guacamole) that might cause more confusion than relief.
Fat comma: This one is actually more important to me personally than the hash subscripts, since it's basically "the" way to write named parameters in Perl, and those names usually match the
VarIdent
rule (or they begin with a dash, hence #109). For example, IMHO the quotes inhas "logger" => ( "is"=>'ro', "required"=>0, "isa"=>InstanceOf['MyLogger'], "lazy"=>1, "builder"=>sub {...} )
add a lot of noise. However, I assume it's technically more tricky to avoid the pitfalls here, since there are no braces to delimit the LHS of the fat comma.
LHS of fat comma is not a fun thing to tackle and includes edge-cases just the same.
Anyway, getting a little off topic :-)
23 is indeed the idea of doing it. Unfortunately, I couldn't yet come up with the mechanism. If you have any thoughts or would like to help with this, I'd be more than happy to go into it together and see what's possible.
I just pushed a commit that adds methods for extending the grammar, specifically with raw lexemes and with keywords. Here is a sample addition of Try::Tiny
-like try {} catch {}
syntax:
Guacamole->add_keyword(
'Catch',
'unary',
'catch',
[ 'OpKeywordCatch BlockNonEmpty' ]
);
Guacamole->add_keyword(
'Try',
'list',
'try',
[
'OpKeywordTry BlockNonEmpty OpKeywordCatchExpr',
'OpKeywordTry BlockNonEmpty',
],
);
Sure! Currently, I'm approaching this from the viewpoint of my use case, i.e. how I could port my codebase to a Perl that is statically parseable, while still allowing me to keep some of my personal style (like e.g. hash key autoquoting, and I would love to get certain sub calls without parens back, if I can think of a safe way to implement that). I will think about this some more, and I'll look at the suggested extension mechanism, but since I'm currently working on a bunch of different things, I'm currently operating on somewhat longer time scales (as evidenced by my delayed reply) - I'll of course report back once I have something interesting.
I'm willing to be more flexible to help Guacamole be supported. I've updated it to support eval Value
but I don't think I can see my way through autoquoting. Can't say I never will, but I haven't yet been convinced of it.
Why would you have to eval that one? What is interesting about it? (Sorry, I'm not familiar with that key.)
Its value looks like
"256.0*256.0*256.0*256.0*256.0*256.0*2.0*2.0*2.0*2.0*2.0"
, so it needs theeval
to get the value (9007199254740992). As per this: All integer values from 0 to this number can be stored without loss as an NV; some larger integers can be stored without loss in an NV, but not the one 1 higher than this.
This should now be supported with eval Value
.
That would seem like an argument for disallowing
s///ee
as well? Otherwise, I imagine an example where someone is "forced" to use standard.pm by company coding standards, but they want to use it, they'll just start writing$_="";s// ... /ee;
instead...I don't fully understand this one, sorry.
Yes, sorry, the second sentence is just a scenario in my imagination :-) The main question is: You said "Standard Perl's syntax definition intends to be both clearer and simpler", then why is
s///ee
still allowed wheneval STRING
isn't?eval("perl code")
looks like a simple function call whiles///ee
s are in my experience hard for many coders to even wrap their head around.So I'm basically just saying: IMHO, either disallow
s///ee
,do FILE
, andeval STRING
, or allow all of them. (I omittedrequire FILENAME
from that list since that's probably still used too often, but if one were being really strict about this, thenrequire
should be disallowed entirely too, leaving onlyuse
.)
You make a good argument. I hope the eval Value
support will serve you well. :)
As kind of a continuation of #106, this is a bit more philosophical. In that issue, I assumed that even though Guacamole should be capable of parsing
eval("code")
, it was chosen not to support it becauseeval EXPR
should not be part ofstandard.pm
.If that's the case, then IMHO
s///ee
should not be supported either (and as a related issue, couldn'ts///e
parse the replacement part accordingly?).However, I had a different thought: Since I'm guessing
eval STRING
is not supported for policy reasons, and not for technical reasons, would it make sense to implement it differently in Guacamole? That is, alloweval("code")
to be parsed, but then reject it after parsing as a matter of policy (similar to Perl::Critic's ProhibitStringyEval)? This would allow Guacamole's parser to be used for projects that do want to supporteval
, i.e. to adjust the policies.And as another side note: I can also work around the missing support for
eval EXPR
withdo FILE
...