xsawyerx / guacamole

Guacamole is a parser toolkit for Standard Perl. It provides fully static BNF-based parsing capability to a reasonable subset of Perl.
https://metacpan.org/pod/Guacamole
20 stars 8 forks source link

s///ee parses, but eval("") does not, and related topics #108

Closed haukex closed 2 years ago

haukex commented 3 years ago

As kind of a continuation of #106, this is a bit more philosophical. In that issue, I assumed that even though Guacamole should be capable of parsing eval("code"), it was chosen not to support it because eval EXPR should not be part of standard.pm.

If that's the case, then IMHO s///ee should not be supported either (and as a related issue, couldn't s///e parse the replacement part accordingly?).

However, I had a different thought: Since I'm guessing eval STRING is not supported for policy reasons, and not for technical reasons, would it make sense to implement it differently in Guacamole? That is, allow eval("code") to be parsed, but then reject it after parsing as a matter of policy (similar to Perl::Critic's ProhibitStringyEval)? This would allow Guacamole's parser to be used for projects that do want to support eval, i.e. to adjust the policies.

And as another side note: I can also work around the missing support for eval EXPR with do FILE...

use standard;
use warnings;
use strict;

my $code = q{print "blam\n"};
  eval($code); # nope!
myeval($code); # yup!

use File::Temp qw/tempfile/;
sub myeval {
    my $c = shift;
    my ($fh, $fn) = tempfile('UNLINK'=>1);
    print {$fh} $c;
    close $fh;
    do $fn;
}
xsawyerx commented 3 years ago

This is a tough one. Yes, we can allow eval STRING to be parsed. And yes, we can then enforce it after parsing using a policy. And it is also true that we are making a judgment call here.

However, while we can argue that Guacamole (as a parser) can support more than Standard Perl (which can enforce using policies), I think the two are also intertwined in the sense that Guacamole is a parser for the Standard Perl syntax definition.

To view this from each component's perspective:

Now, if the assumption that all eval STRINGs can be easily replaced is no longer valid, we should revisit this issue. Can you suggest such a situation?

jeffreykegler commented 3 years ago

I avoid eval STRING in production code as much as my imagination allows, and here are the reasons I use it in Marpa::R2.

1.) Paranoia about $VERSION conversions:

use vars qw($VERSION $STRING_VERSION);
$VERSION        = '8.000000';
$STRING_VERSION = $VERSION;
## no critic(BuiltinFunctions::ProhibitStringyEval)
$VERSION = eval $VERSION;
## use critic

2.) Getting the result of a run-time "require":

if ( not eval "require $package" ) {

3.) A "developer-safe" coding hack:

$escape_by_ord[ ord eval qq{"$_"} ] = $_
    for "\\t", "\\r", "\\f", "\\b", "\\a", "\\e";

The forgoing is developer safe because each of the characters \t, etc. need only be specified once, making it unlikely that yours truly will specify them inconsistently. :-)

xsawyerx commented 3 years ago

I avoid eval STRING in production code as much as my imagination allows, and here are the reasons I use it in Marpa::R2.

1.) Paranoia about $VERSION conversions:

use vars qw($VERSION $STRING_VERSION);
$VERSION        = '8.000000';
$STRING_VERSION = $VERSION;
## no critic(BuiltinFunctions::ProhibitStringyEval)
$VERSION = eval $VERSION;
## use critic

Gaucamole has very strict version numbers, but I don't think it helps considering people use these versions in strings because then they need to be eval'ed.

The only way to deal with this, I guess, is requiring module versions not be set in strings and then it can be parsed by both Perl and Guacamole natively.

I think this does expose an interesting perspective of people using code (and values) which are not being natively parsed by Perl and run another Perl instead to then parse them. It's hard to see this as a good practice, though.

2.) Getting the result of a run-time "require":

if ( not eval "require $package" ) {

When I can't avoid runtime loading of a module name in a string, I use Module::Runtime. Maybe we shouldn't assume on people being able to use modules for runtime loading of variable module names?

3.) A "developer-safe" coding hack:

$escape_by_ord[ ord eval qq{"$_"} ] = $_
    for "\\t", "\\r", "\\f", "\\b", "\\a", "\\e";

The forgoing is developer safe because each of the characters \t, etc. need only be specified once, making it unlikely that yours truly will specify them inconsistently. :-)

I can understand this use, but it's not a big concern for me.

It's definitely a helpful trick. I hope the following is readable enough to avoid doing it with eval STRING:

$escape2[ ord $_->[0] ] = $_->[1]
    for
        [ "\t" => '\t' ],
        [ "\r" => '\r' ],
        [ "\f" => '\f' ],
        [ "\b" => '\b' ],
        [ "\a" => '\a' ],
        [ "\e" => '\e' ];

I'm still undecided if to allow eval STRING, to be honest.

haukex commented 3 years ago

However, while we can argue that Guacamole (as a parser) can support more than Standard Perl (which can enforce using policies), I think the two are also intertwined in the sense that Guacamole is a parser for the Standard Perl syntax definition.

Sure, that makes sense. I am happy this project exists, because I think a Perl that is parseable by more than just perl opens a lot of doors, so thank you for your efforts! And I don't want to seem like I'm complaining too much ;-)

I have a largeish (for one author) codebase that I'm considering porting to standard.pm, and one of the things I really miss in standard.pm is hash key autoquoting. I could of course fork Guacamole, but that's probably the least favorable option this early on, cooler would be if there were an extension mechanism built in (so I could make my own use standard::HAUKEX;), or, even cooler would be if Guacamole already supported these things but standard.pm disallowed it based on policy, and that policy is configurable to make people like me happy :-) That would of course require the line between Guacamole and standard.pm to be drawn more clearly. I am curious what solution you would lean towards in this situation? I may even have some time to volunteer to look into implementing it. (I'm currently guessing that an extension mechanism might be easiest in the short term. Edit: Actually, looks like that's #23?)

Standard Perl's syntax definition intends to be both clearer and simpler, which is why eval STRING is not allowed.

That would seem like an argument for disallowing s///ee as well? Otherwise, I imagine an example where someone is "forced" to use standard.pm by company coding standards, but they want to use it, they'll just start writing $_="";s// ... /ee; instead...

Other than supporting code which currently has it, do you have a case in which you need eval STRING.

I did a quick grep of my codebase and I use it a bit in testing, I have a few instances of eval "use $modulename; 1" and similar, and one interesting case of eval($Config{nv_overflows_integers_at}).

Also, I think templating engines make heavy use of eval STRING.

I personally don't need eval STRING, for those cases I do I could probably work around its absence, but it's more the thought that do, require, and s///ee are allowed but eval STRING is not that confused me. IMHO, either one disallows all kinds of dynamic code compliation (which would be helpful for security-senstive situations, though of course use would still need to be allowed, and that's really just a BEGIN { eval STRING }), or one allows them all.

xsawyerx commented 3 years ago

However, while we can argue that Guacamole (as a parser) can support more than Standard Perl (which can enforce using policies), I think the two are also intertwined in the sense that Guacamole is a parser for the Standard Perl syntax definition.

Sure, that makes sense. I am happy this project exists, because I think a Perl that is parseable by more than just perl opens a lot of doors, so thank you for your efforts! And I don't want to seem like I'm complaining too much ;-)

Don't dare worry that you're "complaining too much." :)

Your feedback is very valuable for making the project both better and more production-ready.

I have a largeish (for one author) codebase that I'm considering porting to standard.pm, and one of the things I really miss in standard.pm is hash key autoquoting.

That is one of the hard-stops in Guacamole because the quoting is a bit subtle and a thorny problem for beginners. Like many things, it's great until you hit the edge-cases:

$foo{bar} = "foo"; # auto-quoted
$foo{
bar
} = "foo";  # not auto-quoted

$foo{
caller} = "foo"; # also, not auto-quoted and runs caller()

$foo{+bar} = -foo; # left-side isn't auto-quoted, right-side is - what a mess

(For the record, I've hit those cases and it was not fun.)

I started writing a tool for adding quotes to code so I wouldn't have to do it myself. This is what I got so far.

I could of course fork Guacamole, but that's probably the least favorable option this early on, cooler would be if there were an extension mechanism built in (so I could make my own use standard::HAUKEX;), or, even cooler would be if Guacamole already supported these things but standard.pm disallowed it based on policy, and that policy is configurable to make people like me happy :-) That would of course require the line between Guacamole and standard.pm to be drawn more clearly. I am curious what solution you would lean towards in this situation? I may even have some time to volunteer to look into implementing it. (I'm currently guessing that an extension mechanism might be easiest in the short term. Edit: Actually, looks like that's #23?)

23 is indeed the idea of doing it. Unfortunately, I couldn't yet come up with the mechanism. If you have any thoughts or would like to help with this, I'd be more than happy to go into it together and see what's possible.

Standard Perl's syntax definition intends to be both clearer and simpler, which is why eval STRING is not allowed.

That would seem like an argument for disallowing s///ee as well? Otherwise, I imagine an example where someone is "forced" to use standard.pm by company coding standards, but they want to use it, they'll just start writing $_="";s// ... /ee; instead...

I don't fully understand this one, sorry.

Other than supporting code which currently has it, do you have a case in which you need eval STRING.

I did a quick grep of my codebase and I use it a bit in testing, I have a few instances of eval "use $modulename; 1" and similar, and one interesting case of eval($Config{nv_overflows_integers_at}).

Why would you have to eval that one? What is interesting about it? (Sorry, I'm not familiar with that key.)

Also, I think templating engines make heavy use of eval STRING.

I personally don't need eval STRING, for those cases I do I could probably work around its absence, but it's more the thought that do, require, and s///ee are allowed but eval STRING is not that confused me. IMHO, either one disallows all kinds of dynamic code compliation (which would be helpful for security-senstive situations, though of course use would still need to be allowed, and that's really just a BEGIN { eval STRING }), or one allows them all.

That's fair. What I'm worried about is adding to the syntax definition something that - from the start - has a label of "please don't even do this. You really, really, really shouldn't use this unless you're doing some very low-level stuff." But I guess it might be impossible to entirely avoid it in the parser.

jeffreykegler commented 3 years ago

In my list of uses of eval STRING in Marpa::R2, I omitted the uses in my test suite. Actually, testing is essential, and eval STRING does prove very handy in testing.

haukex commented 3 years ago

the quoting is a bit subtle and a thorny problem for beginners.

Yes, I get that (I even wrote about it myself on PerlMonks). To separate the two topics:

Hash subscripts: My understanding is that Standard Perl doesn't accept barewords anywhere (except the STD* filehandles), is that correct? In that case, wouldn't it be technically fairly easy to restrict hash keys to either an Expression, or to a VarIdent bareword with no whitespace between the braces? I totally understand helping newcomers, but as you can tell I'm also very interested in the technical aspects of this grammar, since it's a possibility to finally support things that couldn't be truly supported before: proper syntax highlighting, safe source filtering, safe refactoring, proper source code analysis, even source-to-source translation, and so on. Lowering the cost of porting existing codebases (and brains that aren't used to quoting everything) to Standard Perl would be a great advantage, IMHO, and help to newcomers could then be provided using policies similar to what Perl::Critic does.

Fat comma: This one is actually more important to me personally than the hash subscripts, since it's basically "the" way to write named parameters in Perl, and those names usually match the VarIdent rule (or they begin with a dash, hence #109). For example, IMHO the quotes in has "logger" => ( "is"=>'ro', "required"=>0, "isa"=>InstanceOf['MyLogger'], "lazy"=>1, "builder"=>sub {...} ) add a lot of noise. However, I assume it's technically more tricky to avoid the pitfalls here, since there are no braces to delimit the LHS of the fat comma.

Anyway, getting a little off topic :-)

23 is indeed the idea of doing it. Unfortunately, I couldn't yet come up with the mechanism. If you have any thoughts or would like to help with this, I'd be more than happy to go into it together and see what's possible.

Sure! Currently, I'm approaching this from the viewpoint of my use case, i.e. how I could port my codebase to a Perl that is statically parseable, while still allowing me to keep some of my personal style (like e.g. hash key autoquoting, and I would love to get certain sub calls without parens back, if I can think of a safe way to implement that). I will think about this some more, and I'll look at the suggested extension mechanism, but since I'm currently working on a bunch of different things, I'm currently operating on somewhat longer time scales (as evidenced by my delayed reply) - I'll of course report back once I have something interesting.

Why would you have to eval that one? What is interesting about it? (Sorry, I'm not familiar with that key.)

Its value looks like "256.0*256.0*256.0*256.0*256.0*256.0*2.0*2.0*2.0*2.0*2.0", so it needs the eval to get the value (9007199254740992). As per this: All integer values from 0 to this number can be stored without loss as an NV; some larger integers can be stored without loss in an NV, but not the one 1 higher than this.

That would seem like an argument for disallowing s///ee as well? Otherwise, I imagine an example where someone is "forced" to use standard.pm by company coding standards, but they want to use it, they'll just start writing $_="";s// ... /ee; instead...

I don't fully understand this one, sorry.

Yes, sorry, the second sentence is just a scenario in my imagination :-) The main question is: You said "Standard Perl's syntax definition intends to be both clearer and simpler", then why is s///ee still allowed when eval STRING isn't? eval("perl code") looks like a simple function call while s///ees are in my experience hard for many coders to even wrap their head around.

So I'm basically just saying: IMHO, either disallow s///ee, do FILE, and eval STRING, or allow all of them. (I omitted require FILENAME from that list since that's probably still used too often, but if one were being really strict about this, then require should be disallowed entirely too, leaving only use.)

xsawyerx commented 2 years ago

I've just pushed support for eval STRING as well as eval $var. (Actually, it's Eval Value, so a few different stuff is supported.)

xsawyerx commented 2 years ago

the quoting is a bit subtle and a thorny problem for beginners.

Yes, I get that (I even wrote about it myself on PerlMonks). To separate the two topics:

Hash subscripts: My understanding is that Standard Perl doesn't accept barewords anywhere (except the STD* filehandles), is that correct?

This is correct.

In that case, wouldn't it be technically fairly easy to restrict hash keys to either an Expression, or to a VarIdent bareword with no whitespace between the braces?

I think that might be possible (currently, we're managing a lot of whitespace rules through Marpa, but there's a ticket for manually handling all whitespace. I'm not sure it's worth the effort and initial picking at it failed miserably.

I think we're now entering user behavior: If $foo{bar} works, why shouldn't $foo{ bar }? Perl certainly sees it the same way. I'm cautious about creating rules that say "it's like this, except for this, unless that." There are also issues with unary operations that cause a lack of quoting or remain part of quoted strings: If $foo{bar} is autoquoted, why not $foo{-bar}? If this is allowed, why not $foo{+bar}? And we just entered breakage because + is a unary op.

By having strings always be strings, the entire issue is avoided. If you come from another language, strings being quoted is not an unreasonable thing.

I totally understand helping newcomers, but as you can tell I'm also very interested in the technical aspects of this grammar, since it's a possibility to finally support things that couldn't be truly supported before: proper syntax highlighting, safe source filtering, safe refactoring, proper source code analysis, even source-to-source translation, and so on. Lowering the cost of porting existing codebases (and brains that aren't used to quoting everything) to Standard Perl would be a great advantage, IMHO, and help to newcomers could then be provided using policies similar to what Perl::Critic does.

I think you hit an edge case where we could probably (phrasing it cautiously here) but we would create a situation that I think adds a burden of difference between Guacamole and Perl (and other parts of Guacamole) that might cause more confusion than relief.

Fat comma: This one is actually more important to me personally than the hash subscripts, since it's basically "the" way to write named parameters in Perl, and those names usually match the VarIdent rule (or they begin with a dash, hence #109). For example, IMHO the quotes in has "logger" => ( "is"=>'ro', "required"=>0, "isa"=>InstanceOf['MyLogger'], "lazy"=>1, "builder"=>sub {...} ) add a lot of noise. However, I assume it's technically more tricky to avoid the pitfalls here, since there are no braces to delimit the LHS of the fat comma.

LHS of fat comma is not a fun thing to tackle and includes edge-cases just the same.

Anyway, getting a little off topic :-)

23 is indeed the idea of doing it. Unfortunately, I couldn't yet come up with the mechanism. If you have any thoughts or would like to help with this, I'd be more than happy to go into it together and see what's possible.

I just pushed a commit that adds methods for extending the grammar, specifically with raw lexemes and with keywords. Here is a sample addition of Try::Tiny-like try {} catch {} syntax:

Guacamole->add_keyword(
    'Catch',
    'unary',
    'catch',
    [ 'OpKeywordCatch BlockNonEmpty' ]
);

Guacamole->add_keyword(
    'Try',
    'list',
    'try',
    [
        'OpKeywordTry BlockNonEmpty OpKeywordCatchExpr',
        'OpKeywordTry BlockNonEmpty',
    ],
);

Sure! Currently, I'm approaching this from the viewpoint of my use case, i.e. how I could port my codebase to a Perl that is statically parseable, while still allowing me to keep some of my personal style (like e.g. hash key autoquoting, and I would love to get certain sub calls without parens back, if I can think of a safe way to implement that). I will think about this some more, and I'll look at the suggested extension mechanism, but since I'm currently working on a bunch of different things, I'm currently operating on somewhat longer time scales (as evidenced by my delayed reply) - I'll of course report back once I have something interesting.

I'm willing to be more flexible to help Guacamole be supported. I've updated it to support eval Value but I don't think I can see my way through autoquoting. Can't say I never will, but I haven't yet been convinced of it.

Why would you have to eval that one? What is interesting about it? (Sorry, I'm not familiar with that key.)

Its value looks like "256.0*256.0*256.0*256.0*256.0*256.0*2.0*2.0*2.0*2.0*2.0", so it needs the eval to get the value (9007199254740992). As per this: All integer values from 0 to this number can be stored without loss as an NV; some larger integers can be stored without loss in an NV, but not the one 1 higher than this.

This should now be supported with eval Value.

That would seem like an argument for disallowing s///ee as well? Otherwise, I imagine an example where someone is "forced" to use standard.pm by company coding standards, but they want to use it, they'll just start writing $_="";s// ... /ee; instead...

I don't fully understand this one, sorry.

Yes, sorry, the second sentence is just a scenario in my imagination :-) The main question is: You said "Standard Perl's syntax definition intends to be both clearer and simpler", then why is s///ee still allowed when eval STRING isn't? eval("perl code") looks like a simple function call while s///ees are in my experience hard for many coders to even wrap their head around.

So I'm basically just saying: IMHO, either disallow s///ee, do FILE, and eval STRING, or allow all of them. (I omitted require FILENAME from that list since that's probably still used too often, but if one were being really strict about this, then require should be disallowed entirely too, leaving only use.)

You make a good argument. I hope the eval Value support will serve you well. :)