viraptor / phply

PHP parser written in Python using PLY
Other
366 stars 100 forks source link

fix a PHP version #48

Open DylannCordel opened 6 years ago

DylannCordel commented 6 years ago

Hi,

I saw here a discussion start about PHP version. Currently, It seems that phply is a mix between php 5.x and php 7.x.

For exemple this code is parsed like PHP 5.6.x:

yield $foo or die;

Its returns Yield(BinaryOp(u'or', Variable(u'$foo'), Exit(None, u'die')))
instead of BinaryOp(u'or', Yield(Variable(u'$foo')), Exit(None, u'die'))

see yield is now a right associative operator

  1. Maybe phply could have a param to specify which version of PHP we want to parse ?
  2. Or maybe you could have a branch for each "X.Y" version of PHP ?

What do you think about it ? IMHO, solution 1 will be more useful: for ex, a script could try to parse a file with a specific version and fallback with another if there is a SyntaxError Exception raised by phply.

Once there will be a fixed php version for a context, we could fix this type of errors.

viraptor commented 6 years ago

Hmm... I'm more inclined to supporting the latest php version (at the time of phply release). Supporting multiple versions is possible, but a bit of work, unless it's done globally. And I don't think a straightforward global switch is that good really:

NOW_HANDLE_PHP=5
import phply

Having a real, per-version parameter requires passing it to every lexer/parser fragment which then needs to generate the parsertab specific to that version. I'm not sure if ply even supports that use case, since the parser signatures are already defined by ply to be def p_state(p):

Two simpler approaches I can think of are:

  1. Temporarily setting the parser version in thread-local storage - just for the time to build the specific parser.
  2. Having one "common" directory and a parser directory per-version. They would need to import everything from "common" and override/provide the functions they need.

Both would require overriding the parsertab name in some way, so that it's version-specific. I'm more inclined to approach 2, since it doesn't do global state. I'd like to see this, but I can't really make a commitment to implementing this soon.

DylannCordel commented 6 years ago

Second approach would be satisfying. I dunno when/if I'll get time to work on this but I'll :up: here if I can start something. Thanks for your advices.

viraptor commented 6 years ago

@DylannCordel It looks like ply already implements switching the parser module, which makes this a lot easier. Let me know if you see anything bad with this approach. If not, I'll add some more v5/v7 details to the version-specific parsers.

DylannCordel commented 6 years ago

@viraptor With the module parameter of yacc ? I just took a quick look and it seems a good way to support multiple php version. A solution would be to have those files :

In php_54.py : defines parsing functions for php54

In php_72.py for exemple:

# include previous version rules
from .php_71 import *

# if this version made a step backward for something 
# eg: like array order changed in PHP 7.0 and re-changed in 7.2 to keep the 5.6 logic
# see http://php.net/manual/en/migration71.incompatible.php#120221
from .php_56 import p_something

# if we need a new parsing rule
def p_foo(p):
    pass

# if we need to completely change an existing parsing rule, just overwrite it:
def p_inner_statement_yield(p):
    # new logic

# if we need to "extends" an existing parsing function:
_p_error = p_error
def p_error(p):
    # some pre-extra-logic
    _p_error(p)  # call of the base function
    # some post-extra-logic

And finally, in phpparse.py (where there is no more p_* functions), make_parser could be updated to something like :

# Build the grammar
def make_parser(debug=False, version='7.2'):
    module_path = 'phply.parsing_rules.php_%s' % version.replace('.', '')
    try:
        parsing_rules = import_module(module_path)
    except ImportError:
        raise Exception('Version %s of PHP is not supported by phply')
    else:
        return yacc.yacc(debug=debug, module=parsing_rules)

Do you think it's possible and a good approach ?

viraptor commented 6 years ago

Something like that. Still wondering if it's better to have the parsers in separate modules or separate classes. Both have pros and cons. I'll have some more fun with this on the weekend.

DylannCordel commented 6 years ago

You can also have classes to have their advantages but use these as module if you want (python is so cool :yum:):

# php_72.py

from .php_71 import PHP71
from .php_70 import PHP70  #  see comment about the backward step

class PHP72(PHP71):

    # if this version made a step backward for something 
    # eg: like array order changed in PHP 7.0 and re-changed in 7.2 to keep the 5.6 logic
    # see http://php.net/manual/en/migration71.incompatible.php#120221
    def p_something(self, p):
        # current MRO : PHP72 => PHP71 => PHP70 => PHP56 => PHP55 => PHP54 => object
        # we call the first parent class wich comes after PHP70 in the MRO tree, so PHP56 is used :
        return super(PHP70, self).p_something(p)

    # if we need a new parsing rule
    def p_foo(self, p):
        pass

    # if we need to completely change an existing parsing rule, just overwrite it:
    def p_inner_statement_yield(self, p):
        # new logic

    # if we need to extends an existing parsing function:
    def p_error(self, p):
        # some pre-extra-logic
        super(PHP72, self).p_error(p)
        # some post-extra-logic

php_72 = PHP72()
php_72__name__ = __name__
sys.modules[__name__] = php72_rules
# Ugly? Guido recommends this himself ...
# http://mail.python.org/pipermail/python-ideas/2012-May/014969.html