Open guevara opened 1 week ago
How I Wrote PHP Skeleton For Bison https://ift.tt/QzPmIKS Anton Sukhachev
devm.io/php/php-skeleton-bison-generics
Do you dream of generics in PHP?
I wanted them so much that I made a library that brings generics to PHP.
    <?php

    namespace App;

    class Box<T>
    {
        private ?T $data = null;

        public function set(T $data): void
        {
            $this->data = $data;
        }

        public function get(): ?T
        {
            return $this->data;
        }
    }
If you are interested you can test it. Only native PHP is required (without extensions).
But in this article, I want to tell you about a very important part of my library: the AST parser.
I use the very popular nikic/php-parser library. Many other tools use it.
It helps you to build AST from source code like this:
    <?php

    namespace App;

    class Test
    {
        public function test($foo) {}
    }
    .
    ├── ZEND_AST_STMT_LIST
        ├── ZEND_AST_NAMESPACE
        │   └── ZEND_AST_ZVAL 'App'
        └── ZEND_AST_CLASS 'Test'
            └── ZEND_AST_STMT_LIST
                └── ZEND_AST_METHOD 'test'
                    └── ZEND_AST_PARAM_LIST
                        └── ZEND_AST_PARAM
                            └── ZEND_AST_ZVAL 'foo'
Every AST parser has a lexical analyzer, a syntax analyzer, and an AST builder. Usually, these are grouped into a Lexer and a Parser.
You don't need to write a Lexer and a Parser from scratch.
To build a Lexer you can use tools:
How do Lexers work?
A Lexer splits source text into tokens.
For example, the PHP engine's Lexer uses re2c.
php-src Lexer example
Below you can see PHP code and tokens from Lexer.
    <?php      | T_OPEN_TAG
               | T_WHITESPACE
    $a = 1;    | T_VARIABLE T_WHITESPACE = T_WHITESPACE T_LNUMBER ;
               | T_WHITESPACE
    echo $a;   | T_ECHO T_WHITESPACE T_VARIABLE ;
We can treat the PHP engine's Lexer and the php-parser Lexer as equivalent, because the function token_get_all() that php-parser relies on calls the engine's re2c-generated code under the hood.
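To see this token stream for yourself, here is a minimal sketch that uses only PHP's built-in token_get_all() and token_name(). The helper name tokenLabels() is made up for this illustration and is not part of php-parser.

```php
<?php

declare(strict_types=1);

/**
 * Returns one label per token: the token name for multi-character tokens,
 * or the raw character for single-character tokens such as ";" and "=".
 * (tokenLabels is a hypothetical helper written for this example.)
 */
function tokenLabels(string $source): array
{
    $labels = [];
    foreach (token_get_all($source) as $token) {
        // token_get_all() yields [id, text, line] arrays or plain one-character strings.
        $labels[] = is_array($token) ? token_name($token[0]) : $token;
    }

    return $labels;
}

echo implode(' ', tokenLabels("<?php\n\$a = 1;\necho \$a;")), PHP_EOL;
```

The exact whitespace tokens depend on how the source is laid out, so the output may differ slightly from the listing above.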
After the Lexer we have tokens, and we need a Parser to build the AST.
To build a Parser you can use tools:
How do parser generators work?
A generator takes your grammar.y BNF file, parses it, extracts all definitions, and then constructs a bunch of tables like this:
    $yytable = [
        6, 3, 7, 20, 8, 51, 28, 1, 52, 4,
        9, 13, 10, 29, 15, 30, 18, 31, 16, 19,
        32, 22, 33, 34, 23, 24, 35, 11, 37, 25,
        21, 38, 39, 26, 45, 0, 40, 42, 0, 43,
        41, 0, 0, 49, 0, 0, 0, 0, 0, 47,
        48, 0, 50, 0, 53, 54
    ];
Then this data is passed to a template called a skeleton.
For Bison, a skeleton is a special file written in the M4 language that renders your parser file.
Out of the box, Bison ships skeletons for C, C++, D, and Java.
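To make the idea of a table-driven parser more concrete, here is a small sketch of my own. It is not Bison output: Bison's real tables (such as $yytable above) are packed into flat integer arrays, while this example uses readable hand-written tables for a toy grammar with two rules, E -> E '+' n and E -> n. The names ACTION, GOTO_TABLE, RULE_LENGTH and parseSum() are made up for the illustration.

```php
<?php

declare(strict_types=1);

// Hand-written LR tables for the toy grammar (an assumption of this sketch):
//   rule 1: E -> E '+' n
//   rule 2: E -> n
const ACTION = [
    0 => ['n' => ['shift', 2]],
    1 => ['+' => ['shift', 3], '$' => ['accept', 0]],
    2 => ['+' => ['reduce', 2], '$' => ['reduce', 2]],
    3 => ['n' => ['shift', 4]],
    4 => ['+' => ['reduce', 1], '$' => ['reduce', 1]],
];
const GOTO_TABLE  = [0 => ['E' => 1]];
const RULE_LENGTH = [1 => 3, 2 => 1];

/**
 * Parses tokens like [['n', 1], ['+', null], ['n', 2]] and returns the sum.
 */
function parseSum(array $tokens): int
{
    $tokens[] = ['$', null];   // end-of-input marker
    $states = [0];             // stack of parser states
    $values = [];              // stack of semantic values
    $pos = 0;

    while (true) {
        [$type, $value] = $tokens[$pos];
        $action = ACTION[end($states)][$type] ?? null;
        if ($action === null) {
            throw new RuntimeException("Unexpected token '$type' at position $pos");
        }

        [$kind, $arg] = $action;
        if ($kind === 'shift') {
            $states[] = $arg;
            $values[] = $value;
            $pos++;
        } elseif ($kind === 'reduce') {
            // Pop the right-hand side of the rule, then run its "semantic action".
            $length = RULE_LENGTH[$arg];
            $rhs = array_splice($values, -$length);
            array_splice($states, -$length);
            $values[] = $arg === 1 ? $rhs[0] + $rhs[2] : $rhs[0];
            $states[] = GOTO_TABLE[end($states)]['E'];
        } else {
            return $values[0]; // accept
        }
    }
}

echo parseSum([['n', 1], ['+', null], ['n', 2], ['+', null], ['n', 3]]), PHP_EOL; // prints 6
```

The loop never looks at the grammar itself, only at the tables. That is why a generator can emit the tables as data and leave the driver loop to a language-specific skeleton.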
PHP engine and php-parser use different parser generators but use very similar grammar files.
php-src grammar example
    statement:
        | T_BREAK optional_expr ';'       { $$ = zend_ast_create(ZEND_AST_BREAK, $2); }
        | T_CONTINUE optional_expr ';'    { $$ = zend_ast_create(ZEND_AST_CONTINUE, $2); }
        | T_RETURN optional_expr ';'      { $$ = zend_ast_create(ZEND_AST_RETURN, $2); }
php-parser grammar example
    non_empty_statement:
        | T_BREAK optional_expr semi       { $$ = Stmt\Break_[$2]; }
        | T_CONTINUE optional_expr semi    { $$ = Stmt\Continue_[$2]; }
        | T_RETURN optional_expr semi      { $$ = Stmt\Return_[$2]; }
After all this information about parsers, we can summarize it in a diagram:
I thought about replacing KmYacc with Bison in php-parser.
It would be great if the PHP engine and php-parser used the same tools to do the same job.
Even the fact that Bison has no PHP skeleton didn't stop me.
I decided to create my own skeleton.
I translated the Java skeleton to PHP. It took me a few months.
Translating Java code to PHP is not very hard, unless the code is written in M4 and has a lot of options.
Java-skeleton example
    ]b4_yystype[ lval = yylexer.getLVal();]b4_locations_if([[
    ]b4_location_type[ yyloc = new ]b4_location_type[(yylexer.getStartPos(), yylexer.getEndPos());
    status = push_parse(token, lval, yyloc);]], [[
    status = push_parse(token, lval);]])[
PHP-skeleton example
    /** @@var ]b4_yystype[ */
    $lval = $this->yylexer->getLVal();]b4_locations_if([[
    /** @@var ]b4_location_type[ */
    $yyloc = new ]b4_location_type[($this->yylexer->getStartPos(), $this->yylexer->getEndPos());
    $status = $this->push_parse($token, $lval, $yyloc);]], [[
    $status = $this->push_parse($token, $lval);]])[
After a few months and many automated tests, the PHP skeleton was ready!
    [php-bison-skeleton] composer test
    > php vendor/bin/phpunit
    PHPUnit 9.6.5 by Sebastian Bergmann and contributors.

    .................................................................  65 / 72 ( 90%)
    .......                                                           72 / 72 (100%)

    Time: 00:04.037, Memory: 6.00 MB

    OK (72 tests, 384 assertions)
Then I tried to replace KmYacc with Bison.
You can reproduce the replacement with these steps:
install required libraries:
    composer require --dev mrsuh/php-bison-skeleton
    composer require nikic/php-parser
generate the grammar file of php-parser:
    cd vendor/nikic/php-parser/
    composer install
    php grammar/rebuildParsers.php --keep-tmp-grammar
    cp grammar/tmp_parser.phpy ../../../../../examples/php/nikic-grammar.y
replace the dollar signs before Bison generates the parser and restore them afterwards, because Bison doesn't support dollar signs in the grammar:
    php bin/replace-dollar-sign.php in nikic-grammar.y nikic-grammar-replaced.y
    bison -S ../../src/php-skel.m4 -o lib/parser-tmp.php nikic-grammar-replaced.y
    php bin/replace-dollar-sign.php out lib/parser-tmp.php lib/parser.php
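The dollar-sign step is easier to follow with an example. Inside grammar actions Bison gives `$$`, `$1` and `$name` a special meaning, so a PHP variable such as `$this` presumably collides with that syntax. The sketch below shows one possible way to hide and restore PHP variables. It is my own illustration, not the actual bin/replace-dollar-sign.php script; the placeholder string and both function names are assumptions.

```php
<?php

declare(strict_types=1);

// Hypothetical marker; it only has to be a string that never occurs in the grammar.
const DOLLAR_PLACEHOLDER = '_DOLLAR_';

/**
 * Hides PHP variables ("$this", "$foo") from Bison while leaving
 * Bison's own "$$" and "$1" references untouched.
 */
function hideDollars(string $grammar): string
{
    // Only a "$" directly followed by a letter or underscore starts a PHP variable.
    return preg_replace('/\$(?=[A-Za-z_])/', DOLLAR_PLACEHOLDER, $grammar) ?? $grammar;
}

/**
 * Puts the dollar signs back into the generated parser code.
 */
function restoreDollars(string $code): string
{
    return str_replace(DOLLAR_PLACEHOLDER, '$', $code);
}

$action = '{ $$ = $this->makeNode($1); }';
$hidden = hideDollars($action);

echo $hidden, PHP_EOL;                 // { $$ = _DOLLAR_this->makeNode($1); }
echo restoreDollars($hidden), PHP_EOL; // { $$ = $this->makeNode($1); }
```

A real script also has to decide what to do with dollar signs inside strings and comments, which this sketch ignores.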
Great! The parser is ready.
Time to compare the PHP parsers generated with Bison and KmYacc.
I ran tests with three different file sizes and different PHP versions (lower is better):
As you can see, the parser generated with Bison is slower than the parser generated with KmYacc.
I tried to optimize the generated parser code, but that gave at most a ~15 percent improvement. Not that much.
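If you want to run a similar comparison yourself, a timing helper like the one below is enough for a first impression. It is a generic sketch and not the benchmark behind the numbers in this article; bestTimeMs() is a made-up name, and token_get_all() is used only as a stand-in workload in place of a generated parser.

```php
<?php

declare(strict_types=1);

/**
 * Runs $task $runs times and returns the fastest run in milliseconds.
 * Taking the best run reduces noise from other processes.
 */
function bestTimeMs(callable $task, int $runs = 5): float
{
    $best = INF;
    for ($i = 0; $i < $runs; $i++) {
        $start = hrtime(true); // monotonic clock, in nanoseconds
        $task();
        $best = min($best, (hrtime(true) - $start) / 1e6);
    }

    return $best;
}

// Stand-in workload: tokenize a generated source file of 1000 statements.
$source = "<?php\n" . str_repeat("echo 1 + 2;\n", 1000);
$ms = bestTimeMs(static fn () => token_get_all($source));

printf("best of 5 runs: %.3f ms\n", $ms);
```

To compare two parsers, pass each one's parse call as $task and feed both the same input files.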
In the end, I replaced KmYacc with Bison in php-parser, but it doesn't work as well as I had imagined.
Now I have a well-working php-skeleton for Bison.
Maybe next time I'll try to replace KmYacc with ANTLR.
You can find php-bison-skeleton, along with many examples and tests, in this repository.
Thank you for your time. Hope you find this article useful.
via mrsuh.com https://mrsuh.com
November 15, 2024 at 06:38PM