Open theofidry opened 8 years ago
@Hywan as the maintainer of the HoaCompiler, WDYT of this choice here, is it something the compiler would be good at?
Hello and thanks for considering Hoa\Compiler 😃!
Analysing a language and compiling it into something else is the essence of Hoa\Compiler, so yes. You can write your own grammar with the PP grammar description language (see the example in the README.md).
The grammar can probably be minimalist, if I am reading your examples correctly, and the visitor will be simple too. A simple example you might want to look at is the Hoa\Ruler library. Its grammar is very small and it has several compilers (called visitors here because they visit the produced AST): an interpreter that compiles text into an in-memory object model, and a compiler that turns that in-memory object model into PHP code.
So this is a big yes 😉.
That said, Hoa\Compiler provides mechanisms you will love.
A grammar is used to represent any kind of data. Thus, we can use it to validate data (the classical usage), or to… generate data. This was a big part of my PhD thesis about Praspel. Long story short, with a grammar expressed in PP and one of three algorithms, you can generate data that matches the grammar. I am copy-pasting the example from the README.md:
$sampler = new Hoa\Compiler\Llk\Sampler\Coverage(
// Grammar.
Hoa\Compiler\Llk\Llk::load(new Hoa\File\Read('Json.pp')),
// Token sampler.
new Hoa\Regex\Visitor\Isotropic(new Hoa\Math\Sampler\Random())
);
foreach ($sampler as $i => $data) {
echo $i, ' => ', $data, "\n";
}
/**
* Will output:
* 0 => true
* 1 => {" )o?bz " : null , " %3W) " : [false, 130 , " 6" ] }
* 2 => [{" ny " : true } ]
* 3 => {" Ne;[3 " :[ true , true ] , " th: " : true," C[8} " : true }
*/
This approach and these algorithms are used to do what we call: Grammar-based Testing. See the research paper here:
Several people (like @jubianchi or @vonglasow) are using this approach to generate test data or to populate a database. They write a grammar, they generate data based on this grammar and boom. The most common example I hear is describing a JSON payload with a grammar and generating data from it.
There are 3 algorithms. They are described in the hack book of Hoa\Compiler.
Considering the goal of your project, these algorithms can be very very… very useful for you.
There is one more thing… To be able to generate data from a grammar, we need to be able to generate data for the tokens. Token values are represented by PCRE. So… you guessed it, we are able to generate data based on a regular expression. See the Hoa\Regex library, which shows one example. I am copy-pasting the most interesting part here:
// 1. Read the grammar.
$grammar = new Hoa\File\Read('hoa://Library/Regex/Grammar.pp');
// 2. Load the compiler.
$compiler = Hoa\Compiler\Llk\Llk::load($grammar);
// 3. Lex, parse and produce the AST.
$ast = $compiler->parse('ab(c|d){2,4}e?');
// 4. Set up the sampler.
$generator = new Hoa\Regex\Visitor\Isotropic(new Hoa\Math\Sampler\Random());
// 5. To infinity and beyond!
echo $generator->visit($ast);
/**
* Could output:
* abdcde
*/
I don't mean to advertise here, but I really think it can provide some really cool features.
Thanks for the detailed answer @Hywan :)
I hope I'll have time to look into this soon. To be completely transparent this part is not exactly my priority now as I have still quite a lot to do for alice, AliceDataFixtures and HautelookAliceBundle. The priority being stabilising the three libraries and easing the migration.
I would love, however, to have the time and energy to look into it before the stable release: it would avoid going stable with the whole Expression Language marked as internal. That said, maybe someone else is ready to tackle this RFC :P
We can if needed. If you play the role of the PO, draft all the issues etc., I am sure we could find time to help :-).
Hehe I need to update the doc, but otherwise I think the best doc for a developer is ParserIntegrationTest. How this result is produced is internal and can change completely.
There is definitely a scenario or two missing. I tried to be as exhaustive as possible, but I'm not a machine and the sheer number of combinations can't all be covered either. Still, it gives a good base I would say.
@theofidry Where is the grammar defined?
That's the thing: there is no proper grammar system. Basically there's a lexer (which has its own share of tests) that transforms expressions into tokens like:
yield '[Escaped arrow] surrounded' => [
'foo \< bar \> baz', // input
[ // expected
new Token('foo ', new TokenType(TokenType::STRING_TYPE)),
new Token('\<', new TokenType(TokenType::ESCAPED_VALUE_TYPE)),
new Token(' bar ', new TokenType(TokenType::STRING_TYPE)),
new Token('\>', new TokenType(TokenType::ESCAPED_VALUE_TYPE)),
new Token(' baz', new TokenType(TokenType::STRING_TYPE)),
],
];
The parser then parses each value according to the type of its token.
So as of now, it's pretty manual hence the desire to change to something more standard :)
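To make the lexing step described above concrete, here is a minimal PHP sketch, not alice's actual lexer. It only separates escaped arrows from plain strings, and the `lex()` function and the `[value, type]` pair representation are assumptions made for illustration; the real lexer uses `Token` and `TokenType` objects as shown in the test case above.

```php
<?php
declare(strict_types=1);

// Hypothetical token types, named after the constants in the example above.
const STRING_TYPE = 'STRING_TYPE';
const ESCAPED_VALUE_TYPE = 'ESCAPED_VALUE_TYPE';

/**
 * Splits an expression into plain-string tokens and escaped-arrow tokens.
 *
 * @return array<int, array{0: string, 1: string}> list of [value, type] pairs
 */
function lex(string $input): array
{
    // Capture "\<" and "\>" as delimiters so they are kept in the result.
    $parts = preg_split(
        '/(\\\\[<>])/',
        $input,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    ) ?: [];

    $tokens = [];
    foreach ($parts as $part) {
        $isEscaped = $part === '\\<' || $part === '\\>';
        $tokens[] = [$part, $isEscaped ? ESCAPED_VALUE_TYPE : STRING_TYPE];
    }

    return $tokens;
}
```

Calling `lex('foo \< bar \> baz')` yields five pairs in the same order as the expected tokens in the test case above. Every additional construct (functions, references, optional values) needs its own hand-written rule in such a lexer, which is what a grammar-based tool would replace.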
I see. I guess the users have a documentation with all the possible syntax?
Yep, #377, which may be slightly outdated right now, and ParserIntegrationTest. Tests are a big part of the doc here, for better or worse :/
Great! I don't have time right now but I will try to find some. Maybe some Hoackers could help me. What's your schedule?
I hope to have finished most of it by the end of the month. Then it will be a few updates or bugfixes here and there and let it live for 2-3 months before a stable release.
@Hywan I took a look at the Compiler this weekend; it looks like a good solution to replace the in-house lexer. I still have a few issues with your PP language but I think it's just a matter of getting familiar with it. I'm not sure yet whether I should do it before or after the stable release. A little question though: why don't Hoa projects follow semver?
@theofidry Funny, I opened your issue this weekend too 😛. I can help to write the grammar (in PP) if you need help.
Hoa libraries are compatible with semver, but here is the answer: https://hoa-project.net/En/Source.html#Rush_Release.
Cool :) I'll push a POC soon to be able to discuss on it then :)
Perfect! Please, ping me.
@theofidry perhaps there are other out-of-the-box workarounds.
Rather than coming up with a special language (which users would have to spend time learning), what if the project adopts the Expression Language?
# before
Is\Bundle\PlanBundle\Entity\Event:
    event_bare (template):
        title: <sentence(3)>
        show: '@show_*'
        rooms[0]: '@room_*'
        startDateTime: '<dateTimeBetween("-1 month", "+4 month")>'
        endDateTime: '<dateTimeInInterval($startDateTime, "+4 hours")>'
        isDraft: false
        version: '10%? @version_*'
        tags 25%?: ['<randomElement(@tag_{0..3})>']
        __calls:
            - setRevenue (25%?): ['<moneyBetween(10000, 300000)>']
            - setVisitorCount (25%?): ['<numberBetween(100, 500)>']
# after
Is\Bundle\PlanBundle\Entity\Event:
    event_bare (template):
        title: faker.sentence(3)
        show: alice.one('show_*')
        rooms: faker.randomElements(alice.some('room_*'), faker.randomNumber(1, 2))
        startDateTime: faker.dateTimeBetween('-1 month', '+4 months')
        endDateTime: faker.dateTimeInInterval(this.startDateTime, '+4 hours')
        isDraft: false
        version: alice.sometimes(0.1, alice.one('version_*'))
        tags: alice.sometimes(0.25, faker.randomElements(alice.some('tag_*')))
        revenue: alice.sometimes(0.25, myown.moneyBetween(10000, 300000))
        visitorCount: alice.sometimes(0.25, faker.numberBetween(100, 500))
In a nutshell I'd propose these changes. These are just some thoughts that came up as I was thinking about this.
Alice-related functions under alice. and Faker-related functions under faker. Currently it's very difficult to know which documentation to consult.
… this point to the current fixture.
4.1 To reference a single fixture, use alice.one('potato_*') or alice.one('potato_{0..3}'). If more than 1 fixture is matched, pick one randomly.
4.2 To reference multiple fixtures, use alice.some('potato_*'). That would return all fixtures that match the pattern.
… custom. or just globally.
foo 25%?: potato would become foo: alice.sometimes(0.25, potato) or something similar.
What are your thoughts? Surely, this is a breaking change, but I think that this change would let maintainers focus more of their time on features rather than having to wrestle with the idiosyncrasies of the syntax.
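The alice.one / alice.some semantics proposed above could look roughly like the following PHP sketch. The `some()` and `one()` functions are hypothetical and are not part of alice. Only the `*` wildcard is handled; range patterns such as `potato_{0..3}` are left out, and throwing when nothing matches is an assumption of this sketch.

```php
<?php
declare(strict_types=1);

/**
 * Returns every fixture whose id matches the pattern ("*" matches anything).
 *
 * @param array<string, mixed> $fixtures fixtures indexed by id
 * @return array<string, mixed>
 */
function some(array $fixtures, string $pattern): array
{
    // Quote the pattern, then turn the quoted "*" back into a wildcard.
    $regex = '/^' . str_replace('\*', '.*', preg_quote($pattern, '/')) . '$/';

    return array_filter(
        $fixtures,
        fn (string $id): bool => preg_match($regex, $id) === 1,
        ARRAY_FILTER_USE_KEY
    );
}

/**
 * Returns one matching fixture, picked at random when several match.
 *
 * @param array<string, mixed> $fixtures fixtures indexed by id
 * @return mixed
 */
function one(array $fixtures, string $pattern)
{
    $matches = some($fixtures, $pattern);
    if ($matches === []) {
        throw new InvalidArgumentException("No fixture matches \"$pattern\".");
    }

    return $matches[array_rand($matches)];
}
```

With fixtures `potato_0`, `potato_1` and `tomato_0`, `some($fixtures, 'potato_*')` returns the two potato fixtures and `one($fixtures, 'potato_*')` returns either of them.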
Hi @kgilden.
It's an interesting proposal indeed. A couple of notes however:
rooms[0]: '@room_*': I have no idea if this is currently supported, to be honest.
From:
version: '10%? @version_*'
tags 25%?: ['<randomElement(@tag_{0..3})>']
to:
version: alice.sometimes(0.1, alice.one('version_*'))
tags: alice.sometimes(0.25, faker.randomElements(alice.some('tag_*')))
I am not sure this is equivalent. For version the new syntax is correct, but not for tags: in 25% of cases tags won't be called at all, which is not the same as receiving null.
The same goes for __calls, which is a well-separated step whose result may be re-used. Unlike hydration, the calls point at methods, whereas the hydrator may use the property directly (even a private one, depending on your config).
I however don't think it invalidates your suggestion.
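The difference between an optional property and an optional value can be sketched in PHP as follows. Both helpers are hypothetical and only illustrate the distinction made above; they are not alice's API. The `$picked` flag stands for the outcome of the random draw.

```php
<?php
declare(strict_types=1);

/**
 * Optional property: when the draw fails, the property is not set at all,
 * so the hydrator never touches it.
 */
function withOptionalProperty(array $fixture, string $name, $value, bool $picked): array
{
    if ($picked) {
        $fixture[$name] = $value;
    }

    return $fixture;
}

/**
 * Optional value: the property is always set and receives null
 * when the draw fails.
 */
function withOptionalValue(array $fixture, string $name, $value, bool $picked): array
{
    $fixture[$name] = $picked ? $value : null;

    return $fixture;
}

// Same failed draw, two different results.
$skipped = withOptionalProperty([], 'tags', ['php'], false);
$nulled = withOptionalValue([], 'tags', ['php'], false);
```

Here `$skipped` has no `tags` key, while `$nulled` has a `tags` key holding null. For an entity whose setter rejects null or whose property has a default value, the two outcomes behave differently.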
I like the idea, but I have mixed feelings since:
Thanks @theofidry,
Cool that you're considering this, and apologies for not being rigorous enough in my proposal. I suppose the gist of it is to swap the current syntax out for the Expression Language.
It is a big BC break since it completely changes the syntax for the users which remained unchanged since 1.x.
Agreed that this would be a big BC break and I hate them as much as any other dev. Maybe it would be possible to keep BC by introducing syntax versions (the user specifies at the top of the file which version of the syntax they prefer to use, i.e. version: 1).
If we break it that much, I'm wondering if we are not better off going a step further and going for PHP based templates instead of YAML as it would remove any need for expression languages.
Could you show an example of what you have in mind? In my opinion one of the nice things about this library is that fixture generation is terse. Sure, I could use plain Doctrine Fixtures, but the end result tends to be complex and difficult to update. So if PHP templates keep the same terseness, I'd be :+1: with that. Anything goes for me that would allow me to sometimes use function nesting without any surprises (such as https://github.com/nelmio/alice/issues/842).
I'd be interested in what other users of this library think of this as well.
Sure, sorry I didn't do that yesterday, I had to give it a bit more thought and time:
<?php
use Is\Bundle\PlanBundle\Entity\Event;
use Nelmio\Alice\Alice;
return [
    Event::class => [
        'foo1' => [
            'title' => Alice::faker()->sentence(3),
            'show' => Alice::reference('@show_*'),
            'startDateTime' => Alice::faker()->dateTimeBetween('-1 month', '+4 month'),
            'isDraft' => false,
            'version' => Alice::optional(10, Alice::reference('@version_*')),
            '__calls' => [
                'setRevenue (25%?)' => [Alice::faker()->moneyBetween(10000, 300000)],
            ],
        ],
        'foo2' => new Foo(),
    ],
];
There are three immediate advantages there:
… Alice class
… (foo2 in the example above)
Honestly I'd be cool with both directions. As long as it would be possible to add custom extensions that in turn are dependent on other services (i.e. Symfony DI).
However, I'm a bit worried that perhaps it becomes more verbose and gives too much "power". I like the fact that the current YAML syntax limits developers from writing long complex code and keeps the focus more on relationships between fixtures.
As long as it would be possible to add custom extensions that in turn are dependent on other services (i.e. Symfony DI)
I don't think this would be too difficult and I agree it's a requirement: HautelookAliceBundle depends on it as well.
However, I'm a bit worried that perhaps it becomes more verbose and gives too much "power"
That's a risk, but I think it's OK. Right now the vast majority of the issues are about lexing/parsing problems which can only be solved by this PR, and even so, people feel overburdened by this YAML syntax and by having to learn the alice DSL.
Also for the record, in 1.x & 2.x it was also possible to a certain extent (just not as discoverable).
Please excuse the perhaps not so relevant comment, but does this mean nested functions like <some_custom_provider(<numberBetween(1,5)>)> can't actually be parsed?
I've tried all sorts of different syntaxes and it either evaluates numberBetween(1,5) as a string, or fails with the following error:
In ExpressionLanguageExceptionFactory.php line 59:
The value "<numberBetween(1" contains an unclosed function.
All I found was this related issue hautelook/AliceBundle#327.
They can be, and they are to a certain extent. It relies on regexes, however, which is extremely flimsy.
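A small PHP sketch of why regexes struggle with nesting. This is an illustration only: the naive regex below is not the one alice uses, and `findClosingArrow()` is a hypothetical helper. It also ignores arrows inside quoted string arguments.

```php
<?php
declare(strict_types=1);

/**
 * Returns the index of the ">" that closes the "<" found at $start,
 * or null when the function is unclosed. Backslash-escaped characters
 * such as "\<" are skipped.
 */
function findClosingArrow(string $value, int $start): ?int
{
    $depth = 0;
    $length = strlen($value);

    for ($i = $start; $i < $length; $i++) {
        $char = $value[$i];

        if ($char === '\\') {
            $i++; // Skip the escaped character.
            continue;
        }

        if ($char === '<') {
            $depth++;
        } elseif ($char === '>') {
            $depth--;
            if ($depth === 0) {
                return $i;
            }
        }
    }

    return null;
}

$value = '<some_custom_provider(<numberBetween(1,5)>)>';

// A naive non-greedy regex stops at the first ">", cutting the outer call short.
preg_match('/<(.*?)>/', $value, $naive);

// Counting nesting depth pairs the outer "<" with the last ">".
$end = findClosingArrow($value, 0);
```

Here `$naive[0]` is `<some_custom_provider(<numberBetween(1,5)>`, which leaves the trailing `)>` outside the match, whereas `$end` points at the final character of the value. A generated parser keeps track of this nesting for you.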
Is there a workaround? Anything with more than one argument seems to break with the same error. I tried escaping the comma and whatnot. Even the example given in the docs breaks:
App\Entity\Dummy:
    dummy:
        functionValue: '<strtolower("BAR")>'
        nestedFunctionValue: '<strtolower(<(implode(" ", ["HELLO", "WORLD", \<foo()>]))>)> \<bar()>'
In TolerantFixtureDenormalizer.php line 68: An error occurred while denormalizing the fixture "dummy" (App\Entity\Dummy): The value "<(implode(__ARG_TOKEN__7215ee9c7d9dc229d2921a40e899ec5f" contains an unclosed function.
Escaped expressions also fail:
'<strtolower(<(implode(["HELLO"]))>)> \<bar()>'
In TolerantFixtureDenormalizer.php line 68: An error occurred while denormalizing the fixture "dummy" (App\Entity\Dummy): Invalid token "\" found.
I'm using Alice 3.5.7, by the way.
The easiest workaround is:
Ok, so these really don't work. Thank you for the clarification. I wanted to upgrade from Alice 2.3 and AliceBundle 1.4, and I have a lot of fixtures to change. I was thinking of cramming everything into the providers, but I'll just stick to the old versions for now.
If your fixtures work it's fine, just be aware that barely 10% of alice 2.x is actually tested, so it relies a lot on luck... I think, however, that https://github.com/nelmio/alice/issues/998 is the real solution that will make everyone happy tbh
As mentioned in #600, the current lexer/parser of the Expression Language is completely custom. While it does the job, I can't say I'm very proud of the implementation and it's far from my field of expertise. Relying on a third-party library for that task would make sense, maybe:
And possibly others (I didn't take the time to properly look into it).
The goal of this component is to be able to transform values as described in #377. Maybe a more detailed example is the actual integration test of the Expression Language parser: ParserIntegrationTest