Closed mathiasverraes closed 3 years ago
See the commit above, that shows how to parse the path. But I'm sure you figured that part out yourself.
I'm not sure what you want to achieve. Are you trying to make the parser somehow understand that parts of the path are repeated?
Thank you for your responsiveness!
To provide you with more context. I'm building an open source utility to normalize a flat map using a small DSL. I'm currently using in production a version of this utility I wrote using imperative code. I thought it would be nice to build an AST Tree using the Parsica library that I use to normalize the data.
I already managed to build a single branch of the AST tree by parsing a single key ("$.a.b.c.d").
[
{
"id@int": "123",
"name": "Brian",
"car.color":"red",
"car.type": "SUV",
"children[childId].id": "1",
"children[childId].name": "Tom",
"_childId": "1",
},
{
"id@int": "123",
"name": "Brian",
"car.color":"red",
"car.type": "SUV",
"children[childId].id": "2",
"children[childId].name": "Harry",
"_childId": "2",
}
]
=>
{
"id": 123,
"name": "Brian",
"car": {"color":"red", "type": "SUV"},
"children" [{"id": "1", "name": "Tom"}, {"id": "2", "name": "Harry"}]
}
I'm not sure what you want to achieve. Are you trying to make the parser somehow understand that parts of the path are repeated?
That's correct, I would like to "skip" the parsing of repeated strings
[
"/a/b/c/file1",
" **/a/b/c/** file2",
" **/a/b/c/** file3",
" **/a/b/** file4"
]
I would like to introduce a ParsingContext to my parser factory to provide me with cachedParsingSteps:
function parserBoundToContext(ParsingContext $context) {
$resolveFromCache = fn($input) $this->resolveFromCache($input, $context);
$parser = (
// resolved from $context->cachedParsingSteps
string("/a/b/c/")->map($resolveFromCache)
->or(string("/a/b/")->map($resolveFromCache))
->or(string("/a/")->map($resolveFromCache))
->or(nothing())
);
// ...
}
Whenever a folder is parsed, I would like to enrich the cachedParsingSteps with the input (the whole input since the start -- could be generated from the whole input
minus the remainder
) and its result.
Ideas
I hope all this makes sense to you.
Cheers,
Some thoughts:
<?php
function yourCombinator(Parser $parser, $context) : Parser {
return Parser::make(function (string $input) use ($parser, $context) : ParseResult {
$result = $parser->run($input);
if($result->isSuccess()) {
// You now have access to $result->output() and ->remainder(),
// as well as the original $input, the original $parser, and your $context
}
// Return a ParseResult
}
});
}
2.The above could be replaced by something similar to map
. I don't want change the definition of map
because that would conflict with the common definition of map in FP languages. But it could be something like map2(Parser, fn($output, $remainder) : ParseResult) : Parser
emit(Parser, fn($output):void) : Parser
. In other words, it's a combinator that returns a Parser that behaves exactly like the original parser, but performs side effects in function.<?php
$cache = new Cache();
$addToCache = fn($output) => $cache->add($output);
$parser = emit(char('a'), $addToCache); // If we successfully parse 'a', add 'a' to cache.
I've added emit()
in the commit above. (It's not in the main branch yet.) I'm not sure about the map2 thing yet, but you could easily make that yourself by adding a map2 to both Parser and ParseResult.
I'm still a bit unclear as to what exactly the point is of what you're trying to do, but let me know if these things help.
I think I found a way to achieve what I initially wanted to do. I'll try to implement it and close this PR by the end of this week.
--
Do you think we should add this utility?
function lazy(callable /*: () => Parser */ $parserFactory): Parser
{
return Parser::make(function (string $input) use ($parserFactory): ParseResult {
$parser = $parserFactory();
return $parser->run($input);
});
}
$parser = lazy(fn() => string('joris'));
Good to hear you fixed it. The lazy
combinator was suggested by someone on Twitter as well. I'm a bit hesitant to add it, because it forces the user to think about performance. I'm hoping the avoid that. We have some ideas that we want to try, but there are a number of intermediate steps we need first. So I don't want to add performance features before we can measure it and exhaust other options.
I finished implementing the parser using Parsica. The library is now working as expected : ).
Some notes:
or()
(used under the hood by either
) with emit()
.
either(
emit(
success(),
fn() => var_dump("I expect to be triggered")
),
emit(
success(),
fn() => var_dump("I do not expect to be triggered")
)
)->try('test');
My use-case: I'm trying to parse from the cache (sometimes successful) "or" using the root parser (always successful). I don't expect both events to be emitted.
return either(
$context->preflightCacheParser(),
$root
)->followedBy($rest);
This is what allows me to skip the repeated parts:
[
"/a/b/c/file1",
" **/a/b/c/** file2",
" **/a/b/c/** file3",
" **/a/b/** file4"
]
As you already know, it works when replacing the current implementation of or
with the commented implementation.
Perhaps it's another reason, beyond performance, to get rid of the current implementation.
"or()" is used by quite a lot of utils that would have to be rewritten, I hope the default behaviour will change.
When using emit(), the $receiver does not have access to: $output, $input and $remainder. My use case requires both $output and $remainder. I wouldn't know how to implement it with the current emit()
. Saying that, unlike the or()
behaviour, it's very easy to go around the problem by creating a new emit function.
The Persica implementation of the Parser is very slow. This utility is now roughly 10 times slower than its previous version. Of course, we have to take in consideration that it's much easier to maintain and an extra layer of cache would result in similar performance. Unfortunately, in some environments, cross-request caching isn't possible.
/**
* @test
*/
public function it_should_parse_under_100_ms()
{
$propertyName = atLeastOne(alphaNumChar());
$type = emit(
either(
eof(),
char('@')
->followedBy($propertyName)
->thenIgnore(eof()),
),
function () {}
);
$map = emit(
char('.')->followedBy($propertyName),
function () {}
);
$list = emit(
between(
char('['),
char(']'),
either(
char('@')
->followedBy($propertyName)
->map(fn($value) => [
'discriminatorName' => $value,
'keepKeys' => true
]),
$propertyName
->map(fn($value) => [
'discriminatorName' => $value,
'keepKeys' => false
]),
)
),
function () {}
);
$root = emit(
char('$'),
function () {}
);
$rest = many(any($map, $list))->followedBy($type);
$parser = either(
failure(), // $context->preflightCacheParser(),
$root
)->followedBy($rest);
$start = microtime(true);
for ($i = 0; $i < 500; $i++) {
$parser->try('$.q.w[@1].e[2]@int');
}
$end = microtime(true);
$this->assertLessThan(0.1, $end - $start);
}
Thanks again for your work!
This commit shows that the problem with either
is fixed: https://github.com/mathiasverraes/parsica/commit/75761c77dfb5934b9d914f3ce7106bccee06811b
(It's actually fixed as a consequence of something else, we swapped the implementation of or
to make sure it only executes the second parser if the first one failed.)
This change is in the main branch and will be in 0.4. There are lots of breaking changes so beware.
I've also added your other test. Would be possible to share your original parser? That way, we can define the test as a comparison, where the Parsica version must perform at most x% slower than the original one.
BTW I'm sure you know this, but if xdebug is on, it makes that test about 7x slower on my machine. It's still too slow, of course. I'm hoping to focus on performance after I get v0.4 out the door.
if xdebug is on, it makes that test about 7x slower on my machine. It's still too slow, of course.
Good point, I probably had it on.
--
The original parser isn't open-sourced and does much more than parsing. I'll try to implement another Parser using imperative code by the end of this week 🤞 so we can have a proper comparison.
FYI (without xdebug)
Imperative Current draft: https://gist.github.com/grifx/1efe84852f2e4dd867793d66149d152b Time: 00:00.046, Memory: 6.00 MB Parsica Time: 00:00.672, Memory: 6.00 MB
$context = new ParsingContext();
for ($i = 0; $i < 1000; $i++) {
$context->startParsingKey(0, '$.qwerty.qwerty[@qwerty].qwerty[qwerty]@qwerty');
$this->imperativeKeyParser->try('$.qwerty.qwerty[@qwerty].qwerty[qwerty]@qwerty', $context);
$context->endParsingKey();
$context->clean();
}
$context = new ParsingContext();
for ($i = 0; $i < 1000; $i++) {
$context->startParsingKey(0, '$.qwerty.qwerty[@qwerty].qwerty[qwerty]@qwerty');
$this->parsicaKeyParser->try('$.qwerty.qwerty[@qwerty].qwerty[qwerty]@qwerty', $context);
$context->endParsingKey();
$context->clean();
}
I'll try to clean, extract and share the unit test during the week.
Note: They are not behaving exactly the same way. For instance, the imperative parser deals with char escaping: $.www\.gooo\\\.oooogle\.com
. I haven't thought about how to do it with Parsica yet.
The new JSON parser can escape characters in string literals: https://github.com/mathiasverraes/parsica/blob/main/src/JSON/JSON.php#L214
Should we close this? I'm not sure if there's anything actionable right now.
Sorry for the late reply.
Yes, let's close this issue.
I'll release the library using the imperative parser and open a PR to introduce Parsica.
Thanks for your help!
@mathiasverraes FYI, I cannot close this issue since I did not open it.
Reported on Twitter https://twitter.com/Wasquen/status/1276906414833758208?s=20